Are there any AI services that don't work on stolen data?

6 days ago

Treczoks@lemmy.world · 6 days ago

There are no legal sources big enough to train an AI on the level required to even perform basic interaction.

AmbitiousProcess (they/them)@piefed.social · 6 days ago

This is very true.

I was part of the OpenAssistant project, voluntarily submitting my personal writing to train open-source LLMs without having to steal data, in the hopes it would stop these companies from stealing people’s work and make “AI” less of a black box.

After thousands of people submitted millions of prompt-response pairs, and after some researchers said it was the highest-quality natural language dataset they’d seen in a while, the base model was still almost always incoherent. You only got a functioning model if you used the data to fine-tune an existing, larger model, which was Llama at the time.