AI’s become so invasively popular and I’ve seen more evidence of its ineffectiveness than otherwise, but what I dislike most about it is that many run on datasets of stolen data for the sake of profitability à la OpenAI and Deepseek
https://mashable.com/article/openai-chatgpt-class-action-lawsuit https://petapixel.com/2025/01/30/openai-claims-deepseek-took-all-of-its-data-without-consent/
Are there any AI services that run on ethically obtained datasets, like stuff people explicitly consented to submitting (not as some side clause of a T&C), data bought by properly compensating the data’s original owners, or datasets contributed by the service providers themselves?
You can’t steal data, just illegally copy it. So no LLM is trained on stolen data.
Okay, so conversion or unjust enrichment instead of literal theft.