Being publicly available does not mean that copyright is invalid. Copyright gives the holders the right to restrict USE, not merely restrict reproduction. Adaptation is also an exclusive right of the copyright holder. You're not allowed to make derivative works.
> They consumed publicly available material on the Internet
I agree that there are some important distinctions and word choices to be made here, that there are problems with equating training with "stealing", and that copyright infringement is not theft, etc.
That said, if you zoom out to the overall conduct, it's fair to argue that the companies are doing something unethical, the same as if they paid an army of humans to memorize other people's work and then regurgitate slightly-reworded copies.
> That said, if you zoom out to the overall conduct, it's fair to argue that the companies are doing something unethical, the same as if they paid an army of humans to memorize other people's work and then regurgitate slightly-reworded copies.
I would use the analogy of those humans learning from the material, like reading books in the library.
"Regurgitate slightly-reworded copies": in my experience using LLMs (which is not insubstantial), that is an unfairly pejorative take on what they do.
By that logic, a copy of the source code for a proprietary app that someone has stolen and placed online is immediately free for all to use as they wish.
Being on the internet doesn't make it yours, or acceptable to take. In the case of OpenAI (and Anthropic), they should be following the long-held principle of the robots.txt file on sites, which can be set to tell those specific crawlers that they may not take your content. They openly ignore that request.
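For anyone unfamiliar with how that opt-out works, here is a rough sketch using Python's standard `urllib.robotparser`. The rules, the crawler names (`GPTBot`, `ClaudeBot`, and the made-up `SomeOtherBot`) and the example.com URL are my own illustrative assumptions, not taken from any real site's file.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: block two named AI crawlers, allow everyone else.
# The user-agent tokens here are assumptions made for illustration.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: *
Allow: /
"""

# Hypothetical page used only to test the rules.
URL = "https://example.com/article.html"


def build_parser(text: str) -> RobotFileParser:
    """Parse robots.txt rules from a string instead of fetching them."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# Ask what a well-behaved crawler with each name would be permitted to fetch.
for agent in ("GPTBot", "ClaudeBot", "SomeOtherBot"):
    print(agent, parser.can_fetch(agent, URL))
```

Note that this only shows what a crawler that chooses to honor the file would conclude. robots.txt is a request, not an enforcement mechanism, which is the whole point of the complaint.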
OpenAI absolutely is stealing from everyone, hence why most will have little sympathy when they complain someone stole from them.
"stole"?
They consumed publicly available material on the Internet
I am no fan of these billionaire capitalists and their henchpersons, but condemn them for their multitude of sins.
Consuming publicly available Internet resources is not one of them, IMO.