IMO that's a terrible thought experiment given the situation.

LLMs do not store enough content, or store it accurately enough, to come even close to being a virtual library. Unlike, say, Google and the Wayback Machine: the former stores enough to show snippets from the pages it's presenting to you as search results (and they got sued for that in certain categories of result), while the latter is straight up an archive of every site it crawls.

Furthermore, the "percentage" in question for OpenAI is "once we've paid off our investors, all of it goes to benefitting humanity one way or another" — the parent company is a not-for-profit.

> Does that make what I'm doing okay? Should all those authors deprived of royalties on their work now, even deprived of publishing opportunities as legitimate sales collapse, understand my token contribution to UBI as fair compensation for what I'm taking from them?

Here's a different question for you: If a generative AI is trained only on out-of-copyright novels and open-licensed modern works, and still deprives everyone of all publishing opportunities forever (since, in this thought experiment, it's better and cheaper than any human novelist), is that any more, or any less, fair to literally any person on the planet? The outcomes are the same.

I'm sure someone's already thought of making such a model; it's just a question of whether they've raised enough money to train it.

> OpenAI's product relies on a technical means of laundering intellectual property that seems tailor-made to dodge a body of existing copyright law designed by people who could not possibly have conceived of what modern genAI is capable of

You may have noticed from the version number that they're on versions 3 and 4. When version 2 came out in 2019, they said:

"""Our model, called GPT-2 (a successor to GPT), was trained simply to predict the next word in 40GB of Internet text. Due to our concerns about malicious applications of the technology, we are not releasing the trained model."""

They were mocked for this: https://slate.com/technology/2019/02/openai-gpt2-text-genera...

Even here: https://news.ycombinator.com/item?id=21306542

Indeed, they are still mocked for suggesting their models may carry any risk at all. Of any kind. There are plenty of people who want to rush forward with this and think OpenAI are needlessly slow and cautious.

You may also have noticed their CEO gave testimony in the US Congress, and that the people asking him questions were surprised he said (to paraphrase) "regulate us specifically — not the open source models, they're not good enough yet — us".

To the extent that any GenAI can pose an economic threat to a creative job, it has to be better than a human in that same job. For now, IMO, they're assistant-level, not economic-threat-level. And when they get to economic-threat-level (which in fairness could be next month or next year), they'll be that threat even if none of your IP ever entered their training runs.



> LLMs do not store enough content, or store it accurately enough, to come even close to being a virtual library. Unlike, say, Google and the Wayback Machine: the former stores enough to show snippets from the pages it's presenting to you as search results (and they got sued for that in certain categories of result), while the latter is straight up an archive of every site it crawls.

I already addressed this: "the only difference between it and what OpenAI is doing is that OpenAI's product relies on a technical means of laundering intellectual property that seems tailor-made to dodge a body of existing copyright law designed by people who could not possibly have conceived of what modern genAI is capable of."

You are certainly welcome to disagree with what I've said, but you can't simply pretend I didn't say it.

> Furthermore, the "percentage" in question for OpenAI is "once we've paid off our investors, all of it goes to benefitting humanity one way or another" — the parent company is a not-for-profit.

A quick Googling suggests that OpenAI employees are not working for free; far from it, in fact. In this frame I don't particularly care whether the organization itself is nominally a "non-profit", because profit motives are obviously present all the same.

> Here's a different question for you: If a generative AI is trained only on out-of-copyright novels and open-licensed modern works, and still deprives everyone of all publishing opportunities forever (since, in this thought experiment, it's better and cheaper than any human novelist), is that any more, or any less, fair to literally any person on the planet? The outcomes are the same.

They are certainly welcome to try! Given how profoundly incapable extant genAI systems are of generating novel (no pun intended) output, including but not limited to developing artistic styles of their own, I think it would be quite funny to see these companies try to outcompete human artists with AI generated slop 70+ years behind the curve of art and culture. As for modern "public domain"-ish content, if genAI companies actually decided to respect intellectual property rights, I expect those licenses would quickly be amended to prohibit use in AI training.

AI systems will probably get there eventually, though it's very difficult to predict when. However, that speculation does not justify theft today.

> I'm sure someone's already thought of making such a model; it's just a question of whether they've raised enough money to train it.

People are absolutely throwing money at genAI right now, so if nobody has thrown enough money at this particular idea to give it a fair shake, then the obvious conclusion is that the people who know genAI think it's a relatively bad one. I'm inclined to agree with them.

> You may have noticed from the version number that they're on versions 3 and 4. When version 2 came out in 2019, they said [...]

Why is this relevant? I'm not talking about AI safety or "X risk" or whatever—I'm talking about straightforward intellectual property theft, which OpenAI and their contemporaries are obviously very comfortable with. The models they sell to anybody willing to pay today could literally not exist without their training datasets.



