IMO that's a terrible thought experiment given the situation.

LLMs do not store enough content, or store it accurately enough, to come even close to being a virtual library. Unlike, say, Google and the Wayback Machine: the former stores enough to show snippets from the pages it's presenting to you as search results (and they got sued for that in certain categories of result), while the latter is straight up an archive of every site it crawls.

Furthermore, the "percentage" in question for OpenAI is "once we've paid off our investors, all of it goes to benefitting humanity one way or another" — the parent company is a not-for-profit.

> Does that make what I'm doing okay? Should all those authors deprived of royalties on their work now, even deprived of publishing opportunities as legitimate sales collapse, understand my token contribution to UBI as fair compensation for what I'm taking from them?

Here's a different question for you: If a generative AI is trained only on out-of-copyright novels and open-licensed modern works, and still deprives everyone of all publishing opportunities forever (since, in this thought experiment, it's better and cheaper than any human novelist), is that any more, or any less, fair to literally any person on the planet? The outcomes are the same.

I'm sure someone's already thought of making such a model; it's just a question of whether they've raised enough money to train it.

> OpenAI's product relies on a technical means of laundering intellectual property that seems tailor-made to dodge a body of existing copyright law designed by people who could not possibly have conceived of what modern genAI is capable of

You may have noticed from the version number that they're on versions 3 and 4. When version 2 came out in 2019, they said:

"""Our model, called GPT-2 (a successor to GPT), was trained simply to predict the next word in 40GB of Internet text. Due to our concerns about malicious applications of the technology, we are not releasing the trained model."""

They were mocked for this: https://slate.com/technology/2019/02/openai-gpt2-text-genera...

Even here: https://news.ycombinator.com/item?id=21306542

Indeed, they are still mocked for suggesting their models may carry any risk at all. Of any kind. There are plenty of people who want to rush forward with this and think OpenAI are needlessly slow and cautious.

You may also have noticed their CEO gave testimony in the US Congress, and that the people asking him questions were surprised he said (to paraphrase) "regulate us specifically — not the open source models, they're not good enough yet — us".

To the extent that any GenAI can pose an economic threat to a creative job, it has to be better than a human in that same job. For now, IMO, they're assistant-level, not economic-threat-level. And when they get to economic-threat-level (which in fairness could be next month or next year), they'll be that threat even if none of your IP ever entered their training runs.



> LLMs do not store enough content, or store it accurately enough, to come even close to being a virtual library. Unlike, say, Google and the Wayback Machine: the former stores enough to show snippets from the pages it's presenting to you as search results (and they got sued for that in certain categories of result), while the latter is straight up an archive of every site it crawls.

I already addressed this: "the only difference between it and what OpenAI is doing is that OpenAI's product relies on a technical means of laundering intellectual property that seems tailor-made to dodge a body of existing copyright law designed by people who could not possibly have conceived of what modern genAI is capable of."

You are certainly welcome to disagree with what I've said, but you can't simply pretend I didn't say it.

> Furthermore, the "percentage" in question for OpenAI is "once we've paid off our investors, all of it goes to benefitting humanity one way or another" — the parent company is a not-for-profit.

A quick Googling suggests that OpenAI employees are not working for free; far from it, in fact. In this frame I don't particularly care whether the organization itself is nominally a "non-profit", because profit motives are obviously present all the same.

> Here's a different question for you: If a generative AI is trained only on out-of-copyright novels and open-licensed modern works, and still deprives everyone of all publishing opportunities forever (since, in this thought experiment, it's better and cheaper than any human novelist), is that any more, or any less, fair to literally any person on the planet? The outcomes are the same.

They are certainly welcome to try! Given how profoundly incapable extant genAI systems are of generating novel (no pun intended) output, including but not limited to developing artistic styles of their own, I think it would be quite funny to see these companies try to outcompete human artists with AI generated slop 70+ years behind the curve of art and culture. As for modern "public domain"-ish content, if genAI companies actually decided to respect intellectual property rights, I expect those licenses would quickly be amended to prohibit use in AI training.

AI systems will probably get there eventually, though it's very difficult to predict when. However, that speculation does not justify theft today.

> I'm sure someone's already thought of making such a model; it's just a question of whether they've raised enough money to train it.

People are absolutely throwing money at genAI right now, so if nobody has thrown enough money at this particular idea to give it a fair shake, then the obvious conclusion is that the people who know genAI think it's a relatively bad one. I'm inclined to agree with them.

> You may have noticed from the version number that they're on versions 3 and 4. When version 2 came out in 2019, they said [...]

Why is this relevant? I'm not talking about AI safety or "X risk" or whatever—I'm talking about straightforward intellectual property theft, which OpenAI and their contemporaries are obviously very comfortable with. The models they sell to anybody willing to pay today could literally not exist without their training datasets.



