> I mean, they could be nice and respect your robots.txt, but they certainly don't have to.
That case was limited to the CFAA, but you seem to get the gist of what I meant when I said it's different when Microsoft is doing the scraping. If Bing starts ignoring robots.txt and that data still starts showing up in its results, all the early-2000s lawsuits are going to be opened back up.
> It's possible that fair use law will be expanded to cover this case, but as constructed the output of these models is generally fairly derivative of any specific original, and so probably protected under fair use.
Unless there's a reason for them to be considered fair use, derivative works are going to lose a copyright suit. And what's the fair use argument? If I'm the only one on the internet saying something and suddenly ChatGPT can talk about the same thing and I'm losing money as a result, there's no fair use argument there. Search engines won those early lawsuits by being transformative (index vs content), minimal, and linking to their source. None of that would apply here.
What GP means is that ChatGPT output is generally not similar enough to any _particular_ source document to establish the fact that it's derivative. Instead, it resembles what you'd get if you asked a (credulous and slightly dumb) human to read a selection of documents and then summarize them. These kinds of summaries are absolutely not copyright violations, even if the source document can actually be identified.