I’m genuinely surprised to see this not discussed more by the FOSS community. There are so many ways to blow past the GPL now:
1. File by file rewrite by AI (“change functions and vars a bit”)
2. One LLM writes a different-language (or pseudocode) version of each function, which a different LLM translates back into code and tests for input/output parity
The real danger is that this becomes increasingly undetectable in closed source code and can continue to sync with progress in the GPLed repo.
I don’t think any current license has a plausible defense against this sort of attack.
I’ve never delved fully into IP law, but wouldn’t these be considered derivative works? They’re basically just reimplementing exactly the same functionality with slightly different names?
This would be different from the “API reimplementation” case (see Google v. Oracle), because there they weren’t reusing implementation details, just the external contract.
Because copyrights do not protect ideas. Thankfully. We are free to express ideas, as long as we do so in our own words. How that principle is applied in actual law, and how it is applied to software, is ridiculously complicated. But that is the heart of the principle at play here. The law draws a line between ideas (which cannot be copyrighted) and particular expressions of those ideas (e.g. the original source code), which are protected. However, it is an almost fractally complicated line which, in many places, relies on concepts of "fairness" and, because our legal system runs on precedent, depends on interpretation of a huge body of prior legal decisions.
Not being a trained lawyer, or a Supreme Court justice, I cannot express a sensible position as to which side of the line this particular case falls on. There are, however, enormously important legal precedents that pretty much all professional software developers use to guide their behaviour when handling copyrighted material (IBM vs. Amdahl and Google v. Oracle, particularly), and they seem to suggest to us non-lawyers that this sort of reimplementation is legal. (Seek the advice of a real lawyer if it matters.)
Taking a step back, it seems fairly clear that wherever you set the bar, it should be possible to automate a system that reads code, generates some sort of intermediate representation at the acceptable level of abstraction and then regenerates code that passes an extensive set of integration tests … every day.
At that point our current understanding of open source protections … fails?
"change functions and vars a bit" isn't a rewrite. Anything where the LLM had access to the original code isn't a rewrite. This would just be a derivative work.
However, most of the industry willfully violates the GPL without even trying such tricks anyway, so there are certainly issues.
#1 is already possible and always has been. I've never heard of a case of anyone actually trying it. #2 is too nitpicky and unnecessarily costly for LLMs. It would be better to just ask it to generate a spec and tests based on the original, then create a separate implementation based on that. A person can do that today, free and clear. If LLMs become able to do this, we will just need to cope. Perhaps the future is in validating software instead of writing it.
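To make the "spec and tests, then a separate implementation" idea concrete, here is a rough sketch of what the test half might look like. Everything in it is made up for illustration: `original_slugify` stands in for some function in the original codebase, `separate_slugify` for the independently written version, and the helper names are mine. It only shows the mechanics of recording behaviour and checking against it, and says nothing about where the legal line sits.

```python
from typing import Callable, Iterable, List, Tuple

# A recorded test case: (input, output observed from the original).
Case = Tuple[str, str]


def original_slugify(text: str) -> str:
    # Hypothetical stand-in for a function in the original codebase.
    return "-".join(text.lower().split())


def record_cases(func: Callable[[str], str], inputs: Iterable[str]) -> List[Case]:
    # Run the original once and keep only its observable behaviour.
    return [(value, func(value)) for value in inputs]


def separate_slugify(text: str) -> str:
    # Hypothetical separate implementation, written against the recorded cases.
    parts = []
    for word in text.split():
        parts.append(word.lower())
    return "-".join(parts)


def failing_cases(func: Callable[[str], str], cases: Iterable[Case]) -> List[Tuple[str, str, str]]:
    # Return (input, expected, actual) for every case the candidate gets wrong.
    failures = []
    for value, expected in cases:
        actual = func(value)
        if actual != expected:
            failures.append((value, expected, actual))
    return failures


cases = record_cases(original_slugify, ["Hello World", "  spaced   out  ", "", "MiXeD Case"])
failures = failing_cases(separate_slugify, cases)
print("all recorded cases pass" if not failures else failures)
```

The second implementation is checked only against the recorded inputs and outputs, never against the original source, which is the separation the comment above is pointing at. Whether that is enough in practice is a different question.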
(1) sounds like a derivative work, but (2) is an interesting AI-simulacrum of a clean room implementation IF the first LLM writes a specification and not a translation.