This is already legal without AI. Copyright protects only expression, not ideas, systems or methods. This is why directly reverse-engineering a proprietary binary to extract the algorithms and systems is legal.
Indeed, but it’s a much more legally dubious proposition when it comes to entire repos. A repo has more potentially creative structure for copyright to attach to. For example, the class graph or the filesystem layout are creative decisions that could potentially be protected. Current LLMs are nowhere near powerful enough to reimplement an entire repo without violating copyright.
For an individual function, I can totally believe GPT-4 could strip the creative expression from it today. For example, you could ask it to give a detailed description of a function in English, then feed that English description back in (in a new session) and ask it to generate code based on the description.
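To make the two-session idea concrete, here is a rough sketch of the describe-then-regenerate loop. Everything in it is an assumption for illustration: `FreshSession` is a hypothetical stand-in for whatever call sends one prompt to a model with no shared history, and the prompt wording is made up. It only shows the data flow and says nothing about whether the output would actually be free of protected expression.

```python
from typing import Callable

# Hypothetical: any function that sends ONE prompt to a model in a brand-new
# session (no shared chat history) and returns the text reply.
FreshSession = Callable[[str], str]


def describe(source: str, ask: FreshSession) -> str:
    """Session 1: turn the original function into an English description."""
    prompt = (
        "Describe in plain English what this function does. "
        "Do not quote identifiers, comments, or code from it.\n\n" + source
    )
    return ask(prompt)


def regenerate(description: str, ask: FreshSession) -> str:
    """Session 2: write new code from the English description alone."""
    prompt = "Write a function that matches this description:\n\n" + description
    return ask(prompt)


def reimplement(source: str, ask: FreshSession) -> str:
    description = describe(source, ask)
    # Only the English description crosses over; the original source text
    # is never placed in the second prompt.
    return regenerate(description, ask)
```

The point of splitting it into two calls is that the second session only ever sees the description, which is the part of the scheme that resembles a clean room.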
Sounds like clean-room design, and if you can do that for GPL code, you can also do that for proprietary code, which is fair in a sense. Or maybe the question is whether you can relabel the code written this way as GPL or MIT... Or you could let GPT pick a license that it likes.
Reimplementing would also help with training data: it's a way of extracting the idea without copying the original form. It works even better on images with variations. You generate the style from image A with the content composition from image B, thus extracting the style without the exact expression.
> Copyright protects only expression, not ideas, systems or methods
Copyright is a law agreed on by humans in a social contract created to protect humans and further their interests in a 'fair' manner. There is no inalienable right to copyright and no universal law that requires it, and it's not an emergent property of intelligence that mechanically applies to artificial entities.
So while the current copyright laws could be interpreted in the way you suggest for the time being, they are clearly written without any notion of AI, and can and should be revised to incorporate the new state of the world; you can bet creators will push hard in that direction. It's pretty clear that the mechanical transformation of a human body of work for the sole purpose of stripping it of copyright is a violation of the spirit of copyright law *.
* (As long as that machine can't also generate a similar work from scratch, in which case the point becomes moot. But we are far, far from that point.)
This is an overcomplicated method of applying the idea/expression dichotomy. It is perfectly legal not to use clean-room design, as shown by the Sega v. Accolade and Sony v. Connectix cases.