Using Generative AI For Code Can Be a Big Risk
October 19, 2023
If you’ve been paying any attention at all, you know that generative AI tools are being used to create content in all sorts of areas, from visual art to television and movie scripts to news articles. For many companies in the embedded space, that now includes code.
Advocates will talk about streamlined processes, faster time to market, and cost savings, but there are many concerns, too. The Writers Guild of America recently won a new contract from the AMPTP after a protracted strike that was in large part driven by concerns about generative AI tools infringing on writers’ rights. That contract includes concessions by the studios with respect to how AI can be used to produce stories, most notably that AI-generated written material is not considered literary material, source material, or assigned material.
This is relevant to engineers and developers because under current law, computer code is covered by copyright as a creative work, but only if it’s substantially created by human coders. In an August decision by the US District Court for the District of Columbia, Judge Beryl A. Howell ruled that AI-generated art cannot be copyrighted under US law. In the decision, Howell said, “this case presents only the question of whether a work generated autonomously by a computer system is eligible for copyright. In the absence of any human involvement in the creation of the work, the clear and straightforward answer is the one given by the Register: No.”
Ken Liu is a science fiction writer and a graduate of Harvard Law School who practiced as a corporate lawyer and high-tech litigation consultant before turning to writing full time. His works include The Paper Menagerie and Other Stories and The Grace of Kings, the first volume of The Dandelion Dynasty tetralogy.
When it comes to AI-generated code, Liu says he sees a multitude of problems, starting with the fact that it’s not clear whether it can be protected by copyright at all, the way other code is. “If you’re using software from generated code, you’re taking a risk,” he said. “Many of these tools were trained on open data sets.”
So, what does this mean for coders and, more importantly, the products they’re building?
At the basic level, it means that any code that is entirely generated by AI is not copyrightable, and therefore products that use that code likely have a murky-at-best chance of being proprietary in any way. Usually, though, when an autocoder is in play, an engineer or developer will make changes, edits, and tweaks to that code before it gets to the final deployment.
What about that code? Is it protected? Well, it depends.
Many of the available autocode tools, including GitHub Copilot, are trained on datasets of code from public repositories, in Copilot’s case via the OpenAI model Codex. If any of that code is licensed in any limited way, any product using the code might have patentability impacts.
In a January 2023 article for Bloomberg Law on AI IP issues, the authors wrote, “If the generated code includes open-source code publicly available under a copyleft license, such as GPL v3, then it may cause the entire generated code to inherit the same open-source license, which may “taint” the developer's whole source code, including proprietary code. That is, if a developer includes open-source licensed code in the developed code, the developed code may be considered as using open-source code and the license of the open-source code may get applied to the developed code.”
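To make that “taint” scenario concrete, here is a purely hypothetical sketch. The function below is invented for this example and is not taken from any real repository; the point is what would follow if an assistant happened to reproduce licensed code verbatim.

```python
# Hypothetical example: a small utility a code assistant might generate.
# Suppose this snippet later turned out to be a verbatim copy of a function
# from a GPL v3-licensed repository the model was trained on.

def rolling_checksum(data: bytes, modulus: int = 65521) -> int:
    """Compute a simple Adler-32-style rolling checksum (illustrative only)."""
    a, b = 1, 0
    for byte in data:
        a = (a + byte) % modulus
        b = (b + a) % modulus
    return (b << 16) | a

# If rolling_checksum() were copied from GPL v3 code, the copyleft terms
# could require the combined work that incorporates it to be distributed
# under GPL v3 as well -- the "taint" the Bloomberg Law authors describe.
print(rolling_checksum(b"hello"))
```

The developer who accepted this suggestion has no easy way to know its provenance, which is exactly why the licensing exposure is so hard to assess.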
This could mean that a commercial product must make its source code publicly available, or it could simply mean that the product isn’t patentable. These risks are still indeterminate since we’re so early in the development of these AI tools and they haven’t been legally challenged yet. But it’s a concern. In fact, it’s possible that even using generative AI-written code at all might expose you to copyright infringement suits, depending on which code sets the model was trained on.
It’s not a lost cause, however. Liu says that although human intervention is absolutely required for copyright protection, “there’s potential for the idea that the amount of human creativity is minimal. The human rearrangement of data or code in some new order might be protectable.”
Barry Diller, Chairman of IAC and Expedia, has called for standards and rulings to define the law around AI-generated content because of these very issues. In a recent CNBC interview, he said the fair use doctrine needs to be updated to reflect generative AI uses. “All we want to do is establish that there is no such thing as fair use for AI, which gives us standing,” Diller said.
This point lends itself to the concern that any product built from generative AI could potentially be ruled a derivative work, which would mean the originator of the code or data set might own the rights to your code. The law has not yet settled how these questions will be decided.
“It’s not clear if sufficiently long code strings could be plagiarism,” Liu said. “We haven’t seen any lawsuits yet.”
This issue of copyright and code must be settled before anyone should feel confident using Generative AI code in their next product or solution. The risk is too great, even if you don’t think you’ll get noticed, caught, or sued.