Ownership of AI-Generated Code Hotly Disputed

IEEE Spectrum

GitHub Copilot bills itself as an “AI pair programmer” for software developers, automatically suggesting code in real time. According to GitHub, Copilot is “powered by Codex, a generative pretrained AI model created by OpenAI” and has been trained on “natural language text and source code from publicly available sources, including code in public repositories on GitHub.”

However, a class-action lawsuit filed against GitHub, its parent company Microsoft, and OpenAI alleges open-source software piracy and violations of open-source licenses. Specifically, the lawsuit states that code generated by Copilot does not include attribution to the original author of the code, copyright notices, or a copy of the license, all of which most open-source licenses require.

“The spirit of open source is not just a space where people want to keep it open,” says Sal Kimmich, an open-source developer advocate at Sonatype, machine-learning engineer, and open-source contributor and maintainer. “We have developed processes in order to keep open source secure, and that requires traceability, observability, and verification. Copilot is obscuring the original provenance of those [code] snippets.”

In an attempt to address the issues with open-source licensing, GitHub plans to introduce a new Copilot feature that will “provide a reference for suggestions that resemble public code on GitHub so that you can make a more informed decision about whether and how to use that code,” including “providing attribution where appropriate.” GitHub also has a configurable filter to block suggestions matching public code.

The onus, however, still falls on developers, as GitHub states in Copilot’s terms and conditions: “GitHub does not claim any rights in Suggestions, and you retain ownership of and responsibility for Your Code, including Suggestions you include in Your Code.”

Beyond open-source licensing, Copilot raises questions about the legality of training the system on publicly available code and about whether generated code could infringe copyright.

Kimmich points to the Google v. Oracle case, wherein “taking the names of methods, but not the functional implementation, is OK. You’re replacing the functional content but still keeping some of the template.” Copilot, by contrast, might generate copyrighted code verbatim. (See the related tweet below from Tim Davis, computer science professor at Texas A&M University, as an illustration of Copilot generating copyrighted code.)

“@github copilot, with "public code" blocked, emits large chunks of my copyrighted code, with no attribution, no LGPL license. For example, the simple prompt "sparse matrix transpose, cs_" produces my cs_transpose in CSparse. My code on left, github on right. Not OK.”
Tim Davis, on Twitter, 16 October 2022 (UTC)

Kit Walsh, a senior staff attorney at the Electronic Frontier Foundation, argues that training Copilot on public repositories is fair use. “Fair use protects analytical uses of copyrighted work. Copilot is ingesting code and creating associations in its own neural net about what tends to follow and appear in what contexts, and that factual analysis of the underlying works is the kind of fair use that cases involving video-game consoles, search engines, and APIs have supported.”

But when it comes to generated code, Walsh says it boils down to “how much [Copilot] is reproducing from any given element of the training data” and whether that output encompasses copyrightable creative expression. “If so, there could be infringement happening,” she says.

The lawsuit against GitHub Copilot is the first of its kind to challenge generative AI. “It’s setting a legal precedent that has implications for other generative tools,” Walsh says. “It’s the type of work that if a person authored [it, they] could qualify for copyright protection, and it could embody someone else’s copyrighted work, like snippets of code.”

For Stella Biderman, an AI researcher at Booz Allen Hamilton and EleutherAI, the lawsuit is a welcome development. “It’s going to, I hope, provide clarity and guidance as to what is actually legal, which is one of the big issues for those working on open-source AI,” she says. “I very much hope that what comes out of this lawsuit will be something I can rely on when making decisions about training models in the future.”

The open-source community seems divided on the lawsuit and GitHub Copilot itself. For instance, the Software Freedom Conservancy has been vocal about its concerns with Copilot—even calling for a boycott of GitHub—but is cautious about joining the class-action lawsuit. Kimmich says they know of open-source developers taking an ethical stance in choosing not to use Copilot, but also others who are enjoying it: “They’re learning while developing and executing code on the fly.”

Kimmich is on a waitlist for Copilot and recognizes the benefits it offers developers. “The neural network behind it is using more than just code to help you—it’s providing much more contextual information,” they said. “It means I as a developer now have an extended intelligence, which is giving me a contextualized recommendation. I think that’s excellent. It’s the most powerful generative intelligence that we’ve had so far for this application.”

Yet unless the open-source licensing issue is solved, Kimmich envisions using GitHub Copilot only for pet projects and exploring new packages. “It stops short of production code because of the licensing issue,” they said. “If I as an engineer would like to use Copilot, I will need to be able to restrict what it provides me to code that’s attributed to the license, or have a license which states that it was codeveloped. If I can’t locate the provenance of the original licenses or the original intellectual property, then I need to be able to know if I want to avoid it.”

Another solution would be for GitHub to modify Copilot’s AI model so that it traces attribution and credits the original authors of the code, adding the associated copyright notices and license terms in the process. Biderman says this is technologically feasible. “The position that OpenAI and Microsoft seem to have taken is that it is unduly onerous on them to filter by license when other models successfully do it.” She points to academic models such as InCoder, which is trained on code that it has a license for. “There are other options and other models that are both more ethical and more likely to be legal,” Biderman says.
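
To make the idea of filtering by license concrete, here is a minimal Python sketch. It is an illustration only and does not describe how Copilot, Codex, or InCoder actually work. The `SourceFile` record, the `PERMISSIVE_LICENSES` allowlist, the helper functions, and the sample repositories are all hypothetical; the license strings follow the SPDX identifier convention.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Set

# Hypothetical allowlist of SPDX license identifiers that a model builder
# might decide is acceptable for training data. The choice is an assumption.
PERMISSIVE_LICENSES: Set[str] = {"MIT", "Apache-2.0", "BSD-2-Clause", "BSD-3-Clause"}


@dataclass(frozen=True)
class SourceFile:
    """Hypothetical record for one file in a candidate training corpus."""
    repo: str
    path: str
    license_id: Optional[str]  # SPDX identifier, or None if no license was detected
    author: str


def filter_by_license(files: Iterable[SourceFile],
                      allowed: Set[str] = PERMISSIVE_LICENSES) -> List[SourceFile]:
    """Keep only files whose detected license is on the allowlist."""
    return [f for f in files if f.license_id in allowed]


def attribution_notice(f: SourceFile) -> str:
    """Build a one-line credit that could accompany code derived from this file."""
    return f"Source: {f.repo}/{f.path} by {f.author}, licensed under {f.license_id}"


# Made-up sample corpus: one copyleft file, one permissive file, one unlicensed file.
corpus = [
    SourceFile("example/sparse", "transpose.c", "LGPL-2.1-or-later", "A. Author"),
    SourceFile("example/utils", "strings.py", "MIT", "B. Author"),
    SourceFile("example/misc", "scratch.py", None, "C. Author"),
]

kept = filter_by_license(corpus)
for f in kept:
    print(attribution_notice(f))
```

In this sketch, files with an unlisted or undetected license are simply dropped, and the retained metadata is what would let a tool credit the original author later. Tracing a generated suggestion back to specific training files is a much harder problem that this example does not attempt.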

This article appears in the January 2023 print issue as “Do You Own the Code AI Helps You Create?”
