Early testers of GitHub’s Copilot, which uses AI to help programmers write code, have found problems including allegedly leaked secrets, bad code, and copyright concerns, though some see huge potential in the tool.
GitHub Copilot was released as a limited “technical preview” last week with the claim that it is an “AI pair programmer.” It is powered by a system called Codex, from OpenAI, a company which went into partnership with Microsoft in 2019, receiving a $1bn investment.
According to its website, the Codex model is trained on “public code and text on the internet” and “understands both programming and human languages.”
An extension to Visual Studio Code “sends your comments and code to the GitHub Copilot service, which then uses OpenAI Codex to synthesize and suggest individual lines and whole functions.”
What could go wrong?
One developer tried an experiment, writing some code to send an email via the Sendgrid service and prompting Copilot by typing “apiKey :=”. Copilot responded with at least four proposed keys, according to his screenshot and bug report. He reported it as a bug under the name, “AI is emitting secrets.”
But were the keys valid? GitHub CEO Nat Friedman responded to the bug report, stating that “these secrets are almost entirely fictional, synthesized from the training data.”
A Copilot maintainer added that “the probability of copying a secret from the training data is extremely small. Furthermore, the training data is all public code (no private code at all) so even in the extremely unlikely event a secret is copied, it was already compromised.”
While this is reassuring, even the remote possibility that Copilot is prompting coders with other users’ secrets is perhaps a concern. It touches on a key issue: is Copilot’s AI really writing code, or is it copy-pasting chunks from its training sources?
GitHub attempted to address some of these issues in a FAQ. “GitHub Copilot is a code synthesizer, not a search engine,” it said. “The vast majority of the code that it suggests is uniquely generated and has never been seen before.”
According to its own study, however, “about 0.1 per cent of the time, the suggestion may contain some snippets that are verbatim from the training set.”
This 0.1 per cent (and some early users think it is higher) is troublesome. GitHub’s proposed solution, as given in this paper, is that when the AI is quoting rather than synthesizing code it will give attribution. “That way, I’m able to look up background information about that code, and to include credit where credit is due,” said GitHub machine learning engineer Albert Ziegler.
The problem is that there are circumstances where Copilot may be prompting developers to do the wrong thing, for example with code that is open source but protected by copyright.
In the case of GPL code, which is copyleft, the inclusion of the code could impact the licensing of the new work. It is confusing, since the Copilot FAQ states that “the suggestions GitHub Copilot generates, and the code you write with its help, belong to you, and you are responsible for it;” but an attributed block of code would be an exception.
GitHub also said that “training machine learning models on publicly available data is considered fair use across the machine learning community,” pre-empting concerns about the AI borrowing other people’s code. Whether the law agrees is less certain.
GitHub’s CEO said on Twitter that “we expect that IP and AI will be an interesting policy discussion around the world in the coming years, and we’re eager to participate.”
Developer Eevee said that “GitHub Copilot has, by their own admission, been trained on mountains of gpl code, so I’m unclear on how it’s not a form of laundering open source code into commercial works.”
The only way is ethics
Since Copilot will be a paid-for product, there is an ethical as well as a legal debate. That said, open source advocate Simon Phipps has said: “I’m hearing alarmed tech folk but no particularly alarmed lawyers. Consensus seems to be that training the model and using the model are to be analysed separately, that training is Just Fine and that using is unlikely to involve a copyright controlled act so licensing is moot.”
OpenAI has a paper [PDF] on the matter which argued that “under current law, training AI systems constitutes fair use,” although it added that “Legal uncertainty on the copyright implications of training AI systems imposes substantial costs on AI developers and so should be authoritatively resolved.”
Does it work?
Another issue is whether the code will work correctly. Developer Colin Eberhardt has been trying the preview and said: “I’m stunned by its capabilities. It has genuinely made me say ‘wow’ out loud a few times in the past few hours.”
Read on though, and it seems that his results have been mixed. One common way to use Copilot is to type a comment, following which the AI may suggest a block of code. Eberhardt typed:
//compute the moving average of an array for a given window size
and Copilot generated a correct function. However, when he tried:
//find the two entries that sum to 2020 and then multiply the two numbers together
the generated code looked plausible, but it was wrong.
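For illustration, correct answers to the two prompts above might look something like the following. This is a minimal sketch in TypeScript, not the code Copilot produced; the language choice and the names movingAverage and productOfPairSummingTo are assumptions made here, as is the decision to return undefined when no pair exists.

```typescript
// Hypothetical reference implementations of the two prompts.
// Names and edge-case behaviour are assumptions, not Copilot output.

// Compute the moving average of an array for a given window size.
// Returns one value per full window, so the result has
// values.length - windowSize + 1 entries (or none if the array is shorter).
function movingAverage(values: number[], windowSize: number): number[] {
  if (!Number.isInteger(windowSize) || windowSize <= 0) {
    throw new RangeError("windowSize must be a positive integer");
  }
  const result: number[] = [];
  let windowSum = 0;
  for (let i = 0; i < values.length; i++) {
    windowSum += values[i];
    // Drop the value that has just slid out of the window.
    if (i >= windowSize) {
      windowSum -= values[i - windowSize];
    }
    // Emit an average only once a full window has been seen.
    if (i >= windowSize - 1) {
      result.push(windowSum / windowSize);
    }
  }
  return result;
}

// Find two distinct entries that sum to the target and multiply them.
// Returns undefined when no such pair exists.
function productOfPairSummingTo(
  entries: number[],
  target: number
): number | undefined {
  const seen = new Set<number>();
  for (const entry of entries) {
    const complement = target - entry;
    // Only earlier entries are in the set, so an entry never pairs with itself.
    if (seen.has(complement)) {
      return entry * complement;
    }
    seen.add(entry);
  }
  return undefined;
}
```

With these definitions, movingAverage([1, 2, 3, 4, 5], 2) gives [1.5, 2.5, 3.5, 4.5], and productOfPairSummingTo([1721, 979, 366, 299, 675, 1456], 2020) gives 514579, because 1721 and 299 sum to 2020.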
Careful examination of the code, combined with strong unit test coverage, should defend against this kind of problem; but it does look like a trap for the unwary, especially coming from GitHub as an official add-on for the world’s most popular code editor, Visual Studio Code.
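To make the point about test coverage concrete, here is a rough sketch in TypeScript of how a handful of cases can expose a plausible-looking mistake in the "sum to 2020" task. Everything in it is hypothetical: adjacentOnly is an invented example of a subtle bug, not the code Copilot generated, and allPairs, cases, and failedCases are names made up for this illustration.

```typescript
type PairProduct = (entries: number[], target: number) => number | undefined;

// Hypothetical plausible-but-wrong version: it only compares neighbours,
// so it misses pairs that are not next to each other.
const adjacentOnly: PairProduct = (entries, target) => {
  for (let i = 0; i + 1 < entries.length; i++) {
    if (entries[i] + entries[i + 1] === target) {
      return entries[i] * entries[i + 1];
    }
  }
  return undefined;
};

// Straightforward version that checks every distinct pair.
const allPairs: PairProduct = (entries, target) => {
  for (let i = 0; i < entries.length; i++) {
    for (let j = i + 1; j < entries.length; j++) {
      if (entries[i] + entries[j] === target) {
        return entries[i] * entries[j];
      }
    }
  }
  return undefined;
};

interface TestCase {
  name: string;
  entries: number[];
  expected: number | undefined;
}

// Cases chosen to cover the easy path and the edges around it.
const cases: TestCase[] = [
  { name: "adjacent pair", entries: [2000, 20, 5], expected: 40000 },
  { name: "non-adjacent pair", entries: [2000, 5, 20], expected: 40000 },
  { name: "no pair", entries: [1, 2, 3], expected: undefined },
  { name: "single 1010 is not a pair", entries: [1010, 7], expected: undefined },
];

// Returns the names of the cases a candidate implementation gets wrong.
function failedCases(candidate: PairProduct): string[] {
  return cases
    .filter((c) => candidate(c.entries, 2020) !== c.expected)
    .map((c) => c.name);
}
```

Here failedCases(adjacentOnly) reports the "non-adjacent pair" case while failedCases(allPairs) reports nothing. A test suite containing only the first case would have passed both versions, which is why breadth of coverage matters when reviewing suggested code.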
Contrast this with copy-pasting code from a site like Stack Overflow, where other contributors will often spot coding errors and there is a kind of community quality control. With Copilot, the developer is on their own.
“I think Copilot has a little way to go before I’d want to keep it turned on by default,” concluded Eberhardt, because of “the cognitive load associated with verifying its suggestions.”
He also observed that suggestions were sometimes slow to appear, though this may be addressed by some sort of busy indicator. Eberhardt nevertheless said he believes that many enterprises will subscribe to Copilot because of its “wow factor” – a disturbing conclusion given its current shortcomings, though bear in mind that it is a preview.
Much of programming is drudge work and few problems are unique to one project, so in principle applying AI to the task could work well. Microsoft’s IntelliCode, which uses machine learning to improve code completion, is fine; it can improve productivity without increasing the risk of errors.
AI-generated chunks of code are another matter and the story so far is that there is plenty of potential, but also plenty of snags. ®