Could Your Code Write Itself? The Rise of A.I. Code Assistants

Madeleine B. Weber

IP Litigation

Technology Strategy & Analysis

July 28, 2023


With the rise of ChatGPT and other artificial intelligence assistants, many people have begun to wonder if their job may one day be taken over by A.I. As a computer engineer, I told myself that A.I. wouldn’t take my job because, well, someone has to make the A.I., right? Sure, they can write cover letters and book reports, but it’s not like they can write code… Except yes, they can actually do both. On the one hand, A.I. can write a sonnet for your girlfriend’s birthday that you definitely did not forget and, on the other, it can write you code for a new annual calendar alert.

OpenAI’s ChatGPT is a form of generative A.I. widely known for its conversational abilities and existential-crisis-inducing answers, but given a prompt, it can also generate code to solve a wide array of programming problems.[1] Similarly, GitHub Copilot is an A.I. tool designed specifically to help programmers write code. Sold as an “A.I. pair programmer,” Copilot draws context from your code and comments as you type, suggesting individual lines and even entire functions without an explicit prompt.[2] Objectively, these A.I. tools are impressive, but they aren’t the perfect programmer… yet.
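
To make that concrete, here is a rough sketch (written by hand for this article, not actual Copilot or ChatGPT output) of the kind of completion a comment-style prompt can elicit. The annual calendar alert from the introduction is used as a hypothetical Python example:

from datetime import date

# Prompt-style comment a user might type in their editor:
# "Return the date of the next occurrence of an annual event, e.g. a birthday alert."
# An assistant like Copilot could plausibly suggest a completion along these lines.
def next_annual_alert(month, day, today=None):
    today = today or date.today()
    alert = date(today.year, month, day)
    if alert < today:  # the date has already passed this year
        alert = date(today.year + 1, month, day)
    return alert

print(next_annual_alert(12, 25))  # prints the next December 25th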

[Image: Example of a coding prompt given to ChatGPT. Source: https://www.pluralsight.com/blog/software-development/how-use-chatgpt-programming-coding]

ChatGPT and GitHub Copilot both rely on unsupervised learning to process the large amounts of data they are trained on. Unsupervised learning uses data that does not pair each input with a labeled output; instead, the A.I. learns to identify the structure and semantics of the input data on its own.[3] GitHub Copilot is trained on code in GitHub public repositories, while ChatGPT’s training data is sourced via web scraping, i.e., extracting data from websites using automated tools. Because ChatGPT and GitHub Copilot are trained on human-generated data, language, and code, they are at risk of reproducing the errors that are likely present in any human-generated source. For example, Copilot “…may suggest old or deprecated uses of libraries and languages” depending on how frequently those libraries appear in its training set, and currently only about 26% of Copilot suggestions are accepted by users.[2]
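
As a toy illustration of what training without labeled outputs means in practice (a deliberately simplified sketch, sometimes described more precisely as self-supervised learning, and not the actual training pipeline of either tool): the training targets are derived mechanically from the raw text or code itself, with each token serving as the “label” for the context that precedes it.

# Toy illustration: training examples are built from unlabeled text alone.
# Each example pairs a context window with the token that follows it; no
# human ever labels an "answer". Real systems do this at enormous scale
# over scraped web text or public repository code.
corpus = "def add(a, b):\n    return a + b\n"
tokens = list(corpus)  # character-level "tokens" for simplicity

context_size = 8
examples = [
    ("".join(tokens[i:i + context_size]), tokens[i + context_size])
    for i in range(len(tokens) - context_size)
]

for context, target in examples[:3]:
    print(repr(context), "->", repr(target))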

These shortcomings aside, Copilot generates noticeably more accurate suggestions when given in-depth context and detailed prompting. For example, starting your program with a comment describing its overall goals explicitly supplies the A.I. with context and sets Copilot up to offer relevant suggestions later on. Copilot also tends to perform better when users break their program down into consecutive steps rather than asking it to generate a large chunk of code at once. Ultimately, the quality of the prompts fed to A.I. assistants (a skill called prompt engineering) greatly affects the accuracy and usefulness of their suggestions.[4]
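
A prompt-engineered file might look something like the sketch below (a hypothetical example that follows this general guidance, not code taken from GitHub’s documentation): a top-of-file comment states the overall goal, and each step gets its own small comment for the assistant to complete. The file name tickets.csv and its category column are made up for illustration.

# Goal: parse a CSV file of support tickets ("tickets.csv"), count tickets
# per category, and print the three most common categories.
# Stating the goal up front gives the assistant context; the numbered
# comments below break the task into small, consecutive steps.

import csv
from collections import Counter

# Step 1: read the CSV file into a list of rows keyed by column name.
def read_tickets(path):
    with open(path, newline="") as f:
        return list(csv.DictReader(f))

# Step 2: count how many tickets fall into each category.
def count_categories(rows):
    return Counter(row["category"] for row in rows)

# Step 3: print the n most common categories.
def print_top_categories(counts, n=3):
    for category, count in counts.most_common(n):
        print(f"{category}: {count}")

if __name__ == "__main__":
    print_top_categories(count_categories(read_tickets("tickets.csv")))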

ChatGPT and Copilot also raise questions in the intellectual property space. According to the U.S. Copyright Office, “When an AI technology determines the expressive elements of its output, the generated material is not the product of human authorship. As a result, that material is not protected by copyright and must be disclaimed in a registration application.”[5] When it comes specifically to prompting, the Copyright Office has made clear that prompting an A.I. program alone does not make the output copyrightable, stating, “…prompts function more like instructions to a commissioned artist—they identify what the prompter wishes to have depicted, but the machine determines how those instructions are implemented in its output.”[5] The Copyright Office also suggests that works created in conjunction with artificial intelligence (beyond mere prompting) will have to be reviewed on a case-by-case basis, and even then, “…copyright will only protect the human-authored aspects of the work.”[5]

What about the material used to train generative artificial intelligence?

Presently, there is little to prevent copyrighted material, including code, from being used to train A.I. programs, and A.I. companies claim that their use of these training materials falls under fair use and is therefore non-infringing. In January 2023, several artists filed a class action lawsuit claiming that their art was used to train generative A.I. programs, and in February 2023, Getty Images sued Stability A.I. for allegedly copying 12 million images without Getty’s permission. The outcomes of these cases could define the extent to which a fair use defense can shield A.I. training practices.[6]

So, if an A.I. tool can be trained on copyrighted material, what happens if it then reproduces that copyrighted code in some way? Is the generated code now vulnerable to infringement claims? If so, how can such infringement be proven?

Technically, “copyright owners may be able to show that such outputs infringe their copyrights if the AI program both (1) had access to their works and (2) created ‘substantially similar’ outputs.”[6] However, A.I. companies like OpenAI claim that “…A.I. systems generally do not regenerate, in any nontrivial portion, unaltered data from any particular work in their training corpus.”[6] This question, too, will likely be answered by the courts: one of the lawsuits against Stability A.I. claims that every output of an A.I. program is essentially a composite of all of its training inputs (the allegedly infringed materials), and that the A.I. is therefore always producing infringing works. Beyond direct copying, A.I. that generates works in an artist’s unique style raises concerns for artists, who are rarely granted copyright protection for style alone under existing copyright law but cannot compete with the extraordinary volume of output an A.I. program can produce.[6]
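
From a purely technical standpoint, one way near-verbatim reproduction might be flagged is by comparing generated output against a known work with a string-similarity measure. The snippet below is only a toy sketch using Python’s difflib, with made-up code strings and an arbitrary threshold; it is in no way a test for the legal standard of “substantial similarity.”

from difflib import SequenceMatcher

# Toy check for near-verbatim reproduction: compare A.I.-generated code
# against a snippet from a protected work. This is only an illustration;
# "substantial similarity" is a legal determination, not a string metric.
original = "def quicksort(xs):\n    if len(xs) <= 1:\n        return xs\n"
generated = "def quicksort(items):\n    if len(items) <= 1:\n        return items\n"

similarity = SequenceMatcher(None, original, generated).ratio()
print(f"similarity: {similarity:.2f}")  # values near 1.0 suggest close copying
if similarity > 0.8:  # arbitrary threshold chosen for this example
    print("flag for manual review")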

As we continue to follow these cases, one conclusion is clear: artificial intelligence is an exciting, rapidly advancing field with the power to significantly change how software is created and how intellectual property litigation is conducted moving forward.

[1] https://openai.com/blog/chatgpt 
[2] https://github.com/features/copilot
[3] https://www.zdnet.com/article/how-does-chatgpt-work/
[4] https://github.blog/2023-06-20-how-to-write-better-prompts-for-github-copilot/ 
[5] https://www.federalregister.gov/documents/2023/03/16/2023-05321/copyright-registration-guidance-works-containing-material-generated-by-artificial-intelligence 
[6] https://crsreports.congress.gov/product/pdf/LSB/LSB10922