Authors Sue OpenAI for Copyright Infringement

Adam Philipp
July 10, 2023

OpenAI sued
For copyright infringement
By famous authors


A class action lawsuit was recently filed in San Francisco federal court, alleging that OpenAI’s ChatGPT tool “relied on harvesting mass quantities” of copyright-protected works without permission.

The plaintiffs are led by authors Paul Tremblay and Mona Awad. Tremblay wrote the novel The Cabin at the End of the World, which was adapted by M. Night Shyamalan into the movie Knock at the Cabin. Awad is a Canadian novelist and short-story writer known for darkly comic fiction such as Bunny, named a Best Book of 2019 by TIME, Vogue, and the New York Public Library.

The plaintiffs claim that the defendant company infringed the writers’ work when it illegally downloaded their novels to train ChatGPT to mimic human writing. The suit also alleges that ChatGPT’s answers to queries constitute infringement of the authors’ intellectual property rights.

The complaint alleges causes of action for direct copyright infringement, vicarious copyright infringement, violations of the Digital Millennium Copyright Act (DMCA), unjust enrichment, and negligence, among other claims.

The complaint explains that

ChatGPT allows users to enter text prompts, which ChatGPT then attempts to respond to in a natural way, i.e., ChatGPT can generate answers in a coherent and fluent way that closely mimics human language. If a user prompts ChatGPT with a question, ChatGPT will answer. If a user prompts ChatGPT with a command, ChatGPT will obey. If a user prompts ChatGPT to summarize a copyrighted book, it will do so.

The authors claim that ChatGPT can generate summaries of their novels when given the appropriate prompt. They say this would be possible only if the AI tool had been trained on their copyrighted works.

Although AI tools that generate writing can be trained on any form of writing (including this blog, potentially), according to the complaint “Books … have always been a key ingredient in training datasets for large language models because books offer the best examples of high-quality long-form writing.”

In 2018, OpenAI revealed that it had trained GPT-1, an early predecessor of the models behind ChatGPT, on BookCorpus, a collection of more than 7,000 novels assembled by AI researchers.

According to the complaint,

They copied the books from a website called Smashwords.com that hosts unpublished novels that are available to readers at no cost… Those novels, however, are largely under copyright. They were copied into the BookCorpus dataset without consent, credit, or compensation to the authors.

OpenAI admits that some of its training data came from online collections of hundreds of thousands of books.

According to the plaintiffs, these collections are “notorious shadow library websites,” like Library Genesis, Z-Library, Sci-Hub and Bibliotik, that distribute copyrighted material without permission.

As The Hollywood Reporter notes,

In a May hearing before the House Judiciary Subcommittee on Courts, Intellectual Property and the Internet examining the intersection of AI and copyright law, key players in Hollywood argued in favor of legislation to bar the rampant, unpermitted collection of their works to train AI systems.

The potential use of AI to replace human writers and actors has been an issue in the contract negotiations between Hollywood creative guilds and the AMPTP, which represents studios, networks, and streamers. The refusal of the AMPTP to substantively address the AI issue was one of the grounds for the ongoing strike by the Writers Guild of America (WGA).

As The Hollywood Reporter notes, OpenAI is also facing a proposed class action “claiming the billions of lines of computer code that its AI technology analyzes to generate its own code qualify as copyright infringement.” Just like novels, software code can be protected by copyright law.

We’ve previously reported on other issues involving AI and intellectual property.

For example, in this blog we wrote about how Getty Images commenced legal proceedings in the High Court of Justice in London against Stability AI. Getty claims that Stability infringed its IP rights, including copyright, in content owned or represented by Getty Images by using copyrighted images to “train” AI engines to generate “new” content.

Just like the haiku above, we like to keep our posts short and sweet. Hopefully, you found this bite-sized information helpful. If you would like more information, please do not hesitate to contact us here.

