CALL US: 206.533.3854

SECTORS

HIGH
TECHNOLOGY

Artificial Intelligence

Blockchain & Cryptocurrency

Computer Technology & Software

Consumer Electronics

Electrical Devices

MECHANICAL
& PRODUCTS

Cleantech

Mechanical Devices

Consumer & Retail Products

Hardware & Tools

Toys & Games

LIFE SCIENCES
& CHEMISTRY

Biotechnology

Chemical Compounds

Digital Health

Healthcare Products

Pharmaceuticals

BRANDING
& CREATIVE

Books & Publications

Brand Creation

Luxury Products

Photography & Video

Product Design

"Used books store @ Monastiraki,Athens" by linmtheu is licensed under CC BY-SA 2.0.

Courts Weigh in on Whether AI Can Use Copyrighted Material for Training

Federal courts split
On whether AI training
Is seen as “fair use”

A US District Court in the Northern District of California has issued a summary judgment order in the case of Bartz v. Anthropic – the first major decision on whether copyrighted material may be used without a license to train generative artificial intelligence (GenAI) tools.

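Days later, another judge in the same district took a sharply different view of the fair use analysis in the case of Kadrey v. Meta Platforms, Inc., although that court also ruled for the AI developer on the record before it.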

As the court explained in Bartz,

An artificial intelligence firm downloaded for free millions of copyrighted books in digital form from pirate sites on the internet. The firm also purchased copyrighted books (some overlapping with those acquired from the pirate sites), tore off the bindings, scanned every page, and stored them in digitized, searchable files. All the foregoing was done to amass a central library of “all the books in the world” to retain “forever.” From this central library, the AI firm selected various sets and subsets of digitized books to train various large language models under development to power its AI services. Some of these books were written by plaintiff authors, who now sue for copyright infringement. On summary judgment, the issue is the extent to which any of the uses of the works in question qualify as “fair uses” under Section 107 of the Copyright Act.

The defendant in the case was Anthropic PBC, an AI software firm founded by former OpenAI employees in January 2021. Its AI software, Claude, was first released publicly in March 2023.

Plaintiffs Andrea Bartz, Charles Graeber, and Kirk Wallace Johnson are authors of books that Anthropic copied from pirated and purchased sources.

The court noted that in early 2021 an Anthropic co-founder downloaded Books3, an online library of 196,640 books that he knew had been assembled from unauthorized copies of copyrighted books, “that is, pirated,” said the court.

The co-founder eventually downloaded at least five million copies of books from Library Genesis, or LibGen, that he knew had been pirated.

In July 2022, said the court, Anthropic downloaded at least two million copies of books from the Pirate Library Mirror, or PiLiMi, which the company knew had been pirated.

Later, Anthropic spent millions of dollars to purchase millions of print books, often used. Its vendors stripped the books from their bindings, cut their pages to size, and scanned the books into digital form.

The collections of book contents were used to train the Claude AI.

In its motion for summary judgment, Anthropic argued that

pirating initial copies of Authors’ books and millions of other books was justified because all those copies were at least reasonably necessary for training LLMs.

And yet, said the court, “Anthropic has resisted putting into the record what copies or even sets of copies were in fact used for training LLMs.”

The court concluded that

the use of the books at issue to train Claude and its precursors was exceedingly transformative and was a fair use under Section 107 of the Copyright Act. And, the digitization of the books purchased in print form by Anthropic was also a fair use but not for the same reason as applies to the training copies. Instead, it was a fair use because all Anthropic did was replace the print copies it had purchased for its central library with more convenient space-saving and searchable digital copies for its central library — without adding new copies, creating new works, or redistributing existing copies. However, Anthropic had no entitlement to use pirated copies for its central library. Creating a permanent, general-purpose library was not itself a fair use excusing Anthropic’s piracy.

In the Kadrey case, as the court explained,

thirteen authors—mostly famous fiction writers—have sued Meta for downloading their books from online “shadow libraries” and using the books to train Meta’s generative AI models (specifically, its large language models, called Llama).

The court noted that

the doctrine of “fair use,” which provides a defense to certain claims of copyright infringement, typically doesn’t apply to copying that will significantly diminish the ability of copyright holders to make money from their works (thus significantly diminishing the incentive to create in the future). Generative AI has the potential to flood the market with endless amounts of images, songs, articles, books, and more. People can prompt generative AI models to produce these outputs using a tiny fraction of the time and creativity that would otherwise be required. So by training generative AI models with copyrighted works, companies are creating something that often will dramatically undermine the market for those works, and thus dramatically undermine the incentive for human beings to create things the old-fashioned way.

According to the court,

copying the protected works, however transformative, involves the creation of a product with the ability to severely harm the market for the works being copied, and thus severely undermine the incentive for human beings to create. Under the fair use doctrine, harm to the market for the copyrighted work is more important than the purpose for which the copies are made.

The judge in the Kadrey case noted that “Judge Alsup [in the Bartz case discussed above] focused heavily on the transformative nature of generative AI while brushing aside concerns about the harm it can inflict on the market for the works it gets trained on.”

The court granted summary judgment to Meta on the plaintiffs’ claim that the company violated copyright law by training its models with their books.

However, said the court,

in the grand scheme of things, the consequences of this ruling are limited. This is not a class action, so the ruling only affects the rights of these thirteen authors—not the countless others whose works Meta used to train its models. And, as should now be clear, this ruling does not stand for the proposition that Meta’s use of copyrighted materials to train its language models is lawful. It stands only for the proposition that these plaintiffs made the wrong arguments and failed to develop a record in support of the right one.

There will likely be several other lower court decisions on these issues, followed by circuit court appeals and decisions, likely followed by a final resolution in the US Supreme Court some years in the future – unless Congress acts first to clarify the status of AI training under US copyright law.


Just like the haiku above, we like to keep our posts short and sweet. Hopefully, you found this bite-sized information helpful. If you would like more information, please do not hesitate to contact us here: https://aeonlaw.com/contact-us/.


