
Bartz v. Anthropic: Transformative Training, Unforgivable Acquisition

Judge Alsup held that training a large language model on books is 'exceedingly transformative' fair use — while refusing to extend that blessing to the pirated library that fed it. The $1.5 billion settlement that followed shows where the real exposure lies.

Educational content, not legal advice. This article explains general legal concepts. It does not create an attorney–client relationship. For your specific situation, consult a licensed attorney.

The first generation of generative-AI copyright decisions has arrived not as a single doctrinal pronouncement but as a set of carefully partitioned holdings, and Bartz v. Anthropic PBC, No. 3:24-cv-05417-WHA (N.D. Cal. June 23, 2025), is the most instructive of them. Senior Judge William Alsup did not answer the binary question the parties pressed on him — is training fair use, yes or no — but instead disaggregated the defendant’s conduct into three distinct acts and subjected each to its own analysis under 17 U.S.C. § 107. The result is a decision that hands the AI industry its most important win to date on the core training question while simultaneously exposing the practice that has proven most expensive: the acquisition of the underlying corpus.

At a glance

  • Case: Bartz v. Anthropic PBC, No. 3:24-cv-05417-WHA (N.D. Cal.)
  • Decided: June 23, 2025 (order on partial summary judgment), Senior Judge William H. Alsup
  • Holding: Training Claude on lawfully acquired books is fair use; destructive scanning of purchased print copies is fair use; downloading and retaining millions of pirated books is not fair use
  • Status: Fair-use ruling stands; the piracy claim was resolved through a reported $1.5 billion settlement, with final approval and distribution pending as of mid-2026

The doctrinal frame: § 107 and transformative use

Fair use is an affirmative defense, evaluated through the four non-exclusive factors codified at 17 U.S.C. § 107: the purpose and character of the use (including whether it is commercial and whether it is “transformative”), the nature of the copyrighted work, the amount and substantiality of what was taken, and the effect on the potential market for the work. Since Campbell v. Acuff-Rose Music (1994), the first factor has carried the conceptual weight of the inquiry, and the Supreme Court’s 2023 decision in Andy Warhol Foundation v. Goldsmith recalibrated it toward a comparison of purposes: does the secondary use merely supersede the original, or does it serve a sufficiently different end?

What makes Bartz methodologically important is that Judge Alsup refused to run a single fair-use analysis across the defendant’s entire course of conduct. He recognized that “using books to build an AI” is not one act but several, each with a different relationship to the copyrighted works — and that § 107 must be applied to each.

Three acts, three answers

Training. The use of lawfully acquired books to train Anthropic’s Claude models was, in the court’s words, “exceedingly transformative.” The animating analogy is that a model ingests text in order to learn the statistical structure of language and produce new output, much as a human reads in order to learn to write. The transformative character of that use under the first factor outweighed the creative nature of the works under the second — the books were expressive, core-copyright material, but that did not rescue the plaintiffs once the purpose of the use was found to diverge so sharply from the purpose of the originals.

Format conversion. Anthropic had also destructively scanned print books it lawfully purchased, converting them into a digital, searchable internal format. The court held this, too, was fair use: a change of medium for copies the defendant already owned, with no new copies distributed to the public, is the kind of internal format-shifting that the doctrine tolerates.

Acquisition by piracy. The third act broke the other way, and decisively. Anthropic had downloaded and retained more than seven million pirated books from shadow libraries such as LibGen to assemble a permanent central library. That wholesale copying from infringing sources was not excused by the transformative ends to which the copies were eventually put. The court’s formulation, that a defendant cannot retroactively “bless” its own infringement by later purchasing legitimate copies, is the line that matters for compliance: a lawful use does not cure an unlawful acquisition. That claim was left for trial.

Why the partition matters

The analytical move is more consequential than the headline. By refusing to treat “AI training” as a monolith, Judge Alsup decoupled the fair-use question from the provenance question. A developer may possess an entirely defensible transformative-use theory and still face crippling liability for how it sourced its data. That reorients the litigation risk: the contested issue in the next wave of cases is less likely to be the abstract permissibility of training than the discoverable facts of corpus assembly — what was downloaded, from where, and whether the developer ever held a lawful copy of each work.

It also clarifies what Bartz does not decide. This was a summary-judgment ruling on a developed record before a single district judge; it binds no court outside the case. Its persuasive force on the training question is genuine, but it travels alongside a piracy holding that should temper any reading of the decision as a clean victory for AI developers. The opinion is better understood as a map of where risk lives than as a safe harbor.

The settlement as epilogue — and the lesson in the number

The piracy claim was set for trial in December 2025. It never reached a jury. The parties settled for $1.5 billion, reported as the largest copyright settlement in United States history and covering roughly 500,000 works, a gross average of about $3,000 per work. The district court granted preliminary approval on September 25, 2025, and administration passed to Judge Araceli Martínez-Olguín; at a fairness hearing on May 14, 2026, the court declined to grant immediate final approval, and distribution remained pending, with approximately 120,000 claims filed as of mid-2026.

The number is the lesson. A transformative-use holding of the first order did not insulate the defendant from ten-figure exposure, because the liability lived in the acquisition, not the use. The economics are worth stating plainly: a defensible model-training program can coexist with catastrophic copyright liability if the training data was assembled from infringing sources.

Open questions

Several issues Bartz leaves unresolved will shape the doctrine. First, the “transformative training” holding rests on a record about how these models ingest and abstract text; a different record — for instance, one showing memorization and verbatim regurgitation of training works in output — could alter the first- and fourth-factor analysis. Second, the decision says little about the fourth-factor market-harm theory that Kadrey v. Meta would soon foreground. Third, the piracy holding raises but does not resolve how a developer “launders” a tainted corpus: whether subsequent licensing or purchase can mitigate, rather than cure, liability remains open.

Implications for businesses and developers

  • Data provenance is the principal control. Maintain auditable records establishing lawful acquisition of every work in a training corpus. The transformative nature of training will not excuse infringing sourcing.
  • Distinguish acquisition from use in compliance design. Internal format-shifting of lawfully owned copies sits on firmer ground than ingestion of materials obtained from unauthorized repositories.
  • Treat shadow-library sourcing as a strict-liability risk. Bartz indicates that downloading from pirate sources is independently actionable regardless of downstream purpose.
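
For teams that want to turn the first bullet into something concrete, the following is a minimal sketch of an auditable acquisition record, written in Python. Everything in it is hypothetical and chosen for illustration: the `Source` categories, the `AcquisitionRecord` fields, the `evidence_ref` paths, and the `flag_for_review` helper are assumptions, not terms drawn from the opinion. The sketch only surfaces works whose lawful acquisition is undocumented; it makes no legal determination about any of them.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Source(Enum):
    # Hypothetical categories; a real program would define its own.
    PURCHASED = "purchased"        # bought copy, e.g. a print book later scanned
    LICENSED = "licensed"          # covered by a license agreement
    UNAUTHORIZED = "unauthorized"  # e.g. downloaded from a shadow library
    UNKNOWN = "unknown"            # provenance never recorded


@dataclass(frozen=True)
class AcquisitionRecord:
    work_id: str
    source: Source
    # Hypothetical pointer to a receipt or license file kept for audit.
    evidence_ref: Optional[str] = None


def flag_for_review(records):
    """Return the work_ids whose lawful acquisition is not documented.

    A work is flagged if its source is not a documented lawful channel,
    or if a lawful channel is claimed but no evidence is on file.
    """
    lawful_channels = {Source.PURCHASED, Source.LICENSED}
    return [
        r.work_id
        for r in records
        if r.source not in lawful_channels or not r.evidence_ref
    ]


# Illustrative corpus with made-up identifiers and paths.
corpus = [
    AcquisitionRecord("book-001", Source.PURCHASED, "receipts/0001.pdf"),
    AcquisitionRecord("book-002", Source.UNAUTHORIZED),
    AcquisitionRecord("book-003", Source.LICENSED),  # license claimed, nothing on file
    AcquisitionRecord("book-004", Source.UNKNOWN),
]

flagged = flag_for_review(corpus)
print(flagged)  # ['book-002', 'book-003', 'book-004']
```

The design choice worth noting is that a claimed lawful source without supporting evidence is flagged alongside unauthorized and unknown sources, which mirrors the point that provenance has to be provable for each work, not merely asserted.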

Frequently asked questions

Did the court rule that AI training is legal? It held that, on this record, training Claude on lawfully acquired books was fair use. That is a single district court’s ruling, not a categorical or binding rule, and it was paired with a holding that pirating the training data is not protected.

Why did Anthropic pay $1.5 billion if it won on fair use? Because it won only on the use of lawfully acquired works. The separate claim — that it had downloaded and retained millions of pirated books — was not protected by fair use and was resolved by settlement before trial.

What is the practical takeaway for companies building AI? How you obtain training data may matter more than how you use it. Lawful, documented sourcing is the central compliance obligation the decision identifies.

Authorities and sources