Andersen v. Stability AI: The Theory That Diffusion Models Can 'Contain' the Works They Trained On

A federal court let visual artists' direct and induced copyright claims against AI image generators proceed on the theory that protected works may persist inside the model itself, a pleading-stage ruling that reframes how courts approach training data.

When Judge William H. Orrick ruled on the motions to dismiss in Andersen v. Stability AI Ltd., No. 3:23-cv-00201 (WHO) (N.D. Cal. Aug. 12, 2024), he did something that most early generative-AI copyright decisions had avoided: he took seriously the possibility that a trained diffusion model is not merely a statistical abstraction but may itself contain the copyrighted works it learned from. That move — letting the artists’ direct-infringement theory survive a second time — is why this case, brought by illustrators Sarah Andersen, Kelly McKernan, Karla Ortiz, and others against Stability AI, Midjourney, DeviantArt, and Runway AI, remains one of the most consequential AI training suits in the country. For Los Angeles’s enormous community of working illustrators and concept artists, it is the case to watch.

At a glance

Case: Andersen v. Stability AI Ltd., No. 3:23-cv-00201 (WHO) (N.D. Cal.)
Order: August 12, 2024, on defendants’ motions to dismiss the First Amended Complaint, Judge William H. Orrick
Survived: direct copyright infringement; induced infringement (against Stability); Lanham Act trade-dress/trademark theory; unjust enrichment
Dismissed: DMCA § 1202(a) (with prejudice); DMCA § 1202(b); right of publicity; breach of contract
Status: Pending. The case has proceeded into discovery, the plaintiffs have continued to amend, and a jury trial has been scheduled

The “model contains the work” theory

The doctrinal heart of the order is its treatment of direct infringement. Defendants argued that Stable Diffusion stores no images — that training extracts only parameters, weights, and statistical relationships, not pixels — and that a model therefore cannot infringe the reproduction right simply by existing. Judge Orrick declined to resolve that factual dispute on a motion to dismiss. Crediting the plaintiffs’ allegations as true, he found it plausible that the protected works are “contained” in Stable Diffusion as compressed copies or as “algorithmic or mathematical representations,” and that the model could not function as it does without having copied them during training.

That framing matters because it shifts the contested question from a clean legal one (does training infringe?) to a messy factual one (what, exactly, is inside the model?). It converts a motion-to-dismiss argument into a discovery and expert-testimony problem. If the artists can show that the weights effectively encode reproductions of their images — even lossy, distributed ones — the reproduction right is implicated regardless of whether any output ever resembles a particular drawing. The decision thus keeps alive a theory of liability that does not depend on output similarity at all.

Induced infringement and the design of the tool

The order also let an inducement theory proceed against Stability. Judge Orrick distinguished the classic staple-article-of-commerce line of cases — the VCR and peer-to-peer precedents in which a product had substantial non-infringing uses — by emphasizing the plaintiffs’ allegation that Stable Diffusion was built to produce images in the style of, and derived from, the training works, and that end-user generation of infringing images is a foreseeable result of that design. At the pleading stage, that was enough.

This is a meaningful contrast with how secondary-liability arguments have fared elsewhere. Inducement requires culpable intent, and courts are usually skeptical of inferring it from the mere capacity of a general-purpose tool to be misused. By letting the claim survive against the model’s developer specifically, the court signaled that the purpose for which a model was trained — and the marketing and design choices around it — can supply the intent that mere capability cannot.

What fell away: DMCA, publicity, and contract

The dismissals are as instructive as the survivals. The DMCA § 1202(a) claim — alleging that defendants provided false copyright-management information — was dismissed with prejudice for want of the statute’s demanding “double scienter,” and because nothing in the licensing materials suggested an association with the plaintiffs’ works. The § 1202(b) claim, premised on the removal or alteration of CMI during training, failed for the now-familiar reason that the plaintiffs could not allege that any model output was identical to their works — the same “identicality” obstacle that has defeated CMI theories in the GitHub Copilot litigation and elsewhere.

The right-of-publicity and breach-of-contract claims were also dismissed. Together, the dismissals narrow the case to its strongest core: copyright in the training data and in the model’s operation, plus a trademark/trade-dress theory and unjust enrichment. The artists lost the peripheral theories but kept the two copyright claims, direct infringement and inducement, that could actually establish infringement liability.

Open questions

The order resolves almost nothing on the merits; it defines the battlefield. The central unresolved question is factual and technical: do the model weights, in any legally cognizable sense, contain copies of the training images, or only learned abstractions about them? Expert discovery will drive that answer, and it may differ model to model. A second question is how the surviving claims will interact with the fair-use defense, which was not before the court on the motion to dismiss and which has produced sharply divergent results in the parallel text cases. Third is the scope of inducement liability: whether intent inferred from training purpose survives summary judgment, or whether it requires more particularized proof of how outputs are used. Finally, the “identicality” requirement for DMCA claims remains contested and is now before the Ninth Circuit in the Copilot appeal; a reversal there could revive CMI theories that this order foreclosed.

Implications

For artists and rightsholders: The “model contains the work” theory offers a path to infringement liability that does not require proving output similarity — but it depends entirely on technical proof developed in discovery.
For AI developers: Training purpose and product design are now litigable evidence of intent. How a model is described, marketed, and fine-tuned can convert a general-purpose tool into an inducement target.
For litigators: CMI/DMCA claims remain fragile absent identical outputs; plead them, but do not build the case on them. The durable claims are direct infringement and inducement.
For the broader docket: Because the order turns on contested facts rather than clean legal rulings, the training-data question is likely to be resolved at summary judgment or trial rather than at the pleadings, a posture that magnifies the stakes of expert testimony.

Frequently asked questions

Did the court decide that training AI on copyrighted art is infringement? No. It decided only that the artists’ allegations were plausible enough to proceed past a motion to dismiss. Whether training actually infringes — and whether fair use excuses it — remains to be litigated.

Why does the “model contains the work” idea matter? Because it offers a theory of liability that does not depend on any output resembling a specific artwork. If protected images are encoded in the model’s weights, the reproduction right may be implicated by the model’s very existence — a question now headed for technical discovery.

Is this the same as the Anthropic and Meta book cases? No. Those are separate suits over text training with their own fair-use rulings. Andersen concerns image generators and a different theory about what the trained model contains, and fair use has not yet been decided in it.

Authorities and sources

Andersen v. Stability AI Ltd., No. 3:23-cv-00201 (WHO) (N.D. Cal.): docket (CourtListener); August 12, 2024 order (Justia).
Analysis: Copyright Alliance, “Takeaways from the Andersen v. Stability AI Copyright Case”; BakerHostetler case page; Knowing Machines legal explainer.