Stripping the Byline: The § 1202(b) Ruling in The Intercept Media v. OpenAI and the SDNY CMI Split

Judge Rakoff let The Intercept's DMCA § 1202(b)(1) copyright-management-information claim against OpenAI survive dismissal while tossing the § 1202(b)(3) claim, splitting from Raw Story v. OpenAI on standing and reshaping CMI litigation in AI-training cases.

In The Intercept Media, Inc. v. OpenAI, Inc., No. 1:24-cv-01515 (S.D.N.Y.), Judge Jed S. Rakoff handed AI-training plaintiffs their first meaningful foothold on a theory that had been faltering across the Southern District of New York. After issuing a bottom-line order on November 21, 2024, the court released its full opinion on February 20, 2025, denying OpenAI’s motion to dismiss as to The Intercept’s claim under 17 U.S.C. § 1202(b)(1) — the removal of copyright-management information (CMI) — while dismissing the related § 1202(b)(3) distribution claim and dismissing Microsoft from the case entirely. The result matters less for who won than for what it exposed: two judges in the same courthouse, applying the same statute to nearly identical allegations about ChatGPT’s training data, reaching opposite conclusions on whether a publisher even has standing to complain.

At a glance

Case: The Intercept Media, Inc. v. OpenAI, Inc., et al., No. 1:24-cv-01515 (S.D.N.Y.); opinion entered Feb. 20, 2025 (bottom-line ruling Nov. 21, 2024), Rakoff, J.
Statute: DMCA § 1202(b) — the integrity provision protecting copyright-management information, distinct from the § 1201 anti-circumvention provisions.
Holding: The § 1202(b)(1) CMI-removal claim against OpenAI survives; the § 1202(b)(3) claim (distributing works knowing CMI was removed) is dismissed; Microsoft is dismissed in full.
Theory accepted: “Downstream infringement” — that removing author, title, and copyright notices from training data enables attribution-free, infringement-facilitating outputs.
Why it’s notable: It splits from Raw Story Media, Inc. v. OpenAI, Inc., No. 1:24-cv-01514 (S.D.N.Y. Nov. 7, 2024) (McMahon, J.), which dismissed a materially similar CMI claim for lack of Article III standing.

Section 1201 versus § 1202: two different DMCA machines

The two halves of Chapter 12 of the Copyright Act are routinely lumped together as “the DMCA,” but they protect different things, and only one of them is at issue here. Section 1201 is the anti-circumvention regime: it forbids bypassing technological protection measures that control access to copyrighted works, and it bars trafficking in tools that circumvent access or copy controls. It is the statute behind disputes over DRM, jailbreaking, region locks, and aftermarket diagnostic tools. The Intercept’s case has nothing to do with § 1201; OpenAI was not accused of defeating a technological lock.

Section 1202, by contrast, protects the information attached to a work: the CMI. Subsection (b)(1) prohibits intentionally removing or altering CMI; (b)(2) prohibits distributing CMI knowing that it has been removed or altered without authority; and (b)(3) prohibits distributing or publicly performing works (or copies) knowing that CMI has been removed or altered. (Knowingly false CMI is the subject of § 1202(a), not § 1202(b).) Crucially, each subsection carries a knowledge overlay: the defendant must act “knowing, or … having reasonable grounds to know, that it will induce, enable, facilitate, or conceal an infringement.” The Second Circuit has read this as a “double scienter” requirement: the defendant must both know the CMI was removed and know (or have reason to know) that the removal will facilitate infringement. That second mental-state element is what makes § 1202(b) hard to plead in the abstract and is precisely where The Intercept’s complaint was tested.

Why the (b)(1) claim lived and the (b)(3) claim died

Rakoff’s split disposition is the analytical heart of the opinion, and it tracks the statutory text closely. Section 1202(b)(1) is about the act of stripping CMI. The Intercept alleged that OpenAI built its training sets from news articles and intentionally removed the author bylines, titles, and copyright notices embedded in or accompanying those articles before feeding them to the model. The court found that allegation plausible and, importantly, found that The Intercept had adequately pleaded the double-scienter element: OpenAI allegedly knew the CMI had been removed and knew that a model trained on de-attributed text could generate outputs that reproduce protected expression without attribution, thereby enabling or concealing downstream infringement by ChatGPT users. The “downstream infringement” framing did the work — it supplied the causal link between the upstream removal and a foreseeable future infringement, satisfying the second scienter prong at the pleading stage.

Section 1202(b)(3) failed for a structurally different reason. That subsection reaches distribution of works or copies knowing that CMI has been removed. The court was not persuaded that The Intercept had plausibly alleged that OpenAI distributed identifiable copies stripped of CMI, as the subsection requires. The distinction is doctrinally significant: removal liability under (b)(1) attaches to OpenAI’s internal data-engineering conduct, whereas distribution liability under (b)(3) requires putting altered copies back into circulation. By letting (b)(1) proceed while dismissing (b)(3), Rakoff effectively located the cognizable wrong in the ingestion pipeline rather than in the model’s outputs.

The standing fault line, and the split with Raw Story

The most consequential aspect of the decision is jurisdictional, not substantive. On November 7, 2024, two weeks before Rakoff’s bottom-line order, Judge Colleen McMahon dismissed Raw Story v. OpenAI on Article III grounds, applying TransUnion LLC v. Ramirez. Her reasoning: the bare removal of CMI, without any allegation that OpenAI disseminated the plaintiff’s articles in a form that injured the plaintiff, is “too abstract” to be a concrete injury. She was skeptical that ChatGPT, given its scale and the statistical nature of its outputs, would ever regurgitate a particular Raw Story article verbatim, and she dismissed without prejudice while signaling doubt that the defect could be cured.

Rakoff reached the opposite conclusion on standing. He treated the asserted harm as analogous to the property-based injuries traditionally actionable in copyright — the kind of interest with a long common-law pedigree, which under TransUnion supports concreteness. He also accepted that copyright-type injury does not necessarily require third-party publication, crediting the downstream-infringement theory as a cognizable harm. A factual difference helped: The Intercept pointed to evidence of ChatGPT producing verbatim or near-verbatim passages from its articles, giving the dissemination concern empirical bite that Raw Story lacked. Notably, Rakoff’s opinion did not engage Raw Story directly, leaving two coordinate SDNY decisions in unreconciled tension on the threshold question of who may sue for CMI removal in the training-data context.

Open questions

Does the property-based analogy survive TransUnion scrutiny? Rakoff’s concreteness theory rests on treating CMI removal as a traditional copyright-type harm. Whether the Second Circuit agrees that removal by itself clears the historical-analogue test is unresolved and a likely issue on appeal.
How much “verbatim output” is enough? The Intercept’s evidence of near-verbatim regurgitation may have been load-bearing for standing. Plaintiffs without comparable output evidence may land closer to Raw Story.
Where does the (b)(1)/(b)(3) line settle? If removal liability attaches to ingestion but distribution liability requires identifiable stripped copies, plaintiffs must plumb the data pipeline in discovery.
Does the double-scienter standard scale to foundation models? Inferring that a developer “knew” de-attribution would facilitate infringement is plausible at the pleading stage; proving it on a summary-judgment record across billions of documents is a different matter.

Implications

CMI claims are now a live, not vestigial, weapon in AI-training litigation. Intercept gives plaintiffs a template for surviving dismissal where prior CMI claims had been dying on standing.
Plead the outputs, not just the inputs. Concrete evidence that the model reproduces protected text without attribution materially strengthens both standing and the second scienter prong.
The forum and the judge matter enormously. An intra-district split means outcomes may turn on assignment; expect heavy briefing of Raw Story versus Intercept in every CMI case until the Second Circuit speaks.
Distinguish your DMCA theory. Litigants should be precise about whether they are invoking § 1201 (circumvention) or § 1202 (CMI), and within § 1202, which subsection — the elements and viable defendants differ.
Data-engineering practices are now discoverable. How developers parse, clean, and strip metadata from training corpora becomes a focal point of fact discovery.

Frequently asked questions

Does this ruling mean OpenAI violated the DMCA? No. A denial of a motion to dismiss only means the § 1202(b)(1) claim is plausible enough to proceed. The Intercept still must prove intentional CMI removal and the double-scienter knowledge element on a full evidentiary record.

What is the difference between this case and Raw Story? Both alleged OpenAI stripped CMI from news articles used to train ChatGPT. Judge McMahon dismissed Raw Story for lack of Article III standing absent dissemination of stripped copies; Judge Rakoff found The Intercept had standing and a viable § 1202(b)(1) claim, in part because it pointed to verbatim regurgitation. The two decisions conflict.

Is this a § 1201 anti-circumvention case? No. It is a § 1202 copyright-management-information case about removing author, title, and copyright notices. Section 1201, which bars bypassing technological protection measures, is not at issue.