Is It Legal to Train AI on Your Work? (And Can You Stop It?)

Is AI training on copyrighted work legal? A plain-English guide to where the lawsuits stand, why fair use is unsettled, and how creators can opt out today.

Educational guide, not legal advice. This article explains general legal concepts and is not a substitute for advice from an attorney licensed in your jurisdiction. Reading it does not create an attorney–client relationship.

Quick answer: Whether it is legal to train AI on copyrighted work is genuinely unsettled; there is no single national rule yet. In 2025, federal judges in California found that training AI on books could be fair use (in Bartz v. Anthropic and Kadrey v. Meta), but both decisions were narrow, and the Anthropic case still ended in a record $1.5 billion settlement over pirated copies. Another judge rejected fair use entirely in Thomson Reuters v. Ross. The big New York Times v. OpenAI case and the Andersen v. Stability AI artists’ case are still headed toward trial. In the meantime, you can take steps to opt out (robots.txt rules, the TDM reservation signal, Content-Signal, ai.txt, “Do Not Train” metadata, and platform settings), even though none of them is a guaranteed legal off-switch in the US.

If you write, draw, photograph, code, or record for a living, you have probably asked the question by now: did an AI company scrape my work to build its product — and was that even allowed? It is one of the most contested questions in copyright law today, and anyone who tells you there is a clean yes-or-no answer is overstating it. Here is where things actually stand, in plain English, and what you can realistically do about it now.

Training a modern AI model almost always involves copying. To learn from a book, an image, or an article, the system first has to ingest a copy of it — and copying a protected work is one of the exclusive rights the Copyright Act reserves to the owner. So at the first step, AI training looks a lot like infringement.

The AI companies’ main defense is fair use — the doctrine (codified at 17 U.S.C. § 107) that allows limited unlicensed use of protected works after weighing four factors: the purpose and character of the use (including whether it is “transformative”), the nature of the original, how much was taken, and the effect on the market for the original. AI developers argue that training is highly transformative: the model is not republishing your novel, it is learning statistical patterns of language. Creators argue the opposite — that the output competes with them, and that ingesting pirated copies was never fair to begin with.

Fair use is famously fact-specific. It is decided case by case, which is exactly why this area is so unsettled: the same statute can produce different outcomes depending on what was copied, how it was obtained, and what the AI does with it. For the bigger picture of how copyright, training, and AI outputs fit together, see the guide to AI and intellectual property.

Where the lawsuits stand

There is no Supreme Court ruling and no statute squarely on point, so the law is being built one trial-court decision at a time — and those decisions do not all agree.

  • Bartz v. Anthropic (N.D. Cal., June 2025). Judge William Alsup held that using lawfully acquired books to train Anthropic’s models was “quintessentially transformative” and fair use — but that downloading and keeping a permanent “central library” of pirated books was not fair use. After a class was certified, Anthropic agreed in September 2025 to pay at least $1.5 billion, the largest copyright settlement in US history. The headline lesson: how the training data was obtained matters enormously.
  • Kadrey v. Meta (N.D. Cal., June 2025). Judge Vince Chhabria ruled for Meta on fair use — but pointedly framed it as a win on this record, where the authors had not proven market harm, rather than a blanket green light for AI training.
  • Thomson Reuters v. Ross Intelligence (D. Del., Feb. 2025). Judge Stephanos Bibas went the other way, rejecting fair use for a legal-research AI that trained on Westlaw headnotes. That case is now on appeal at the Third Circuit — the first appellate court to take up AI training and fair use.
  • The New York Times v. OpenAI (S.D.N.Y.). Still ongoing and closely watched. The Times says ChatGPT can reproduce its articles nearly verbatim, which puts the output (not just the training) in play. In 2026 the court ordered OpenAI to hand over a large sample of user logs in discovery.
  • Andersen v. Stability AI (N.D. Cal.). The visual artists’ case survived motions to dismiss and is heading toward a trial scheduled for September 2026.
  • Getty Images v. Stability AI (UK). A separate system entirely: Getty dropped its primary infringement claims at trial because the training took place outside the UK, and in November 2025 the English High Court rejected its remaining secondary copyright claim, finding that the model itself is not an infringing copy, but found limited trademark infringement where outputs reproduced Getty’s watermark. Useful as a signal, but it is UK law, not US law.

The honest takeaway: AI developers have notched real wins on fair use, but every one of those rulings is narrow, Thomson Reuters v. Ross and the piracy holding in Bartz cut the other way, the appeals have barely started, and the biggest cases are unresolved. Treat any claim that AI training is settled as legal, or settled as illegal, with skepticism. You can read fuller breakdowns of these decisions in the AI and copyright case analyses.

What creators can do now

You do not have to wait for the courts. While none of the following is a guaranteed legal off-switch in the US, they are the practical tools available today — and they create a documented record that you reserved your rights.

Machine-readable opt-out signals. These tell crawlers, in a format software can read, that your content is off-limits for AI training:

  • robots.txt: add Disallow rules for known AI crawlers (for example GPTBot, Google-Extended, CCBot, ClaudeBot, and others). It is a voluntary convention, but most major AI companies now publish the bot names they honor.
  • TDM Reservation Protocol (TDMRep): a specification from a W3C Community Group (not a formal W3C standard) that sends a tdm-reservation: 1 signal through a /.well-known/tdmrep.json file, an HTTP response header, or HTML meta tags. This one has real teeth in the EU, where the AI Act and Copyright Directive recognize machine-readable rights reservations.
  • Content Signals: Cloudflare’s vocabulary, written into your robots.txt, that lets you separately allow or disallow search indexing, AI input, and AI training.
  • ai.txt: a proposed standard (from Spawning) aimed at the EU’s opt-out requirements, with more granular control than robots.txt.
  • “Do Not Train” metadata and registries: IPTC photo metadata fields and opt-out registries such as Spawning’s “Do Not Train” let you flag individual files.
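
To make the robots.txt item concrete, here is a minimal Python sketch that generates a robots.txt blocking the crawlers named above and then checks the result with the standard library's urllib.robotparser. The function names (build_robots_txt, is_allowed), the example.com URL, and the crawler name SomeSearchBot are illustrative assumptions, and the Content-Signal line reflects one reading of Cloudflare's published syntax, so verify it against their current documentation before relying on it.

```python
from urllib.robotparser import RobotFileParser

# Crawler names taken from the list above; extend with any others you want to block.
AI_CRAWLERS = ["GPTBot", "Google-Extended", "CCBot", "ClaudeBot"]


def build_robots_txt(ai_crawlers):
    """Return robots.txt text that blocks the named AI crawlers site-wide."""
    lines = []
    for name in ai_crawlers:
        lines += [f"User-agent: {name}", "Disallow: /", ""]
    # All other crawlers stay allowed. The Content-Signal line is an assumed
    # form of Cloudflare's syntax; parsers that do not know it ignore it.
    lines += [
        "User-agent: *",
        "Content-Signal: search=yes, ai-train=no",
        "Allow: /",
    ]
    return "\n".join(lines) + "\n"


def is_allowed(robots_txt, user_agent, url):
    """Check whether a rule-following crawler may fetch the URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


robots_txt = build_robots_txt(AI_CRAWLERS)
print(robots_txt)
print("GPTBot allowed:", is_allowed(robots_txt, "GPTBot", "https://example.com/portfolio/"))
print("SomeSearchBot allowed:", is_allowed(robots_txt, "SomeSearchBot", "https://example.com/portfolio/"))
```

The check only shows how a compliant crawler would read the file. It says nothing about crawlers that ignore robots.txt, which is why the file is a signal rather than a barrier.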

Platform settings. If your work lives on Adobe, DeviantArt, Tumblr, LinkedIn, X, Meta’s platforms, or similar services, dig into the account/privacy settings — many now offer an AI training opt-out toggle. Read the terms of service, too: some platforms claim a license to use your uploads for AI.

Licensing. The flip side of “stop them” is “charge them.” A growing number of publishers, image libraries, and individual creators now license their archives to AI companies on negotiated terms. If your catalog has value, licensing may beat litigation.

Register and document. Timely copyright registration strengthens your hand if you ever do sue (it can unlock statutory damages and attorney’s fees), and keeping records of your opt-out signals shows you reserved your rights. For broader defensive steps, see how to protect your content from theft.
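
One possible way to keep such a record is to save a timestamped snapshot of each opt-out file together with a hash of its contents. The Python sketch below is an illustration only: the make_record helper, its field names, and the example.com URL are assumptions, and whether a self-made log carries weight in a dispute is a question for an attorney.

```python
import hashlib
import json
from datetime import datetime, timezone


def make_record(url, content, recorded_at=None):
    """Build a snapshot entry for an opt-out file such as robots.txt."""
    recorded_at = recorded_at or datetime.now(timezone.utc)
    return {
        "url": url,
        # The hash lets you show later that the saved text was not altered.
        "sha256": hashlib.sha256(content.encode("utf-8")).hexdigest(),
        "recorded_at": recorded_at.isoformat(),
        "content": content,
    }


record = make_record(
    "https://example.com/robots.txt",
    "User-agent: GPTBot\nDisallow: /\n",
)
print(json.dumps(record, indent=2))
```

Appending one such entry to a dated file each time you change your signals gives you a simple history of what you published and when.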

A caution: machine-readable signals only work going forward. They cannot un-train a model that already ingested your work, and not every crawler obeys them.

What businesses building with AI should know

If you are on the other side — building a product on top of AI, or fine-tuning a model on a dataset — the same uncertainty cuts toward caution.

  • Know where your training data came from. Bartz makes the source of the data a central issue: lawfully acquired works fared far better than pirated ones. “We scraped it off the internet” is not a comfortable place to be.
  • Watch the outputs, not just the inputs. The New York Times case is largely about a model reproducing protected text. If your system can regurgitate someone’s work, that is a separate exposure from how it was trained. This also ties into a question many builders get wrong — who actually owns AI-generated output.
  • Respect opt-out signals. Honoring robots.txt, TDM reservations, and platform rules is both good practice and increasingly close to a legal expectation in the EU.
  • Read your vendor’s terms and indemnities. If you build on a third-party model, find out what that provider promises about training data and whether it indemnifies you against infringement claims.
  • Consider licensing the high-risk material. For data that clearly belongs to identifiable owners, a license is cheaper than a lawsuit.
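
For teams that collect their own data, the "respect opt-out signals" point can be sketched as a check that runs before a page enters a training set. This is a minimal Python illustration under stated assumptions: the crawler name ExampleTrainingBot, the function names, and the sample URLs are hypothetical, and the check covers only robots.txt rules and a tdm-reservation response header, not every opt-out mechanism described above.

```python
from urllib.robotparser import RobotFileParser


def robots_allows(robots_txt, user_agent, url):
    """True if the site's robots.txt permits this crawler to fetch the URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


def tdm_reserved(headers):
    """True if the response headers carry a 'tdm-reservation: 1' signal."""
    normalized = {key.lower(): value.strip() for key, value in headers.items()}
    return normalized.get("tdm-reservation") == "1"


def may_collect_for_training(robots_txt, user_agent, url, headers):
    """Collect only when robots.txt allows it and no TDM reservation is set."""
    return robots_allows(robots_txt, user_agent, url) and not tdm_reserved(headers)


# Hypothetical site rules and responses, for illustration only.
site_rules = "User-agent: ExampleTrainingBot\nDisallow: /private/\n"
bot = "ExampleTrainingBot"

print(may_collect_for_training(site_rules, bot, "https://example.com/private/a", {}))
print(may_collect_for_training(site_rules, bot, "https://example.com/blog/b", {"TDM-Reservation": "1"}))
print(may_collect_for_training(site_rules, bot, "https://example.com/blog/c", {}))
```

Logging each decision alongside the URL also gives you the provenance trail that the first point in this list calls for.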

The open questions

Several big issues are still genuinely undecided, and the answers will reshape everything above:

  • Is AI training fair use, in general? The trial courts split, and the appellate courts have only just begun. The Third Circuit’s review of Thomson Reuters v. Ross and the eventual appeals in the California cases will matter far more than any single district ruling.
  • Does it depend on the type of AI? Courts have treated a legal-research tool, large language models, and image generators differently. The output’s market effect on the original may be the deciding factor.
  • Do opt-out signals create enforceable rights in the US? In the EU the trend is yes; in the US it is largely voluntary so far. Whether ignoring a clear, machine-readable reservation becomes legally significant here is unresolved.
  • Will Congress or the Copyright Office step in? Licensing markets, a statutory opt-out, or new disclosure rules could all change the landscape regardless of how the lawsuits come out.

The bottom line

Right now, “Is it legal to train AI on your work?” has no clean answer — it depends on how the work was obtained, what the AI does, and which court is asking. AI developers have won some important fair-use rulings, but those wins are narrow, others went the other way, the appeals are just starting, and the largest cases are still unresolved. You cannot count on a single ruling settling it. What you can do today is reserve your rights with machine-readable opt-out signals, lock down your platform settings, consider licensing, register your work, and keep records — while builders should know exactly where their data came from and watch what their models output.

This article is general educational information about intellectual property law and the current state of AI copyright litigation. It is not legal advice, it does not create an attorney-client relationship, and the law in this area is changing quickly. For guidance on your specific situation, consult an attorney licensed in your jurisdiction.

Frequently asked questions

Is it legal to train AI on copyrighted work?

It is unsettled. In 2025, two California federal judges ruled in separate cases (Bartz v. Anthropic and Kadrey v. Meta) that training AI on books could qualify as fair use — but both rulings were narrow and fact-specific, and the Bartz judge held that downloading pirated copies to build a permanent library was not fair use, which drove a $1.5 billion settlement. Meanwhile, in Thomson Reuters v. Ross, a Delaware judge rejected fair use for a different kind of AI tool. So there is no single national answer yet. The legality depends on how the works were obtained, what kind of AI it is, and which court is deciding. For your specific situation, ask an attorney licensed in your jurisdiction.

Can I stop AI companies from training on my content?

There is no guaranteed legal switch, but you can send machine-readable signals that an increasing number of AI crawlers respect. The most common are a robots.txt file that disallows known AI bots, the W3C TDM Reservation Protocol (a 'TDM-Reservation: 1' signal), Cloudflare's Content Signals vocabulary, the proposed ai.txt standard, and 'Do Not Train' metadata or registries like Spawning's. On platforms (Adobe, DeviantArt, Tumblr, social networks), check your account settings for an AI opt-out toggle. None of these is foolproof, and compliance is largely voluntary in the US, but they create a documented record that you reserved your rights.

Does a robots.txt or 'Do Not Train' tag actually have legal force?

In the United States it mostly does not — robots.txt is a voluntary convention, not a statute, and most US courts have not held that ignoring it is itself illegal. In the EU it is different: the EU AI Act and Copyright Directive recognize machine-readable rights reservations, and a December 2025 German court ruling found that a plain-language opt-out buried in terms of use was not enough — the reservation had to be machine-readable. So these signals carry the most weight for AI providers operating in or selling to the EU. They are still worth using everywhere because they document your intent. Confirm how this applies to you with an attorney licensed in your jurisdiction.

About the Author

Lidiia Levitska

International Intellectual Property Attorney

Lidiia Levitska focuses on intellectual property dispute resolution, policy, and advisory work across international institutions and government bodies. From 2021 to 2025 she served at the World Intellectual Property Organization (WIPO), managing arbitration cases and overseeing compliance with the Uniform Domain-Name Dispute-Resolution Policy (UDRP), and earlier led IP policy research as a Senior Policy Officer at the American Chamber of Commerce in Ukraine. She holds an LL.M. in International Intellectual Property Law from Chicago-Kent College of Law and an M.A. in Information Technology Law from the University of Tartu, and was admitted to the Ukrainian Bar in 2019.
