Training Harm: Reinterpreting the Fourth Fair Use Factor in the Age of Generative AI Music

Aanya Bansal

Generative artificial intelligence (AI) has begun to reshape the music industry in ways that are difficult to ignore. Today’s models can produce songs that replicate not only the structure of existing works but also the stylistic and expressive qualities associated with particular artists. These systems are not experimental tools operating at the margins. They are developed and deployed by commercial AI companies, such as OpenAI, Anthropic, and Google DeepMind, seeking to monetize generative outputs at scale. Unsurprisingly, copyright holders, especially record labels, have responded with litigation, arguing that the use of copyrighted sound recordings and lyrics in model training constitutes unlawful appropriation. At bottom, these disputes force courts to confront a familiar question in an unfamiliar context: how should the economic consequences of AI training be evaluated under the fair use doctrine?

Courts must interpret the fourth fair use factor, which directs them to consider “the effect of the use upon the potential market for or value of the copyrighted work,” with closer attention to the realities of AI training. [1] Large-scale ingestion of copyrighted music does not occur in a vacuum. It interacts directly with both existing licensing structures and markets that copyright holders are actively trying to develop. Yet the modern emphasis on “transformative use” risks obscuring these economic effects. When training is framed as mere abstraction (that is, as a process in which models extract statistical patterns from data without retaining or reproducing the underlying expressive content), courts may overlook the ways in which it displaces markets, substitutes for licensing, and ultimately feeds into the same revenue-generating ecosystem as the original works. A more grounded application of the fourth factor would account for three interrelated dynamics: the erosion of traditional licensing markets, the emergence of markets for training data, and the integration of AI-generated outputs into the streaming economy. Taken together, these considerations weigh strongly against fair use in cases involving commercial AI training on copyrighted music.

The Supreme Court has long treated the fourth fair use factor as a central component of the analysis. In Harper & Row, Publishers, Inc. v. Nation Enterprises (1985), the Court rejected a fair use defense where a magazine published excerpts of President Ford’s memoir without authorization. [2] The key concern was not simply that copying had occurred, but that the defendant had interfered with the plaintiff’s ability to control the timing and terms of first publication. In so holding, the Court made clear that market harm extends beyond straightforward substitution. It also includes injury to “derivative” and “potential” markets, meaning those that copyright holders either currently exploit or would reasonably seek to develop. [3] Derivative markets encompass licensed adaptations and downstream uses, while potential markets capture future avenues of commercialization that are sufficiently foreseeable. Direct substitution, in this framework, occurs when a secondary use functions as a market replacement for the original work or its licensed derivatives. Campbell v. Acuff-Rose Music, Inc. (1994) did not alter this core principle. [4] There, the Court considered whether 2 Live Crew’s parody of Roy Orbison’s “Oh, Pretty Woman” constituted fair use and held that a commercial parody could, in some circumstances, qualify as fair use. Even so, the Court did not minimize the importance of market harm. It reaffirmed that courts must ask whether the secondary use “usurps” a market that the copyright holder would ordinarily exploit. [5]

Taken together, Harper & Row, Publishers, Inc. v. Nation Enterprises and Campbell v. Acuff-Rose Music, Inc. establish a straightforward proposition. The fourth factor protects not only existing markets, but also those that are reasonably anticipated. That proposition becomes especially important in the context of AI training, where the most significant harms often arise in markets that are only now taking shape.

Recent litigation brings these doctrinal principles into sharper focus. In UMG Recordings, Inc. v. Anthropic PBC (2023), record labels allege that Anthropic copied and ingested copyrighted lyrics and sound recordings without authorization to train large-scale models. [6] The complaint situates this conduct within the existing structure of music licensing and argues that training effectively bypasses the permissions and payments that would ordinarily accompany such uses. A similar theory appears in The New York Times Co. v. OpenAI, Inc. (2023), where the plaintiff challenges the large-scale ingestion of copyrighted works for model training purposes. [7] Although that case arises outside the music industry, its reasoning translates readily. The core claim is that AI training is not a neutral, technical step, but rather a commercially meaningful use of copyrighted material, one that substitutes for licenses that copyright holders are both willing and able to provide, including prospective licensing frameworks tailored specifically to AI training. That substitution becomes even more significant as markets for AI training data continue to develop, and the development of such licensing frameworks further underscores the existence of a cognizable market under the fourth factor.

Seen in this light, AI training looks less like passive data processing and more like a commercially significant use of copyrighted material that implicates copyright holders’ exclusive rights. It depends on the systematic incorporation of expressive works and generates outputs that derive value from that incorporation. Under established fair use doctrine, that combination is precisely what triggers concern under the fourth factor. 

The implications of AI-generated music are especially pronounced in the streaming era, where compensation is tied to aggregate consumption: streaming platforms pool revenue and distribute it based on how frequently each track is played relative to total listening. In Andy Warhol Foundation for the Visual Arts, Inc. v. Goldsmith (2023), the Supreme Court emphasized that even a work with transformative qualities must be evaluated in relation to its commercial purpose and market effect. [8] Where the secondary use occupies the same market space as the original, such as licensing for media or distribution, the fourth factor takes on particular significance. That insight maps directly onto the structure of music streaming. AI-generated tracks are not distributed in a separate channel. They appear on the same platforms, compete for the same listeners, and are monetized through the same mechanisms as human-created songs. Because streaming services allocate revenue based on total platform-wide plays, an increase in AI-generated content reduces the proportional share available to human artists, even if their own play counts do not fall. The result is not just isolated substitution, but a form of systemic dilution. This kind of harm is not limited to instances of one-to-one copying. It arises from the way the market itself is organized. Under the precedent set by Andy Warhol Foundation for the Visual Arts, Inc. v. Goldsmith, courts cannot ignore these structural effects simply because the secondary use appears transformative at first glance.
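The dilution described above is, at bottom, simple arithmetic, and a short sketch may make it concrete. The Python example below is only an illustration under stated assumptions: a single fixed revenue pool, payouts strictly proportional to play counts, and invented figures. The function name and all numbers are hypothetical, and actual platform royalty formulas are more complicated than this.

```python
from fractions import Fraction


def pro_rata_payouts(pool, plays):
    """Split a revenue pool across categories in proportion to their play counts."""
    total = sum(plays.values())
    if total == 0:
        return {name: Fraction(0) for name in plays}
    return {name: Fraction(pool) * count / total for name, count in plays.items()}


# Hypothetical figures, chosen only for illustration.
POOL = 1_000_000  # assumed fixed revenue pool for the period, in dollars

# Human-created tracks are played 900,000 times in both scenarios.
before = pro_rata_payouts(POOL, {"human_catalog": 900_000, "ai_tracks": 100_000})
after = pro_rata_payouts(POOL, {"human_catalog": 900_000, "ai_tracks": 300_000})

print(float(before["human_catalog"]))  # 900000.0
print(float(after["human_catalog"]))   # 750000.0
```

On these assumed numbers, human-created tracks are played exactly as often in both scenarios, yet their payout falls from $900,000 to $750,000 solely because AI-generated plays enlarge the denominator. No listener has to abandon a human artist for the harm to occur.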

Defendants in AI training cases often argue that training is “non-expressive” and therefore highly transformative. That argument has some intuitive appeal, but it risks overstating the role of transformation in fair use analysis. Campbell v. Acuff-Rose Music, Inc. introduced transformative use as a relevant consideration. It did not suggest that transformation could override clear evidence of market harm. More recently, Andy Warhol Foundation for the Visual Arts, Inc. v. Goldsmith reinforced this point by emphasizing that commercial context and substitution remain central to the inquiry.

Treating AI training as categorically transformative would have significant consequences. It would insulate a wide range of economically consequential conduct from scrutiny under the fourth factor, even where that conduct interferes with markets that copyright law is designed to protect. Such a result would be difficult to reconcile with the doctrine’s underlying purpose. A more careful approach focuses on whether AI training affects existing and potential markets. Courts should ask whether it competes with existing licensing markets, forecloses the development of new ones, or produces outputs that function as substitutes within established systems of distribution and compensation. When those questions are taken seriously, the claim that training is merely abstract or non-expressive becomes much harder to sustain.

AI training on copyrighted music poses a direct challenge to the market-based logic of fair use. By incorporating expressive works without authorization and producing outputs that circulate within the same commercial ecosystem, AI developers risk displacing both existing and emerging licensing markets. Supreme Court precedent, from Harper & Row, Publishers, Inc. v. Nation Enterprises to Andy Warhol Foundation for the Visual Arts, Inc. v. Goldsmith, makes clear that these kinds of economic effects lie at the heart of the fourth factor. If the doctrine is to remain coherent, courts must take those effects seriously. Doing so does not require abandoning fair use. It requires applying it in a way that reflects the realities of the modern music industry rather than an abstract view of technological innovation.

Edited by Kate Given


[1] 17 U.S.C. § 107(4). 

[2] Harper & Row, Publishers, Inc. v. Nation Enterprises, 471 U.S. 539 (1985).

[3] Harper & Row, 471 U.S. at 568.

[4] Campbell v. Acuff-Rose Music, Inc., 510 U.S. 569 (1994).

[5] Campbell, 510 U.S. at 590.

[6] Complaint, UMG Recordings, Inc. v. Anthropic PBC (S.D.N.Y. 2023).

[7] Complaint, N.Y. Times Co. v. OpenAI, Inc. (S.D.N.Y. Dec. 2023).

[8] Andy Warhol Found. for the Visual Arts, Inc. v. Goldsmith, 598 U.S. ___ (2023).