Stolen Stories or Fair Use? The New York Times v. OpenAI and the Limits of Machine Learning
The legal dispute between The New York Times and OpenAI (along with its primary investor, Microsoft) centers on copyright infringement claims arising from OpenAI's use of The Times' works. In December 2023, The New York Times filed a lawsuit against OpenAI and Microsoft in the United States District Court for the Southern District of New York, alleging that OpenAI unlawfully used Times content to train its AI models, particularly ChatGPT, in violation of copyright law. OpenAI's use of The New York Times' articles to train a large language model is permissible under the doctrine of fair use because the use was transformative: it served a fundamentally different purpose than the original journalism. Therefore, the court should rule in favor of OpenAI and Microsoft.
The New York Times' main allegations include that OpenAI and Microsoft copied millions of Times articles without permission to train large language models, and that the resulting AI-generated outputs sometimes reproduce near-verbatim copies of Times articles, which The Times argues poses a significant threat to high-quality journalism. The lawsuit claims that OpenAI's models can generate the expressive content of Times articles at no cost, making readers less likely to visit The Times' website. The large language models (LLMs) also allow users to bypass The New York Times' paywall, which could result in financial losses for The New York Times Co. The Times further contends that OpenAI and Microsoft used its articles to build tools that compete with The Times, without asking permission or paying for the content. Despite these claims, there is insufficient evidence showing a measurable loss in revenue caused by OpenAI's LLMs. Lastly, The New York Times alleges that the AI models sometimes fabricate stories and falsely attribute them to The Times, which can damage its reputation. The Times argues that if OpenAI and Microsoft continue using its work without permission, its ability to fund quality journalism will suffer, leading to fewer reporters, less original news, and a weaker public understanding of important issues.
On the other hand, OpenAI and Microsoft deny The New York Times' infringement allegations. They contend that the use did not violate copyright law because the AI models do not copy articles verbatim in normal use. OpenAI argues that using publicly available articles to train AI models is permitted under the fair use doctrine, since the content is accessible to everyone. Fair use covers uses of copyrighted material that transform its purpose, such as summarizing or analyzing it. Both companies claim their AI does not merely copy but learns patterns in order to create new content, which they argue is different from simple reproduction. OpenAI has also argued that its use of copyrighted content for training is non-commercial, focused on research and development rather than profit-making. OpenAI concedes that its models occasionally regurgitate portions of articles, but characterizes this as a rare bug it is working to fix. It has further claimed that The New York Times may have intentionally manipulated the AI into reproducing its content through carefully crafted prompts, making the issue appear worse than it is. OpenAI has also revealed that it had been in discussions with The New York Times about how its AI models use news articles, and has expressed a desire to collaborate, suggesting openness to licensing agreements and other solutions to resolve the copyright dispute. The companies were attempting to find a solution together and were surprised when the lawsuit was filed instead of continuing the conversation.
To understand this case better, a similar case is instructive: Tremblay, et al. v. OpenAI, heard in the U.S. District Court for the Northern District of California (ruling issued February 12, 2024). Comedian and author Sarah Silverman, along with authors Christopher Golden and Richard Kadrey, filed a lawsuit against OpenAI, stating that the company had infringed their copyrights by using their works, without consent or compensation, to train generative AI models such as ChatGPT. The plaintiffs argued that OpenAI's models are derivative works because they incorporate expressive information taken from copyrighted materials. OpenAI argued that using the authors' works was permitted under fair use: it did not need permission because the content was used for purposes such as research rather than profit, and its models did not copy the authors' work in a way that violated copyright law. The court allowed some of the plaintiffs' claims to move forward but dismissed others, especially those under the Digital Millennium Copyright Act (DMCA), a law governing copyright protection and how content is used online. The court found that the plaintiffs had not shown that OpenAI's actions caused them harm or loss, so those parts of the case were dismissed for lack of proof.

This case underscores the growing legal debate over whether AI-generated content can be classified as derivative work. The outcome of Tremblay v. OpenAI shows the court balancing copyright protection against the development of AI. By allowing OpenAI's fair use defense to proceed, the court indicated that AI companies may be permitted to use copyrighted material to train their models in specific situations, depending on factors such as research or educational purposes. However, the dismissal of the DMCA claims shows that plaintiffs need stronger evidence to succeed in similar cases, while the court's decision to let the "unfair business practice" claims continue suggests that AI companies could still face legal challenges if their actions harm creators. Finally, the case illustrates how important it is to have clearer laws and guidelines on how AI may use copyrighted material, especially as AI technology keeps improving and becoming more common. Current law may not fully address the new challenges that AI presents, so clearer rules are needed to guide how AI companies operate.
Similarly, in The New York Times v. OpenAI, The Times argues that OpenAI's use of its copyrighted articles to train AI models harms its business and journalism, affecting its ability to generate revenue and produce original content. If the court rules in favor of The Times, the decision could have significant implications for how AI companies operate, potentially requiring them to seek consent and pay for the works they use. However, just as in Tremblay v. OpenAI, the case highlights the ongoing struggle to balance copyright protection against the potential benefits of AI technology, which, as OpenAI argues, can improve access to information and contribute to innovation. Given how rapidly AI is evolving, clearer rules are needed to address these challenges fairly and effectively and to protect both creators and technology developers. As of now, there is no definite line separating fair use from copyright infringement.
In conclusion, The New York Times' lawsuit against OpenAI highlights the conflict between copyright protection and AI development. Given the lack of evidence of significant harm and the transformative nature of AI training, OpenAI's fair use defense should prevail in this case.