Following the explosive news at the end of last year that The New York Times, one of the world's most iconic newspapers, is suing OpenAI and its partner Microsoft for copyright infringement, OpenAI has publicly responded with a blog post claiming the lawsuit is “without merit.”
“We support journalism, collaborate with news organizations, and believe The New York Times lawsuit is without merit,” the blog post begins.
OpenAI makes three key points:
1. We collaborate with news organizations, creating new opportunities.
2. Our training practices fall under fair use, and we offer an opt-out option because it is the right thing to do.
3. Instances of “regurgitation” from training data are rare, and we are actively working to eliminate them.
These claims are further detailed in the post.
The central tension is that OpenAI has signed content licensing agreements with some media entities, including Axel Springer (publisher of Politico and Business Insider) and the Associated Press, while defending its earlier practice of scraping public websites for training data — the data behind GPT-3.5 and GPT-4, the models that power ChatGPT.
Since its DevDay developer conference in November 2023, OpenAI has also offered indemnification — legal protection, under a program it calls Copyright Shield, for organizations using its AI products.
How did we get here?
The NYT filed the lawsuit in late December 2023 in the U.S. District Court for the Southern District of New York. The newspaper alleges that OpenAI trained its models on copyrighted articles without authorization or compensation, citing specific instances where ChatGPT generated text closely resembling NYT articles, which it argues constitutes direct copyright infringement.
The lawsuit followed months of unsuccessful negotiations between OpenAI and NYT representatives over a content licensing agreement.
OpenAI asserts that training on publicly available internet materials qualifies as fair use, a view it says is supported by longstanding legal precedent. The company also notes that it provides a simple opt-out process allowing publishers to block its web crawler, GPTBot, from accessing their sites — an option The New York Times exercised in August 2023.
However, critics note that this opt-out mechanism was only introduced after the launch of ChatGPT in November 2022, leaving little opportunity for publishers to protect their data before that date.
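In practice, the opt-out relies on the standard robots.txt convention: a publisher adds a directive to the robots.txt file at its site's root telling OpenAI's crawler, which identifies itself with the user-agent `GPTBot`, not to fetch pages. A minimal example looks like this:

```
# Block OpenAI's GPTBot crawler from the entire site
User-agent: GPTBot
Disallow: /
```

Note that this only prevents future crawling; it does not remove content already collected before the directive was added.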
OpenAI also accuses the NYT of "intentionally manipulating prompts" — in violation of OpenAI's Terms of Service — to produce its examples of article reproduction. The company claims those prompts included lengthy excerpts from the articles themselves, effectively coaxing the model into outputting text closely matching NYT content.
In response, a spokesperson at Trident DMG, the firm representing the NYT, relayed a statement from Ian Crosby, lead counsel for The New York Times: "The blog concedes that OpenAI used The Times's work to build ChatGPT. That's not fair use by any measure."
As the case develops, OpenAI and The New York Times will present their arguments before Federal District Court Judge Sidney H. Stein. Though the initial hearing date is not yet available, further legal proceedings will likely add depth to the ongoing debate over AI's use of copyrighted material.
With rising examples of AI services reproducing copyrighted content—including the AI image generator Midjourney, which has faced legal challenges—2024 is poised to be a pivotal year for AI technology and its legal implications concerning training data sources.