New York Times Challenges OpenAI's 'Unprecedented' Request for Journalistic Content

The legal dispute between The New York Times and OpenAI has escalated sharply, spotlighting the contentious relationship between copyright and artificial intelligence. The Times sued OpenAI in December 2023, claiming that ChatGPT was trained on its articles without authorization. The newspaper alleges that the AI can generate “near-verbatim excerpts” from its content, an allegation OpenAI disputes.

As the case moves through its pretrial stage, OpenAI is seeking access to internal documents from The Times, including journalists' notes and memos, arguing that these materials are essential to its defense. The Times is resisting the request, calling it “overbroad and unduly burdensome” and arguing that it has no bearing on any relevant issue in the case.

OpenAI insists that access to these materials is vital for evaluating the validity of The Times' claims. In a memo submitted to U.S. District Judge Sidney H. Stein on July 1, the company contended that the articles alone are not enough to assess whether the works are original creations deserving of copyright protection. The memo states, “OpenAI cannot determine from the works alone which portions reflect human-authored content original to The Times and which portions do not.”

In response, The Times filed a memorandum on July 3, criticizing OpenAI's request as “unprecedented” and contrary to established copyright law. The newspaper maintains that its news-gathering processes should not be under scrutiny. According to its legal team, the sole question is whether OpenAI and Microsoft infringed the copyrights in millions of The Times’ published works.

The Times further argues that copyright law protects even articles composed primarily of quotes, asserting that the creativity of a work is evaluated from the work itself. The company rejects the view that OpenAI's discovery requests are justified, arguing that the AI firm has not shown that its demands amount to more than a “fishing expedition.” The memo contends, “Only actual judicial determinations of infringement are relevant, not unsubstantiated claims.”

This legal clash is emblematic of broader tensions around fair use, particularly over how copyrighted materials are used to train AI models. The allegation that OpenAI scraped content from The Times without consent raises critical questions about the ethical and legal frameworks governing data scraping in AI development.

Historically, data scraping has been a prevalent way of gathering information in the AI industry, with developers often invoking fair use to justify it. Rights holders, however, are becoming increasingly assertive about protecting their content. Reddit, for instance, recently barred AI web crawlers to safeguard its data, setting an example that has resonated with smaller news organizations. Several of those organizations have recently joined The Times in taking legal action against OpenAI, accusing the company of “theft” of copyrighted content.

OpenAI has consistently denied The Times’ allegations, framing its own actions as necessary for the advancement of AI technology. The company previously argued that it relies on access to copyrighted works to develop state-of-the-art models. In light of its mounting legal challenges, OpenAI has moved to secure licensing agreements with various content providers, including News Corp, Stack Overflow, and Axel Springer, in an effort to legitimize its data sourcing practices.

This high-profile dispute highlights the complexities at the intersection of copyright law and emerging technologies, and raises the question of how legal precedents and industry standards will shape AI development.
