OpenAI's Publisher Partnerships: Potential Challenges for Competitors

OpenAI is currently engaged in a legal dispute with The New York Times regarding the data used to train its AI models. However, the company is actively pursuing partnerships with other publishers, including major news organizations in France and Spain.

On Wednesday, OpenAI announced that it has signed agreements with Le Monde and Prisa Media to provide French and Spanish news content for its ChatGPT chatbot. In a blog post, OpenAI stated that these partnerships will allow ChatGPT users to access real-time news coverage from well-known brands such as El País, Cinco Días, As, and El Huffpost. This collaboration aims to enhance the breadth of training data available to OpenAI while ensuring that users receive relevant news information.

OpenAI explained:

"In the coming months, ChatGPT users will interact with curated news content from these publishers, featuring selected summaries with proper attribution and links to original articles. This enhancement will enable users to delve deeper into the stories or explore related content on the publishers' sites. We are consistently enhancing ChatGPT's functionalities and acknowledging the crucial role of the news industry in providing real-time, authoritative information."

To date, OpenAI has established licensing agreements with several content providers, making it a fitting moment to evaluate its progress:

- Shutterstock (stock media library for images, videos, and music)

- The Associated Press

- Axel Springer (owner of Politico and Business Insider)

- Le Monde

- Prisa Media

While OpenAI has not disclosed the financial specifics of these agreements, The Information reported in January that the company may be paying between $1 million and $5 million annually per publisher for access to archives. That estimate covers news licensing only; the terms of the Shutterstock partnership remain unclear. If the reports hold true, the four news publishers on the list (The Associated Press, Axel Springer, Le Monde, and Prisa Media) would put OpenAI's spending on news content licenses at between $4 million and $20 million annually.
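For readers who want to check the arithmetic, here is a minimal sketch of how the range works out. It assumes that only the four news organizations on the list count toward the estimate and that the reported $1 million to $5 million per-publisher range applies equally to each; the function and variable names are illustrative, not anything OpenAI or The Information has published.

```python
# Back-of-the-envelope estimate of annual news licensing spend.
# Assumption: Shutterstock is excluded because its terms are unknown,
# and every news publisher falls inside the same reported range.
NEWS_PUBLISHERS = [
    "The Associated Press",
    "Axel Springer",
    "Le Monde",
    "Prisa Media",
]

LOW_PER_PUBLISHER = 1_000_000   # reported lower bound, USD per year
HIGH_PER_PUBLISHER = 5_000_000  # reported upper bound, USD per year


def estimate_annual_spend(publishers, low, high):
    """Return (low_total, high_total) for a flat per-publisher range."""
    count = len(publishers)
    return count * low, count * high


low_total, high_total = estimate_annual_spend(
    NEWS_PUBLISHERS, LOW_PER_PUBLISHER, HIGH_PER_PUBLISHER
)
print(f"${low_total:,} to ${high_total:,} per year")
```

Adding or removing a publisher from the list shifts both ends of the range, which is why each new deal matters more to a small competitor's budget than to OpenAI's.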

This expenditure might seem negligible for OpenAI, which has over $11 billion in funding and recently surpassed $2 billion in annual revenue (according to the Financial Times). However, as Hunter Walk, a partner at Homebrew and co-founder of Screendoor, noted, this financial commitment could present significant challenges for smaller AI competitors seeking similar licensing arrangements.

Walk commented on his blog:

“If experimentation is hindered by high-stakes licensing deals, we risk stifling innovation. The substantial payments made to data owners create serious barriers to entry for new market entrants. When major firms like Google and OpenAI establish high costs, they effectively deter future competition.”

The existence of such barriers is a contentious topic. Many AI vendors have opted to bypass licensing agreements with intellectual property holders, taking the risk of potential legal repercussions. For instance, the art-generating platform Midjourney allegedly trains its models using Disney movie stills without an official agreement with the company.

A more complex question arises: Should licensing become a standard cost in the AI development landscape? Walk argues against this notion, advocating for regulatory "safe harbor" protections that would shield AI vendors, startups, and researchers from legal liabilities as long as they adhere to transparency and ethical guidelines.

Notably, the U.K. recently attempted to establish similar protections, seeking to exempt text and data mining for AI training from copyright constraints when used for research purposes. That effort ultimately failed.

I remain hesitant to fully endorse Walk's "safe harbor" concept, especially given the potential impact of AI on the already fragile news industry. A recent analysis from The Atlantic indicated that if search engines like Google incorporated AI into their systems, they could respond to 75% of user queries without requiring a click-through to the publisher's website.

That said, there may be potential for compromises. Publishers deserve fair compensation for their content, but could there be a scenario where they receive payment while also allowing access to their data for smaller competitors and researchers? Perhaps avenues such as grants or larger venture capital investments could facilitate this balance.

While I don’t claim to possess the definitive solution, particularly as legal definitions of fair use regarding AI are still being contested, it’s crucial to engage in these discussions. Without clear guidelines, we risk creating a landscape where knowledge and innovation remain concentrated among a select few companies with exclusive access to valuable training data.
