Reddit Reports $203M in Data Licensing Contracts So Far

As Reddit approaches its stock market listing, its fortunes may be more closely tied to AI vendors like OpenAI than many realize. In its IPO prospectus, filed today with the U.S. Securities and Exchange Commission, Reddit highlighted the significant benefits it expects from licensing its data, which spans more than 1 billion posts and more than 16 billion comments.

"In January 2024, we entered into data licensing agreements with a total contract value of $203 million, with terms spanning two to three years," states the prospectus. "We expect to recognize at least $66.4 million in revenue during the year ending December 31, 2024, with the remainder in subsequent periods."

Currently, the specific AI vendors licensing data from Reddit remain undisclosed. However, reports from Bloomberg and Reuters earlier this week indicated that an unnamed large AI company—speculated to be Google—has secured a licensing agreement estimated at around $60 million annually. OpenAI is also a plausible customer, especially given that its CEO, Sam Altman, owns an 8.7% stake in Reddit, making him the third-largest shareholder, and previously served on its board.

What makes Reddit's data so valuable? AI models "learn" from vast numbers of examples in order to generate many kinds of content, including essays, code, emails, and articles. Vendors like OpenAI scour the web for millions to billions of these examples to build their training sets. Some of that material is in the public domain, but much of it, including content from Reddit, comes with licenses that require attribution or payment.

Reddit previously let AI developers use its data for training at no cost. Last year, however, it changed course, with CEO Steve Huffman arguing that its data should not be “[given] to some of the largest companies in the world for free.”

“Our data APIs provide real-time access to evolving topics such as sports, movies, news, fashion, and current trends,” the prospectus highlights. “We believe Reddit’s vast collection of conversational data will continue to enhance large language models. As our content refreshes daily, models will seek to incorporate these new insights, updating their training with Reddit's data.”

As chatbots like OpenAI’s ChatGPT and Google’s Gemini increasingly threaten their traffic, content producers, from stock media libraries to news outlets, are pursuing data licensing agreements with AI vendors. A recent analysis by The Atlantic found that if a search engine such as Google integrated AI into its service, it could answer users' queries 75% of the time without requiring a click-through to the publisher's website.

AI vendors, for their part, have their own reasons to sign licensing deals: they face a wave of lawsuits alleging that they used data without permission or compensation. In one recent lawsuit, The New York Times accused OpenAI of building competing products from its content and harming its business.

OpenAI has signed agreements with stock media library Shutterstock and with publishers such as Axel Springer, owner of Politico and Business Insider. These licenses tend to be modest, however, reportedly topping out at about $5 million per year.
