OpenAI Partners with Reddit to Train AI Models on User-Generated Data

OpenAI has secured a partnership with Reddit to utilize the social news platform’s extensive data for training its AI models.

In a recent blog post on OpenAI’s press relations site, the company announced that the collaboration will give it access to “real-time, structured, and unique content,” such as posts and replies from Reddit. This will allow OpenAI to improve its tools and models so they can better “understand and showcase” Reddit content. Reddit data will be brought into ChatGPT, OpenAI’s widely used conversational AI, and the two companies will work together to introduce “AI-powered features” for Reddit users and moderators.

OpenAI will also become a Reddit advertising partner.

“Reddit aims to leverage OpenAI’s advanced AI models to realize its ambitious vision,” stated OpenAI in the announcement. “By employing LLMs, ML, and AI, Reddit seeks to enhance the user experience for all its members.”

OpenAI has a history of forming licensing agreements with various content providers, covering everything from stock media libraries to news agencies. However, this partnership has a unique twist: Sam Altman, OpenAI’s CEO, holds an 8.7% stake in Reddit, making him its third-largest shareholder. He is also a former member of Reddit’s board.

To address potential concerns, OpenAI emphasized in its press release that, although Altman remains a shareholder, the partnership was led by OpenAI’s COO, Brad Lightcap, and approved by OpenAI’s independent board of directors. Altman is a member of OpenAI’s board, but he recused himself from this specific decision, an OpenAI representative confirmed.

As Reddit settles in as a publicly traded company, data licensing agreements have become critical to its growth strategy. In its IPO prospectus, Reddit disclosed that it has licensing agreements worth over $200 million with clients, including Google. In its first earnings report as a public company, Reddit posted a 450% year-over-year increase in non-ad revenue, primarily driven by these agreements.

After the OpenAI deal was announced, Reddit’s stock rose 11% in after-hours trading.

“The irony we face is that as machine-generated content proliferates, there’s a growing demand for authentic content created by real people,” Reddit CEO Steve Huffman stated during the company’s earnings call in March. “With nearly two decades of genuine conversation, we are in a strong position.”

Reddit’s platform hosts over 1 billion posts and more than 16 billion comments, numbers that grow daily thanks to hundreds of millions of active users. That is a wealth of data on which generative AI companies can train their models.

However, Reddit may encounter resistance from users who are uneasy about how their data is being monetized. For instance, Stack Overflow, a Q&A platform for developers, recently entered into a data-sharing agreement with OpenAI. In response, some users deleted their top-rated answers, protesting against the deal. Stack Overflow promptly restored the posts and banned those users for violating its terms of service.

Moreover, Reddit has already pushed back against attempts to give users more control over their data. Vana, a blockchain-based startup, is working to create a data DAO (Decentralized Autonomous Organization) that allows Reddit users to collectively manage the use of their pooled data. Reddit banned Vana’s subreddit dedicated to discussing this initiative, accusing the company of “exploiting” its data export policies.
