Reddit Implements New Content Policy: Public Data Access Now Requires a Contract

On Thursday, Reddit announced the launch of a new policy designed to balance its goals of licensing content to major tech companies, such as Google, while safeguarding user privacy. This newly introduced “Public Content Policy” will complement Reddit’s current privacy and content policies, detailing how third parties can access and utilize Reddit’s data. Additionally, the company revealed the creation of a dedicated subreddit for researchers interested in working with Reddit’s data.

This announcement follows Reddit's debut on the stock market and comes as the company seeks to diversify its revenue streams, which include income from advertisements, API usage by developers, and now data licensing. In its IPO prospectus, Reddit disclosed that it had signed data licensing agreements with an aggregate contract value of $203 million, and it expects that business to grow.

Historically, Reddit did not restrict access to its data for AI training, but the company shifted its approach last year. Reddit CEO Steve Huffman explained to The New York Times that it was unsustainable for Reddit to provide valuable data to some of the world's largest companies without compensation. This marks a strategic pivot toward monetizing data licensing.

With these initiatives advancing, the new Public Content Policy will enforce stricter controls on accessing Reddit's data without a formal agreement. Reddit emphasizes that it is not introducing new restrictions; instead, it is clarifying and publicizing an existing internal policy.

In a blog post, Reddit expressed concern about the bulk collection of its data. “Unfortunately, we see more and more commercial entities using unauthorized access or misusing authorized access to collect public data in bulk, including Reddit public content,” the company wrote. It added that these entities often disregard user rights and privacy, including by ignoring legal and safety requests.

Reddit says it is committed to blocking known bad actors but believes a more robust approach is needed to limit large-scale access to its public content, so that access remains available to trusted partners who adhere to its policies. Access for non-commercial users, such as researchers and others acting in good faith, will remain open, but those interested in using Reddit’s data for commercial purposes, including AI training, will need to enter into a contractual agreement.

To further clarify, Reddit stated that businesses seeking to leverage Reddit’s data “to power, augment, or enhance your product for any commercial purposes” will require a formal contract.

The policy also aims to protect user-generated content, mandating that partners honor users’ requests to delete their posts. In practice, users who do not want their content used for AI training can delete it, and partners must honor that deletion. Furthermore, the policy prohibits partners from using Reddit’s data to identify individuals or target ads based on personal information. It also bars the use of Reddit content for spamming, harassment, background checks, or government surveillance.

Moreover, the policy imposes restrictions on access to adult content and reinforces the commitment that Reddit will never sell users’ personal information. The company will not license non-public content, such as private messages or sensitive account information.

To support researchers who wish to utilize Reddit data for non-commercial endeavors, Reddit has launched a new subreddit, r/reddit4researchers, and is collaborating with OpenMined to foster research partnerships.

