Reddit's New Changes Aim to Protect the Platform from AI Crawlers

On Tuesday, Reddit announced an update to its Robots Exclusion Protocol file (robots.txt), which tells automated web bots which parts of the site they are permitted to crawl. Traditionally, this file has guided search engines in accessing content so that users can find information. However, the rise of AI has raised concerns about websites being scraped without attribution and their content being used to train AI models.
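To make the mechanism concrete, here is a minimal sketch of how a well-behaved crawler might consult a robots.txt file before fetching a page, using Python's standard `urllib.robotparser` module. The file contents, the bot names `ExampleSearchBot` and `ExampleAIBot`, and the URL are hypothetical illustrations, not Reddit's actual rules.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content for illustration only; this is NOT Reddit's file.
ROBOTS_TXT = """\
User-agent: ExampleSearchBot
Allow: /

User-agent: *
Disallow: /
"""


def can_fetch(user_agent: str, url: str) -> bool:
    """Return True if the sample rules above permit user_agent to crawl url."""
    parser = RobotFileParser()
    # parse() accepts the file's lines directly, so no network request is made.
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # Both bot names and the URL are made up for this example.
    for agent in ("ExampleSearchBot", "ExampleAIBot"):
        print(agent, can_fetch(agent, "https://example.com/r/python/"))
```

The check happens entirely on the crawler's side: the file only states the site's wishes, and it is up to each bot to run a check like this and respect the result.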

As part of these updates, Reddit will implement rate-limiting and block unverified bots and crawlers. The company confirmed that any bot or crawler that fails to comply with Reddit’s Public Content Policy or lacks a formal agreement will be subject to these restrictions.

Reddit emphasizes that this update shouldn’t affect most users or good-faith actors, such as researchers or organizations like the Internet Archive. Instead, it is aimed at preventing AI companies from using Reddit data to train their large language models. Because compliance with robots.txt is voluntary, however, AI crawlers could still disregard the file.

This announcement follows a recent Wired investigation that found AI search startup Perplexity had been scraping content without permission. According to the report, Perplexity appeared to ignore robots.txt directives even on sites that had explicitly blocked it. CEO Aravind Srinivas responded that the file does not constitute a legal standard.

Reddit's forthcoming changes will not affect companies that have agreements with the platform. For instance, Reddit has a $60 million partnership with Google that allows the tech giant to use Reddit content for its AI initiatives. With these changes, Reddit is signaling to other organizations that want to train AI models on its data that they will have to pay.

“Anyone accessing Reddit content must adhere to our policies, which are designed to protect our users,” Reddit stated in a blog post. “We exercise careful selection regarding who we collaborate with for large-scale access to our content.”

This announcement aligns with Reddit's recent policy update aimed at clarifying how commercial entities and partners can access and utilize Reddit's data.
