After recent agreements with Google and OpenAI, Reddit CEO Steve Huffman is urging Microsoft and others to pay for access to the site's data. Huffman stated, “Without these agreements, we lack control over how our data is displayed and used, leading us to block entities unwilling to adhere to our data usage terms.” He identified Microsoft, Anthropic, and Perplexity as companies that have not engaged in negotiations, describing the process of blocking these firms as “a real pain.”
In recent months, Reddit has intensified its efforts against unauthorized web crawlers. In July, the site updated its robots.txt file to restrict access for crawlers without agreements. Users have noted that Reddit content now appears only in results from Google, which pays Reddit for use of its data, and has disappeared from other search engines such as Bing.
Huffman accused Microsoft of using Reddit's data to train its AI and to summarize content in Bing results without notifying Reddit. He said Reddit's data has also been sold to other search engines via the Bing API. Huffman referenced Microsoft AI CEO Mustafa Suleyman's claim that public data online is “freeware,” saying, “We’ve seen Microsoft, Anthropic, and Perplexity treat all internet content as free to use; that’s their genuine stance.”
In response to Reddit’s content being omitted from Bing, Microsoft’s head of search, Jordi Ribas, asserted that “Reddit has blocked Bing from crawling their site for search, favoring another search engine and limiting competition.” A Microsoft representative later affirmed, “We respect the instructions from websites that prefer not to have their content utilized by our generative AI models.”
Huffman pointed to OpenAI's recently launched SearchGPT, which can display Reddit results because of the two companies' earlier deal, as a model he aims to follow. However, Reddit spokesperson Tim Rathschmidt noted that Reddit's content licensing agreements do not include exclusive data usage rights.
By advocating for licensing agreements, Reddit aligns itself with more traditional media publishers in seeking compensation for allowing their content to support generative AI development. Huffman commented, “The traditional value exchange from search engines is evolving. The lines between search, summarization, and training are blurring, complicating the value exchange of crawling for traffic.”
In a statement following the publication of this story, Anthropic spokesperson Jennifer Martinez clarified, “Reddit has been on our block list for web crawling since mid-May, and we have not added any Reddit URLs to our crawler since then. We adhere to robots.txt, which is the industry standard for blocking web crawling.” Microsoft declined to comment, and Perplexity did not respond to requests for a statement.