AI Companies Bypass Web Standards to Scrape News Publishers' Content for Generative AI Training


Updated on November 14, 2024

On June 24, Reuters reported that the content licensing startup TollBit had warned news publishers that several AI companies are circumventing a standard web protocol to scrape their content. These companies allegedly use the scraped material to train their generative AI systems. The warning comes amid a public dispute between AI search startup Perplexity and media outlet Forbes over adherence to web standards.

A broader debate is emerging between technology and media firms about the value of content in the generative AI landscape. TollBit aims to serve as an intermediary between AI companies seeking content and publishers willing to establish licensing agreements. Forbes has accused Perplexity of plagiarizing its reporting in AI-generated summaries without proper attribution or consent.

Additionally, an investigative report by Wired found that Perplexity may be bypassing the Robots Exclusion Protocol and other protective measures implemented by publishers. The News Media Alliance, representing over 2,000 U.S. publishers, has raised concerns over AI companies ignoring these "no scraping" directives. Danielle Coffey, the organization’s president, stated, "If AI companies cannot halt large-scale scraping, we won’t be able to monetize valuable content or compensate journalists."

TollBit's findings reveal that Perplexity is not the only platform flouting publishers' "no scraping" policies. Their analysis suggests many AI services are sidestepping these rules, even as some publishers have designated "whitelist" areas for allowable scraping. TollBit remarked, "The more publisher logs we analyze, the more frequently this pattern emerges, indicating that AI platforms are retrieving content despite the robots.txt guidelines."
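For readers unfamiliar with the mechanism, the following is a minimal sketch of what honoring robots.txt looks like from a crawler's side, using Python's standard `urllib.robotparser`. The bot name `ExampleAIBot`, the paths, and the `example.com` URLs are illustrative assumptions, not taken from any publisher or AI company mentioned in the reporting.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt a publisher might serve. The bot name and paths are
# assumptions for illustration only.
ROBOTS_TXT = """\
User-agent: ExampleAIBot
Allow: /licensed/
Disallow: /

User-agent: *
Allow: /
"""


def may_fetch(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if the given robots.txt text permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file's lines directly, so no network request is made.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for agent, url in [
        ("ExampleAIBot", "https://example.com/licensed/story.html"),
        ("ExampleAIBot", "https://example.com/news/story.html"),
        ("OtherBrowser", "https://example.com/news/story.html"),
    ]:
        print(agent, url, may_fetch(agent, url))
```

The check is voluntary: nothing in the protocol stops a crawler from skipping it and requesting the page anyway, which is the behavior TollBit says it sees in publisher logs.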

Prominent publishers like The New York Times have filed lawsuits against AI companies for copyright violations. Conversely, some publishers have chosen to sign licensing agreements with AI firms willing to pay for content, though disputes over the value of provided materials often arise. Many AI developers argue that acquiring content for free does not breach any laws.

This ongoing issue underscores the complex relationship between AI technology and traditional media, highlighting the urgent need for clear guidelines and compensation frameworks.
