Rising Copyright Lawsuits Against OpenAI: Demanding Increased Access to Data for AI Training

On July 1, OpenAI announced it utilizes publicly available data, including internet-sourced books and articles, to train ChatGPT. In light of this, content creators and organizations are now seeking compensation for their work. This training data is crucial for the development of leading AI models, with major companies like Google, Meta, OpenAI, Anthropic, and Microsoft competing to secure new data sources. Notably, Meta is even contemplating the acquisition of Simon & Schuster, a prominent publisher.

Publishers have accused these tech giants of unlawfully using copyrighted material and are demanding fair compensation for their intellectual property. In submissions to the U.S. Copyright Office, both Meta and OpenAI claim that using copyrighted content that is publicly available online falls under fair use. They are preparing to defend this position in court amid ongoing copyright infringement lawsuits.

One significant lawsuit has been brought against OpenAI and Microsoft by the Center for Investigative Reporting (CIR), which recently merged with Mother Jones and Reveal. CIR alleges that OpenAI and Microsoft exploited its copyrighted work without consent or compensation. Monika Bauerlein, CIR's CEO, stated, "OpenAI and Microsoft took our news to enhance their products without ever seeking our permission, unlike other organizations that obtain permission for our materials. This free-riding is not only unfair but also infringes on our copyright."

The lawsuit claims OpenAI's WebText training set includes over 16,000 URLs from the Mother Jones domain. Additionally, in a separate class-action lawsuit from the Authors Guild, two authors assert that OpenAI used excerpts from their books to train ChatGPT. The New York Times has also initiated legal proceedings against OpenAI.

Documents from May 2023 revealed that OpenAI deleted two major datasets used to train GPT-3, which may have contained over 100,000 published books. Reports indicated that two employees who managed this data are no longer with OpenAI.

In response to these legal challenges, OpenAI has begun establishing licensing agreements with various news organizations, including the Associated Press, The Wall Street Journal, The New York Post, The Atlantic, Prisa Media, El Mundo, the Financial Times, and Axel Springer, the parent company of Business Insider. However, the volume of content needed for ongoing machine learning significantly exceeds what these agreements can cover.

To address data shortages, OpenAI is exploring synthetic data, which is generated artificially rather than sourced from real-world materials. CEO Sam Altman sees it as a potential solution but has expressed concerns about its quality. At a tech conference in May 2023, Altman noted that the model's ability to produce high-quality synthetic data is essential for success. OpenAI is also investigating collaborative frameworks in which one AI system generates data while another evaluates its quality.

OpenAI has not yet commented on these developments.
