OpenAI, the creator of ChatGPT, claims that training advanced AI models like GPT-4 without using copyrighted materials is fundamentally impossible. This assertion comes amid a lawsuit filed by The New York Times, accusing OpenAI and its major investor, Microsoft, of copyright infringement. The newspaper alleges that ChatGPT has been trained on its copyrighted news content and replicates it "near-verbatim."
In its defense, OpenAI emphasizes that limiting training data solely to public domain sources would hinder the development of AI technologies that meet the needs of contemporary society. The company told the U.K. House of Lords' Communications and Digital Select Committee that “copyright today encompasses virtually every form of human expression,” including blog posts, photographs, forum discussions, and even government documents. Restricting access to these materials, it argued, would critically impair AI development.
Additionally, OpenAI argues that its practices conform to existing law: copyright statutes do not explicitly prohibit training AI models, and the company frames the practice as a matter of fair use. It also highlighted mechanisms in place to respect copyright, such as allowing websites to block its web crawler, GPTBot, and providing an opt-out process for content creators who prefer their work not to be included in future training datasets.
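For context, the GPTBot opt-out relies on the standard robots.txt convention. A minimal sketch of the directive a publisher could add to block the crawler site-wide (assuming the site serves a robots.txt file at its root) looks like this:

```
# robots.txt — example only: blocks OpenAI's GPTBot crawler from the entire site
User-agent: GPTBot
Disallow: /
```

This is the same opt-out mechanism OpenAI says The New York Times exercised in August 2023.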
OpenAI expressed disappointment at having learned of the lawsuit by reading about it in The New York Times, noting that discussions about displaying real-time content with attribution in ChatGPT were progressing as recently as December 19, before the legal action was filed on December 27. The company continues to engage actively with media outlets, aiming to establish mutually beneficial arrangements. It has already forged licensing agreements with companies such as Axel Springer, publisher of Politico and Business Insider, as well as the Associated Press, and anticipates securing additional partnerships in the near future.
In its blog post, OpenAI alleges that The New York Times is not presenting the full story surrounding the lawsuit. While the Times maintains that ChatGPT reproduces its articles nearly word for word, OpenAI characterizes this behavior as a “rare bug” that it is actively addressing. The company explained that such memorization is an infrequent failure of the learning process, most likely to arise when specific content appears repeatedly in the training data across multiple public sources.
Moreover, OpenAI pointed out that instances of near-verbatim reproduction can be exacerbated by the Times' use of manipulated prompts that include extensive excerpts from its articles. Andrew Ng, founder of Google Brain, supported this view, noting that the prompts employed by the Times do not reflect typical user queries and that the observed word-for-word recreation appears to be a bug.
OpenAI stressed that The New York Times could opt out of having its content used in training, a choice the newspaper exercised in August 2023. The company remains optimistic about continuing to collaborate with news organizations to enhance the production and delivery of high-quality journalism by harnessing the transformative capabilities of AI technology.