"Apple, NVIDIA, and Anthropic Allegedly Used YouTube Transcripts Without Consent for AI Model Training"

An investigation by Proof News has revealed that some of the largest tech companies, including Apple, NVIDIA, and Anthropic, trained their AI models using a dataset that includes transcripts from over 173,000 YouTube videos—without obtaining permission from the creators. This dataset, compiled by the nonprofit EleutherAI, features transcripts from channels representing more than 48,000 creators, including prominent figures like Marques Brownlee and MrBeast, as well as major news organizations such as The New York Times, BBC, and ABC News.

This investigation highlights a troubling reality in AI development: much of the technology relies on data taken from creators without their consent or compensation. Although the dataset contains only transcripts, not videos or images, it still draws on the work of influential content creators.

Marques Brownlee expressed concerns on social media, pointing out that Apple sourced data from various companies, one of which scraped transcripts from YouTube videos, including his. He stated, “This is going to be an evolving problem for a long time,” acknowledging the complex ethical landscape surrounding data usage in AI.

A Google spokesperson said that earlier statements by YouTube CEO Neal Mohan still stand: companies that use YouTube data to train AI models violate the platform's terms of service. Repeated requests for comment sent to Apple, NVIDIA, Anthropic, and EleutherAI have gone unanswered.

Transparency about the training data used by AI companies remains elusive. Recently, Apple faced criticism from artists and photographers for not disclosing the source of the training data for its upcoming generative AI feature, Apple Intelligence. In response, Apple clarified that its OpenELM model, created strictly for research, does not power its AI or machine learning capabilities. The company has said that its AI models are trained on "licensed data" and publicly available information collected by web crawlers.

YouTube, as the world’s largest video repository, provides an abundance of transcripts, audio, video, and images, making it an appealing resource for developing AI models. Earlier this year, OpenAI’s Chief Technology Officer, Mira Murati, avoided questions regarding whether YouTube videos were used to train Sora, OpenAI’s upcoming AI video generation tool, stating that the data was either publicly available or licensed.

To check whether subtitles from your YouTube videos, or from those of your favorite channels, are included in this dataset, visit Proof News' lookup tool.
