Sarah Silverman Sues OpenAI Over Copyright Infringement
Comedian and author Sarah Silverman, along with novelists Christopher Golden and Richard Kadrey, has filed lawsuits against OpenAI and Meta. The trio alleges that the companies trained their large language models on copyrighted materials, including the plaintiffs' own published works, without obtaining consent.
The lawsuits focus on the datasets used by OpenAI and Meta to train their models, specifically ChatGPT and LLaMA. The plaintiffs claim that while OpenAI's "Books1" dataset is comparable in size to Project Gutenberg, a reputable repository of copyright-free books, the "Books2" dataset appears to be sourced from "shadow libraries" such as Library Genesis and Sci-Hub, which make copyrighted material available illegally. These shadow libraries provide direct downloads and often offer bulk torrent packages for those training large language models.
One key exhibit from Silverman’s lawsuit showcases an interaction between her legal team and ChatGPT. When asked to summarize Silverman’s 2010 memoir, The Bedwetter, the chatbot not only outlined significant portions of the book but also repeated certain passages verbatim.
Silverman, Golden, and Kadrey are not the only authors challenging OpenAI over copyright issues. The company is currently facing multiple legal battles over its methods for training ChatGPT. In June, OpenAI was named in two separate lawsuits, including a large class-action complaint that accuses the company of violating federal and state privacy laws by scraping data to develop its large language models.