Proposed Legislation Requires AI Companies to Clearly Disclose Copyrighted Material in Training Data

Companies developing generative AI models could soon be required to disclose the copyrighted materials used to train their systems under newly proposed legislation, the Generative AI Copyright Disclosure Act. The act would require developers, including major players like OpenAI, to submit a detailed notice to the Register of Copyrights before launching any new AI system. The notice must include a comprehensive summary of the copyrighted works used in both initial training and subsequent fine-tuning, and must provide URLs for any publicly accessible training datasets.

The legislation would also apply retroactively, meaning that previously released generative AI models, such as ChatGPT and Claude, would be subject to the same disclosure requirements. Congressman Adam Schiff of California, a key proponent of the bill, stressed the importance of balancing technological innovation with the protection of creators' rights. “We must balance the immense potential of AI with the crucial need for ethical guidelines and protections,” Schiff remarked. “My Generative AI Copyright Disclosure Act is a pivotal step in this direction. It champions innovation while safeguarding the rights of creators, ensuring they are informed when their work contributes to AI training datasets.”

Under the proposed law, companies that fail to meet the disclosure requirements would face civil penalties starting at $5,000. The Register of Copyrights would be tasked with enforcing these penalties and with creating a publicly accessible online database, allowing copyright owners to check whether their works have been used in AI training datasets.

This initiative has garnered the backing of numerous media trade organizations and labor unions, including the Recording Industry Association of America, SAG-AFTRA, and the Writers Guild of America. Duncan Crabtree-Ireland, SAG-AFTRA’s national executive director and chief negotiator, emphasized the significance of protecting human creative content, stating, “Everything generated by AI ultimately originates from a human creative source. That’s why human intellectual property must be protected. SAG-AFTRA fully supports the Generative AI Copyright Disclosure Act, as this legislation is an important step in ensuring technology serves people and not the other way around.”

In recent months, the landscape for generative AI developers has become increasingly contentious, with several lawsuits alleging that they trained their systems on copyrighted content without permission. OpenAI, for instance, is currently defending itself against claims from the New York Times, which asserts that OpenAI's chatbot produces content derived from its original articles. Authors, music publishers, and artists have also sued several AI developers, including Nvidia, Anthropic, and Stability AI, over alleged copyright infringement.

According to research conducted by AI startup Patronus, OpenAI's GPT-4 model generates the highest volume of copyrighted content. To reduce their legal exposure and comply with copyright law, AI developers have been pursuing partnerships with media companies and social media platforms to obtain licensed datasets for model training. For instance, OpenAI has secured content licenses from Axel Springer and the Associated Press, while Google has recently struck a deal with Reddit. In an earlier statement, OpenAI claimed it would be "impossible" to create state-of-the-art models without access to copyrighted materials, underscoring the ongoing tension between innovation and intellectual property rights in generative AI.
