Like many AI startups, music generation companies Udio and Suno appear to have relied on unauthorized scrapes of copyrighted material to train their models. The evidence comes from statements by the companies and their investors, as well as from new lawsuits filed by major music organizations. If these lawsuits proceed to trial, they could expose serious legal vulnerabilities for such AI firms while setting a significant precedent in the industry.
The lawsuits, brought by the Recording Industry Association of America (RIAA), put us in the unusual position of supporting a group that has long been viewed as a digital media antagonist. As someone who has received stern communications from them myself, I can attest to their controversial reputation. This time, though, the case is straightforward.
The essence of the two lawsuits, which are remarkably similar in content, is that Suno and Udio (the latter operating under the company name Uncharted Labs) have indiscriminately plundered a substantial portion of recorded music history to compile datasets for training their music-generating AI. It’s important to clarify that these AIs do not "generate" music in the creative sense; instead, they match user prompts to patterns learned from their training data, resulting in outputs that resemble covers or mashups of existing songs.
It is highly likely that Udio and Suno used copyrighted material. That conclusion is supported both by the lawsuits and by the comments of company leadership and investors, who have been careless in acknowledging the copyright challenges their work poses. They have openly admitted that building an effective music generation model requires ingesting large amounts of high-quality music, a necessity for this type of machine learning model. They have also stated plainly that they did so without obtaining permission from music labels. Investor Antonio Rodriguez of Matrix Partners recently remarked to Rolling Stone:
"Honestly, if we had deals with labels when this company got started, I probably wouldn’t have invested in it. I think they needed to create this product without constraints."
In communications with the RIAA’s legal team, the companies argued that their use of the recordings is justified under the fair-use doctrine, a defense that only comes into play when a work has been used without authorization. While fair use is a nuanced and complex aspect of copyright law, the companies’ approach seems to go well beyond the protection typically available to, say, a student using a Pearl Jam track in a classroom video about climate change.
To put it plainly, the odds are stacked against Udio and Suno. They may have hoped to emulate OpenAI’s tactics, employing ambiguous language and misdirection to delay their less affluent critics, like authors and journalists. However, such tactics are less effective when incontrovertible evidence is presented. The RIAA claims that it possesses thousands of examples showing that songs in its catalog are being replicated by these AI models. Their accusation is clear: whether it’s the Jackson 5 or Maroon 5, the “generated” songs are simply distorted versions of the originals—something that would be unlikely if the original works had not been included in the training data.
The inherent limitations of large language models (LLMs), particularly their tendency to drift into nonsense when generating lengthy content, make it hard for them to reproduce entire books. That has likely weakened potential lawsuits from authors against OpenAI, since the company could reasonably claim that the snippets its model produces were sourced from reviews or freely available content online. Recently, OpenAI has claimed to have stopped using copyrighted material in its training, which is akin to claiming you only squeezed an orange once and then stopped.
However, it is implausible to argue that a music generator only sampled a few bars of "Great Balls of Fire" and subsequently replicated the entire song verbatim. Any judge or jury would find this assertion laughable.
The current legal actions facing generative AI are just the beginning. The distinction above is not only a matter of common sense but also legally significant, because the re-creation of recognizable parts of original works opens up a new avenue for judicial relief. If the RIAA can demonstrate that Udio and Suno are causing substantial harm to copyright holders and artists, it could ask the court for an injunction halting the companies' operations at the outset of the trial.
Generating musical outputs that mimic recognizable songs? That is about as clear a violation as these cases get. While some issues require extensive legal debate, recreating popular tunes on demand is likely to lead to immediate repercussions. In response, the companies argue that their systems are not designed to replicate copyrighted works, an argument that attempts to offload liability onto users under the safe harbor provision of Section 230. This is comparable to how Instagram isn’t held accountable when users employ copyrighted songs in their Reels. However, this defense seems weaker here, given the companies’ own admissions that they disregarded copyright when assembling their training data.
What might the outcomes of these lawsuits look like? As with all things related to AI, predicting the future is difficult because there is so little established legal precedent. My expectation is that the companies will be compelled to disclose their training data and methods, which are of significant evidentiary value. Should this evidence confirm the misuse of copyrighted material, they may opt to settle or face a swift judgment against them. At least one of the companies will likely attempt to carry on by sourcing legal (or at least legally defensible) music, but by their own standards for training data, the resulting product would probably suffer in quality, leading users to abandon it.
As for the investors, they could face substantial losses, having put money into operations that were probably illegal and certainly unethical, and that have now drawn the attention of the notoriously litigious RIAA.
These lawsuits could have wide-ranging consequences. If investors learn that a significant portion of their capital could vanish due to the inherent risks in generative media, they will likely apply far more diligence going forward. Companies, for their part, may learn from the trial or settlement documents which statements to avoid and how to keep copyright holders guessing.
While the outcome of these particular lawsuits appears predictable, they serve as a lesson in hubris rather than a template for suing or reaching settlements with other generative AI companies. It’s valuable to have such a lesson now and then, even if the instructor is the RIAA.