Why Derivative Works Are the Poison Pill for Generative AI

Meta's recent launch of Llama 2, the first commercially licensed LLM from a major tech company, has ignited significant interest in open-source large language models (LLMs). Amid this excitement, however, we must not overlook the looming legal uncertainties around intellectual property (IP) rights and copyright in the generative AI landscape. Many assume that regulatory risk falls primarily on the companies developing LLMs, but that assumption is dangerously misleading.

Generative AI raises a thorny question: derivatives. Copyright law defines "derivative works," but there is little precedent for how that concept applies to data, and the rise of open-source LLMs will force the issue. When software generates output data from input data, which outputs count as derivatives of the inputs? All generated content? Some? None at all?

This ambiguity creates an upstream problem: no one yet knows which legal claims will hold as IP law meets LLMs. Three factors make the dynamic different this time:

1. Centralization: LLMs give a single piece of software an unprecedented ability to produce highly variable outputs across many domains: not just text and images, but code, audio, video, and data. As LLM usage spreads rapidly, liability risk extends beyond LLM vendors to their users, covering copyright as well as potential harms from inaccuracies and biases.

2. Incentives: Copyright holders have every reason to push for the broadest possible definition of LLM derivatives, since that widens their scope for claiming damages. Major platform companies also benefit from fuzzy definitions in their competitive strategies, as seen in the Llama 2 license, which prohibits using Llama 2 or its outputs to improve non-Llama LLMs.

3. Risk-shifting: Software companies routinely transfer risk to their users through licenses that limit the vendor's liability. This trend will continue as major tech firms push regulatory responsibility onto users, much as Section 230 shields social media platforms despite their active role in amplifying content.

If courts determine that training models on copyrighted content infringes copyright, enterprises relying on those models must navigate two primary risks:

1. Platform Risk: Will the vendor withdraw the model from the market? If so, will a replacement with similar functionality be available, and how much time and effort will it take to retune models and prompts?

2. Pricing Risk: If the vendor keeps the model on the market, will using it become more expensive as copyright payments add new costs to developing or operating the LLM?

LLM vendors will argue that the models themselves do not infringe even though they were trained on copyrighted materials, because a model is a distinct construction of data rather than a copy. Even if courts accept that argument, the real concern may hinge on output infringement: AI-generated responses that closely mimic copyrighted materials could put the businesses using them at risk.

Should courts take that view, enterprises will need to address an additional challenge:

Flow-down Risk: How can a business ensure that its use of an LLM does not infringe copyright? And how far does that risk extend beyond direct LLM outputs, to derivatives of those outputs and to the value created by the people and systems that build on them?

Awareness of the legal landscape surrounding generative AI empowers enterprise technology leaders to manage potential risks effectively.

Our Recommendations:

- When assessing LLM licenses, prioritize clear ownership of LLM outputs and their derivatives, including the unrestricted right to use them to improve other LLMs. In the absence of settled legal definitions of derivatives, create internal policies that spell out what counts as a transformative change to LLM outputs (e.g., summarizing an output versus lightly editing it). Such policies help mitigate flow-down risk; a minimal sketch of how part of one might be automated appears after this list.

- For paid licenses, ensure protection against specific risk types and negotiate now what happens financially if risk shifts from vendor to business later. It is often more economical for large LLM vendors to secure the necessary IP rights for their clients, or to establish specialized insurance plans, than for individual users to do so. We've seen this kind of risk management in the cybersecurity sector, where some vendors bundle ransomware insurance. In the generative AI space, Adobe provides full indemnification for content created via Firefly, and Writer extends similar coverage for generated content.

- Don't overlook the regulatory angle: if LLM users stay passive, regulation may be shaped into safeguards that favor large LLM platforms and Big Tech at the expense of startups and users. Generative AI capabilities in platforms like ChatGPT Plus and Microsoft Office currently run around $25–$30 per user per month. At those price points, risk should not be quietly shifted onto end users.
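
As an illustration of the first recommendation, here is a minimal sketch, in Python, of one way part of a flow-down policy might be automated: flagging LLM outputs that reproduce long verbatim word spans from texts the business treats as protected. The eight-word window, the function names, and the reference corpus are illustrative assumptions, not legal thresholds or a test any court applies.

```python
# Flag LLM outputs that reproduce long verbatim word spans from reference
# texts. The 8-word window is an illustrative assumption, not a legal test.

def ngrams(text: str, n: int = 8) -> set:
    """Return the set of n-word windows in a text, lowercased."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

def has_verbatim_overlap(output: str, references: list, n: int = 8) -> bool:
    """True if the output shares any n-word window with any reference text."""
    out_grams = ngrams(output, n)
    return any(out_grams & ngrams(ref, n) for ref in references)

# Usage: route flagged outputs to human review instead of releasing them.
protected_texts = ["..."]  # texts the policy treats as protected (placeholder)
draft = "..."              # an LLM output awaiting release (placeholder)
if has_verbatim_overlap(draft, protected_texts):
    print("Escalate for review: possible verbatim reproduction")
```

A screen like this catches only literal copying; a real policy would pair it with human review and a working definition of which edits count as transformative.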

The software industry has grappled with similar questions before, around "viral" or "copyleft" licenses such as the GPL. As open source surged alongside SaaS and cloud computing, many SaaS applications escaped the GPL's requirements by never distributing their software. The AGPL closed that loophole, and it is often chosen by businesses seeking to keep control over their value chains.

Conversely, most grassroots open-source projects adopt more permissive licenses like Apache 2.0, BSD, and MIT. Will open-source LLMs provide the answer? They may let enterprises sidestep some restrictions of commercial LLM licenses, but they still expose users to copyright risk.

As the LLM market matures, vendors will diverge in the same way. Some will stick with the "push all risk to users" model, while others will differentiate themselves by partnering with customers on risk management. That could take forms ranging from training only on inputs with verifiable data rights to services designed to make legal action against users harder to mount, akin to private messaging solutions.

Navigating the balance between LLM capabilities and effective risk management is likely to become increasingly intricate as we transition from the chaotic early days of AI to a more structured environment. However, this endeavor is undoubtedly worth pursuing.
