In a groundbreaking shift from traditional practices, generative AI companies are deploying large language models (LLMs) directly into the unpredictable environment of the internet for quality assurance.
Why invest time in thorough testing when the online community can collectively identify bugs and glitches? This bold experiment invites users to partake in an extensive and unplanned beta test. Every prompt reveals the unique quirks of LLMs, while the vast internet serves as a catch-all for errors—so long as users agree to the terms and conditions.
Ethics and Accuracy: Optional?
The rush to unleash generative AI models resembles handing out fireworks: entertaining, but potentially hazardous. For example, Mistral recently released its 7B model under the Apache 2.0 license. However, the lack of explicit usage constraints raises serious concerns about potential abuse.
Minor tweaks to underlying parameters, such as sampling temperature, can lead to drastically different outputs. Moreover, biases ingrained in algorithms and training datasets perpetuate societal inequalities. Common Crawl, which supplies the bulk of the training data for many LLMs (60% for GPT-3 and 67% for LLaMA), operates without stringent quality controls, placing the burden of data selection on developers. Recognizing and addressing these biases is essential for ethical AI deployment.
Developing ethical software should be mandatory, not optional. Yet, if developers choose to disregard ethical guidelines, there are limited safeguards. Thus, it is imperative for policymakers and organizations to ensure responsible and unbiased application of generative AI.
Who Holds Responsibility?
The legal landscape surrounding LLMs is murky, raising critical questions about accountability. Service terms for generative AI neither guarantee accuracy nor accept liability, relying instead on user discretion. Many users turn to these tools for learning or for work, yet they may lack the skills to distinguish reliable information from hallucinated content.
The impact of inaccuracies can ripple into the real world. For example, Alphabet’s stock price dropped sharply after Google’s Bard chatbot falsely stated that the James Webb Space Telescope had captured the first images of a planet outside our solar system.
As LLMs become integrated into significant decision-making applications, the question arises: If errors occur, should the responsibility lie with the LLM provider, the service provider employing LLMs, or the user who failed to verify information?
Consider two scenarios: in Scenario A, a malfunctioning vehicle causes an accident; in Scenario B, reckless driving causes the same outcome. The aftermath is equally unfortunate, but accountability differs: the manufacturer bears it in the first case, the driver in the second. With LLMs, errors may stem from a blend of provider failure and user negligence, which makes accountability far harder to assign.
The Need for ‘No-LLM-Index’
The existing “noindex” rule allows content creators to opt out of search engine indexing. A similar option, “no-llm-index,” could empower creators to prevent their content from being processed by LLMs. Current LLMs do not comply with the California Consumer Privacy Act (CCPA) or the GDPR’s right to erasure, complicating data deletion requests.
Unlike traditional databases where data is easily identifiable and deletable, LLMs generate outputs based on learned patterns, making it nearly impossible to target specific data for removal.
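To make the opt-out idea concrete, here is a minimal sketch, in Python, of how a training-data pipeline could honor per-site opt-outs today using the existing robots.txt convention; a content-level "no-llm-index" directive would work analogously to the "noindex" rule mentioned above. The crawler name "ExampleLLMBot" and the helper function are hypothetical illustrations, not any provider's documented pipeline.

```python
# Sketch: respect a site's robots.txt before adding its pages to a training
# corpus. urllib.robotparser is part of the Python standard library.
from urllib import robotparser

def may_use_for_training(site: str, url: str, user_agent: str = "ExampleLLMBot") -> bool:
    """Return True only if the site's robots.txt allows this crawler to fetch the URL.

    'ExampleLLMBot' is a hypothetical crawler name; in practice each provider
    publishes its own user-agent string that site owners must block individually.
    """
    rp = robotparser.RobotFileParser()
    rp.set_url(f"{site.rstrip('/')}/robots.txt")
    rp.read()  # fetch and parse the site's robots.txt
    return rp.can_fetch(user_agent, url)

# Usage: skip pages whose owners have opted out.
# if not may_use_for_training("https://example.com", "https://example.com/post/123"):
#     continue  # leave this page out of the training corpus
```

Until a standard directive exists, creators must block each provider's crawler by name, one user agent at a time; that gap is exactly what a "no-llm-index" option would close.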
Navigating the Legal Landscape
In 2015, a U.S. appeals court ruled that Google’s scanning of books for Google Books was “fair use,” citing its transformative nature. Generative AI, however, pushes well beyond those boundaries, prompting legal challenges over compensation for the content creators whose work feeds LLMs.
Big players like OpenAI, Microsoft, GitHub, and Meta face litigation related to the reproduction of computer code from open-source software. Content creators on social platforms should have the agency to opt out of having their work fed into LLMs, or to monetize it when it is.
Looking Forward
Quality standards differ significantly across sectors; for instance, the Amazon Prime Music app crashes daily, while even a 2% crash rate in healthcare or public services could be catastrophic. Meanwhile, expectations for LLM performance remain in flux. Unlike app failures that are easily identifiable, determining when AI malfunctions or produces hallucinations is complex.
As generative AI advances, balancing innovation with fundamental rights remains critical for policymakers, technologists, and society. Recent proposals from China’s National Information Security Standardization Technical Committee and an Executive Order from President Biden both call for frameworks to govern generative AI.
The challenges aren’t new; past experiences illustrate that, despite persistent issues like fake news, platforms often respond minimally. LLMs require expansive datasets often sourced freely from the internet. Although curating these datasets for quality is possible, defining “quality” remains subjective.
The key question is whether LLM providers will truly address these issues or continue to shift responsibility.
Buckle up; it’s going to be a wild ride.
Amit Verma is the head of engineering/AI labs and a founding member at Neuron7.