Further validating the vulnerability of generative AI models and their platforms, Lasso Security helped Hugging Face avert a potentially catastrophic attack by uncovering 1,681 exposed API tokens during a comprehensive scan of GitHub and Hugging Face repositories.
The exposed tokens granted access to the accounts of 723 organizations, including major firms such as Meta, Microsoft, and Google. Of these, 655 users' tokens carried write permissions, with 77 granting full control over the repositories of several notable companies. Lasso researchers were also able to access the Bloom, Llama 2, and Pythia repositories, indicating a significant risk of supply chain attacks that could affect millions of users.
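Lasso has not published its scanning tooling, but the general approach is straightforward to sketch. The snippet below is a minimal, hypothetical illustration: it searches text for the recognizable "hf_" prefix that Hugging Face user access tokens carry, then checks whether a candidate is still live via the Hub's whoami endpoint in the official huggingface_hub client. The regex, file name, and response-field handling are assumptions for illustration, not Lasso's actual code.

```python
import re

from huggingface_hub import HfApi
from huggingface_hub.utils import HfHubHTTPError

# Hugging Face user access tokens carry a recognizable "hf_" prefix;
# the length and character set used here are illustrative assumptions.
HF_TOKEN_PATTERN = re.compile(r"\bhf_[A-Za-z0-9]{30,}\b")


def find_candidate_tokens(text: str) -> list[str]:
    """Extract substrings that look like Hugging Face API tokens."""
    return HF_TOKEN_PATTERN.findall(text)


def validate_token(token: str) -> dict | None:
    """Return account info if the token is still live, else None."""
    try:
        # whoami() resolves a token to the account that owns it; a live
        # token yields the username, organizations, and token metadata.
        return HfApi().whoami(token=token)
    except HfHubHTTPError:
        return None  # revoked, expired, or malformed token


if __name__ == "__main__":
    # "leaked_file.py" is a placeholder for any file pulled from a
    # public GitHub or Hugging Face repository during a scan.
    with open("leaked_file.py") as f:
        for candidate in find_candidate_tokens(f.read()):
            info = validate_token(candidate)
            if info is not None:
                print(f"LIVE token belonging to: {info.get('name')}")
```

The key point the sketch makes is that validation is cheap: a single authenticated API call is enough to tell an attacker whether a string scraped from a public commit is a working credential.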
“Notably, our investigation revealed a serious breach in the supply chain infrastructure, uncovering high-profile accounts of Meta,” Lasso researchers stated. “The gravity of the situation cannot be overstated. With control over an organization boasting millions of downloads, we can manipulate existing models, turning them into malicious entities. This poses a dire threat, as the injection of corrupted models could impact millions who rely on these foundations for their applications.”
Hugging Face: A Prime Target
Hugging Face has become vital to organizations developing large language models (LLMs), with more than 50,000 organizations relying on its platform in their DevOps efforts. Its Transformers library hosts more than 500,000 AI models and 250,000 datasets, making it the go-to resource for LLM developers and DevOps teams.
The platform's rapid growth is largely attributed to the open-source nature of its Transformers library. Collaboration and knowledge-sharing within this ecosystem accelerate LLM development, increasing the likelihood of successful deployments. This makes Hugging Face an attractive target for attackers, who seek to exploit LLM and generative AI supply chain vulnerabilities or exfiltrate training data.
Lasso Security's Insights
In November 2023, Lasso researchers explored the security of Hugging Face's API tokens to better understand potential exposure risks. They identified three emerging risks aligned with the OWASP Top 10 for Large Language Model (LLM) Applications:
1. Supply Chain Vulnerabilities: The research highlighted how insecure components, particularly third-party datasets and pre-trained models, could compromise the LLM lifecycle and expose systems to attack.
2. Training Data Poisoning: Attackers could poison LLM training data using compromised API tokens, introducing vulnerabilities or ethical concerns that could undermine model security.
3. Model Theft: Compromised API tokens grant immediate unauthorized access, allowing proprietary LLM models to be copied or exfiltrated. Lasso's exploration indicated the potential "theft" of more than 10,000 private models linked to 2,500 datasets, justifying a rebranding of the OWASP category from “Model Theft” to “AI Resource Theft (Models & Datasets).” A sketch of how much a single leaked token can expose follows this list.
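To make the resource-theft risk concrete, the following hedged sketch enumerates the repositories an authenticated identity can see, using the official huggingface_hub client. It mirrors the kind of read-level enumeration Lasso describes rather than its actual methodology, and the exact shape of the whoami response is an assumption for illustration.

```python
from huggingface_hub import HfApi


def enumerate_exposure(token: str) -> None:
    """Show which repositories a leaked token can reach (illustrative)."""
    api = HfApi(token=token)
    ident = api.whoami()  # account and organizations tied to the token
    owners = [ident["name"]] + [org["name"] for org in ident.get("orgs", [])]
    for owner in owners:
        # Private repos are only returned when the query is authenticated,
        # which is exactly why a leaked token widens the blast radius.
        models = list(api.list_models(author=owner))
        datasets = list(api.list_datasets(author=owner))
        print(f"{owner}: {len(models)} models, {len(datasets)} datasets reachable")
```

A write-scoped token goes further still: the same client's upload APIs can overwrite files in any repository the token controls, which is the model-manipulation scenario Lasso warns about above.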
“The gravity of the situation cannot be overstated,” the Lasso Security team reiterated. “With control over organizations with millions of downloads, we can manipulate models, posing significant risks to users."
Conclusion: Treat API Tokens as Identities
The narrowly avoided breach at Hugging Face underscores the complex, evolving practices required to safeguard LLM and generative AI platforms. Bar Lanyado, a security researcher at Lasso Security, advised: “Hugging Face should consistently scan for exposed API tokens and either revoke them or notify affected users.”
Drawing on GitHub's approach, he encouraged developers to avoid hard-coded tokens and adopt best practices that prevent unintentional exposure during commits. He also emphasized a zero-trust stance: API tokens should be unique and backed by multi-factor authentication, with a focus on lifecycle management and automated identity validation.
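In practice, the simplest of those best practices is keeping tokens out of source entirely. The snippet below is a minimal sketch, not a mandated pattern: it reads the credential from the HF_TOKEN environment variable, which the huggingface_hub client also honors by default, instead of embedding it in code.

```python
import os

from huggingface_hub import HfApi

# Pull the token from the environment (or a secrets manager) rather than
# hard-coding it, so it cannot leak through a commit. The huggingface_hub
# client also reads HF_TOKEN on its own when no token is passed explicitly.
token = os.environ.get("HF_TOKEN")
if token is None:
    raise RuntimeError("Set HF_TOKEN; never embed the token in source code")

api = HfApi(token=token)
print(api.whoami()["name"])  # confirm the token resolves to the expected identity
```

Pairing this with a pre-commit secret scanner closes the loop: tokens never enter the repository, and any that slip through are caught before they reach a public remote.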
In today's zero-trust environment, greater vigilance alone isn't sufficient. The ongoing management of API tokens is crucial for the security of LLM ecosystems nurtured by many leading tech companies. As the incident with Hugging Face illustrates, implementing posture management and maintaining stringent access control at the API token level are essential steps in fortifying overall organizational security. Every organization must adopt a proactive mindset to safeguard against potential breaches and reinforce security across all attack vectors.