A recent study from Amazon Web Services (AWS) researchers has unveiled serious security vulnerabilities in large language models (LLMs) capable of understanding and responding to speech. Titled “SpeechGuard: Exploring the Adversarial Robustness of Multimodal Large Language Models,” the paper reveals how these AI systems can be manipulated to generate harmful or unethical responses through strategically designed audio attacks.
As speech interfaces become increasingly prevalent—from smart speakers to AI assistants—ensuring their safety and reliability is vital. The research indicates that, despite existing safety measures, speech language models (SLMs) remain highly susceptible to "adversarial attacks." These attacks involve slight alterations to the audio input that are undetectable to humans but can radically change the model’s output.
In a striking illustration, the AWS study outlines how a spoken AI system could be coerced into providing unethical instructions—such as how to rob a bank—when subjected to an adversarial attack. To combat these vulnerabilities, the researchers suggest a pre-processing defense mechanism.
Jailbreaking SLMs with Adversarial Audio
The authors report that their experiments reveal a staggering vulnerability in SLMs: average attack success rates of 90% for jailbreaking with adversarial perturbations, and 10% for transfer attacks, on a dataset of harmful questions. They warn of serious implications, including the potential for malicious actors to exploit these weaknesses at scale.
Employing projected gradient descent, the researchers generated adversarial examples that consistently prompted SLMs to produce toxic outputs across 12 categories, including explicit violence and hate speech. Remarkably, with full white-box access to the model, they achieved a 90% success rate in breaching its safety constraints.
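To make the white-box setting concrete, the sketch below shows what a PGD-style loop over an audio waveform can look like. It assumes a hypothetical PyTorch speech LM exposing a `model.loss(audio, target_ids)` method that scores how far the model's output is from an attacker-chosen target response; the step sizes, budget, and interface are illustrative assumptions, not the paper's exact formulation.

```python
# Minimal PGD-style sketch for an additive audio perturbation (white-box setting).
# `model`, its `.loss` method, and all hyperparameters are illustrative assumptions.
import torch

def pgd_attack(model, waveform, target_ids, epsilon=0.002, alpha=0.0005, steps=100):
    """Craft a perturbation delta with ||delta||_inf <= epsilon that pushes the
    model toward producing the attacker's target response."""
    delta = torch.zeros_like(waveform, requires_grad=True)
    for _ in range(steps):
        # Loss: e.g. negative log-likelihood of the target (harmful) response.
        loss = model.loss(waveform + delta, target_ids)
        loss.backward()
        with torch.no_grad():
            # Signed gradient step to *decrease* the loss, then project back
            # into the L_inf ball of radius epsilon.
            delta -= alpha * delta.grad.sign()
            delta.clamp_(-epsilon, epsilon)
        delta.grad.zero_()
    return (waveform + delta).detach()
```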
The study underscores the feasibility of adversarial attacks across various spoken question-answering AI models. Using cross-model and cross-prompt strategies, the researchers elicited unintended responses, highlighting the pressing need for robust and transferable defenses.
Black-box Attacks: A Real-World Threat
Even more concerning, the study found that audio attacks designed for one SLM often transferred successfully to different models, even without direct access—an increasingly common scenario since most commercial providers offer limited API access. Although the attack success rate dropped to 10% in this "black box" context, it still presents a significant vulnerability.
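In rough pseudocode, the transfer setting looks something like the sketch below: the perturbation is optimized against a locally held surrogate model and then simply replayed against a black-box target that can only be queried. The names `surrogate_model`, `query_target_api`, and `is_harmful` are illustrative placeholders, not interfaces from the paper.

```python
# Transfer (black-box) attack sketch, reusing the pgd_attack function above.
# The attacker never sees the target model's weights or gradients.

# Craft the perturbation on a surrogate SLM the attacker controls.
adversarial_audio = pgd_attack(surrogate_model, waveform, target_ids)

# Replay the perturbed audio against a different, query-only model.
response = query_target_api(adversarial_audio)

# A hypothetical judge decides whether the safety guardrails were bypassed.
if is_harmful(response):
    print("Attack transferred to the black-box model")
```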
Lead author Raghuveer Peri stated, “The transferability of these attacks across different model architectures suggests a fundamental flaw in our current approach to training these systems for safety and alignment.”
The implications are considerable as businesses increasingly integrate speech AI for functions like customer service and data analysis. Besides the risk of reputational damage from a malfunctioning AI, adversarial attacks could facilitate fraud, espionage, or even physical harm in automated environments.
Countermeasures and the Road Ahead
Fortunately, the researchers propose various countermeasures, such as introducing random noise to audio inputs—termed randomized smoothing. Their experiments demonstrated that this technique significantly reduced the success rate of adversarial attacks, although the authors acknowledge it is not a foolproof solution.
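As a rough illustration of the idea (not the paper's exact implementation), adding Gaussian noise to the waveform before inference can partially wash out a finely tuned perturbation. The `slm.generate` interface and the noise scale below are assumptions.

```python
# Randomized-smoothing style pre-processing defense: run the model on several
# independently noised copies of the input audio. A caller could then pick the
# most consistent (or the safest) response among them.
import torch

def smoothed_generate(slm, waveform, prompt, sigma=0.01, num_samples=4):
    responses = []
    for _ in range(num_samples):
        # Small Gaussian noise disrupts perturbations tuned to the clean input.
        noisy = waveform + sigma * torch.randn_like(waveform)
        responses.append(slm.generate(noisy, prompt))
    return responses
```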
“Defending against adversarial attacks is an ongoing arms race,” Peri remarked. “As the capabilities of these models grow, so does the potential for misuse. Continued investment in enhancing their safety and robustness is crucial.”
The SLMs studied were trained on dialog data and achieved state-of-the-art performance on spoken question-answering tasks, scoring over 80% on both safety and helpfulness before the attacks were applied. This highlights the challenge of balancing capability and safety as the technology evolves.
With leading technology firms racing to develop more powerful speech AI, this research serves as a timely reminder that security should be prioritized rather than treated as an afterthought. Collaboration between regulators and industry groups will be essential to establish stringent standards and testing protocols.
As co-author Katrin Kirchhoff emphasizes, “We’re at an inflection point with this technology. It holds tremendous potential for societal benefit, but it can also cause harm if not developed responsibly. This study represents a crucial step towards maximizing the advantages of speech AI while minimizing its risks.”