After launching tools for text-to-speech and speech-to-speech synthesis, AI voice startup ElevenLabs is expanding into a new area. The two-year-old company, founded by former Google and Palantir employees, today introduced Sound Effects, a text-to-sound AI tool.
Available now on the ElevenLabs website, Sound Effects runs on the company’s proprietary foundation model and lets creators generate a range of audio samples by typing a description of the sound they want.
ElevenLabs first teased the tool in February, when it added AI-generated sound effects to video clips created with OpenAI’s Sora. The launch gives content creators a faster way to build immersive audio.
What Can Creators Expect from Sound Effects?
Traditionally, adding ambient sound to content such as social videos, games, movies, and TV shows meant either recording the sounds manually or buying audio files from online libraries. Both options have limits: the sound a creator needs may not be available, and recording or licensing audio can strain a budget.
ElevenLabs’ Sound Effects simplifies this process. Users describe the sound they have in mind in plain, conversational language, and the model turns the prompt into six audio samples to choose from. Users can preview each option and download or save the ones they like directly from the ElevenLabs platform.
In early testing, a media outlet found that Sound Effects returned clear outputs within 30 to 40 seconds, although it generated only four options rather than six. The samples ranged from everyday noises like thunderstorms and doorbells to more complex effects such as monkeys chattering and trains arriving.
Mati Staniszewski, CEO of ElevenLabs, noted that the tool goes beyond short effects and can produce longer samples, including instrumental music and character voices. “Sound Effects can generate instrumental tracks up to 22 seconds with prompts like 'guitar loop' or 'jazz saxophone solo,'” he explained. Users can also create character voices with prompts such as “a woman singing while dancing in the sand” or “an ogre saying, ‘stay away, puny human.’” A single prompt can even chain several sounds together, for example: “A joyful elderly woman says I’m so proud of you, then laughs.”
ElevenLabs has not disclosed specifics about the underlying model, but said it was developed through in-house research and fine-tuned on Shutterstock’s extensive library of licensed audio tracks. Aimee Egan, Chief Enterprise Officer at Shutterstock, welcomed the collaboration: “The synergy between our rich library and this innovative audio technology has resulted in a true market first.”
Aiming to Empower Creators Globally
ElevenLabs has focused on AI audio since its launch. The company started with text-to-speech models in multiple languages, then added products such as voice cloning and AI Dubbing, which translates audio and video into 29 languages while preserving the original speaker’s voice.
With Sound Effects, ElevenLabs broadens that lineup for creators such as filmmakers, game developers, marketers, and social media influencers.
Staniszewski did not name the companies alpha-testing the product, but he said ElevenLabs serves 41% of the Fortune 500, with clients including The Washington Post, Storytel, and TheSoul Publishing.
Looking ahead, the company plans to introduce a music generation model and a voiceover studio. Both are currently in alpha testing, with no firm launch timelines.
The market for AI speech, sound, and music generation is growing quickly, with competitors including Google, Meta, Suno, Pika, MURF.AI, Play.ht, and WellSaid Labs. According to Market US, the global market for these tools was worth $1.2 billion in 2022 and is projected to reach nearly $5 billion by 2032, a compound annual growth rate (CAGR) of over 15.40%.
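As a rough consistency check on these figures (our own arithmetic, not taken from the Market US report), the sketch below applies the standard CAGR formula. It assumes ten annual compounding periods from 2022 to 2032; the report may use a different base year, which would shift the result slightly.

```latex
% CAGR definition, assuming n = 10 compounding years (2022 -> 2032)
\mathrm{CAGR} = \left(\frac{V_{2032}}{V_{2022}}\right)^{1/10} - 1
% Compounding the 2022 value at 15.4% per year:
V_{2032} \approx 1.2 \times (1.154)^{10} \approx 1.2 \times 4.19 \approx 5.0 \quad \text{(billion USD)}
```

Under that assumption, the reported growth rate and the projected market size line up at roughly $5 billion.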