After building its name on machine learning (ML) voice cloning and synthesis, ElevenLabs, a two-year-old AI startup founded by former Google and Palantir employees, is expanding its offerings with a new text-to-sound model.
Announced recently, the model will let creators generate sound effects simply by describing them in words, adding audio to content that is increasingly AI-generated.
Although the model isn’t publicly available yet, ElevenLabs has released a teaser demonstrating its capabilities: videos created by OpenAI's Sora, overlaid with the company’s AI-generated sounds. The company has also launched a signup page for an early access waitlist.
Expanding Audio Possibilities with AI Sound Effects
Founded in 2022, ElevenLabs has been dedicated to making audio and video content more accessible across languages and regions. The company offers a variety of tools, including text-to-speech and speech-to-speech models, capable of producing AI-generated speech from various content sources (text, audio, or video) in 29 languages, all while maintaining natural voice and emotional delivery.
These tools are gaining traction among enterprises and individual content creators. In parallel, entirely AI-generated content is on the rise, facilitated by tools like Runway and Pika, alongside OpenAI's Sora. While these products can create realistic videos from simple text prompts, they often lack accompanying audio. ElevenLabs’ new model aims to fill this gap, allowing users to produce sound effects for their content based on textual descriptions.
With this offering, AI creators can add background sounds to their projects, from bird chirps to bustling street noise.
“At ElevenLabs, we have primarily showcased our text-to-speech models publicly, but we have much more in development. When OpenAI unveiled Sora, which generates impressive videos without sound, we decided to provide a sneak peek of our upcoming product line,” stated Luke Harries, head of growth at ElevenLabs, while sharing a post featuring Sora-generated videos enriched with ElevenLabs' AI sound effects.
The sounds generated by this new model could also be added to speech generated from text, or to any video project requiring background audio, such as Instagram clips, commercials, or video game trailers. The quality and versatility of these sound effects remain to be determined.
Sign Up for Early Access
While ElevenLabs has not announced a public launch date, it is now accepting registrations for early access. Interested users can visit the signup page and provide their name and email along with a description of their intended use for the sound effects. Early registrants are also encouraged to write a sample prompt for an AI sound effect to help optimize the model's responses.
Once registered, users will join a waitlist and gain access when the model is available, although the timeline is currently unclear.
Although ElevenLabs may hold a first-mover advantage with this technology, other companies in the AI speech sector, such as MURF.AI, Play.ht, and WellSaid Labs, also have the potential to develop similar products.
According to Market US, the global market for AI audio tools was valued at $1.2 billion in 2022 and is projected to reach nearly $5 billion by 2032, at a compound annual growth rate (CAGR) of over 15.4%.
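For readers who want to sanity-check the market figures, here is a minimal sketch using the standard compound-growth formula. The function and variable names are illustrative assumptions, the 15.4% rate and the $5 billion end value are rounded readings of the figures quoted above, and nothing here reproduces Market US's own methodology.

```python
def project_value(start: float, rate: float, years: int) -> float:
    """Value after compounding `start` at `rate` per year for `years` years."""
    return start * (1 + rate) ** years


def implied_cagr(start: float, end: float, years: int) -> float:
    """Annual growth rate that takes `start` to `end` over `years` years."""
    return (end / start) ** (1 / years) - 1


# Figures quoted in the article (billions of US dollars); rounding is assumed.
START_BILLIONS = 1.2
END_BILLIONS = 5.0          # "nearly $5 billion", treated here as 5.0
YEARS = 2032 - 2022         # ten-year projection window
QUOTED_CAGR = 0.154         # "over 15.4%", treated here as 15.4%

projected = project_value(START_BILLIONS, QUOTED_CAGR, YEARS)
implied = implied_cagr(START_BILLIONS, END_BILLIONS, YEARS)

print(f"Projected 2032 value at {QUOTED_CAGR:.1%}: ${projected:.2f}B")
print(f"CAGR implied by ${END_BILLIONS}B in 2032: {implied:.2%}")
```

Compounding $1.2 billion at 15.4% a year for ten years gives roughly $5.03 billion, while an end value of exactly $5 billion would imply a rate of about 15.3%, so the quoted figures are consistent with one another to within rounding.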