Weeks after launching Sound Effects, its text-to-sound AI, voice startup ElevenLabs is debuting an open-source tool designed to showcase the model's capabilities. The application analyzes an imported clip and, in just 15 seconds, gives creators multiple sound effect samples to choose from for their videos.
Developers can access the app's code on GitHub, while a dedicated website lets the public experiment with the Sound Effects API.
When a video is uploaded, the Video to Sound Effects app extracts four frames at one-second intervals on the client side. These frames, along with a prompt, are sent to OpenAI’s GPT-4 to create a customized text-to-sound effects prompt. That prompt is then used to generate sound effects through ElevenLabs’ Sound Effects API. Finally, the video and audio are combined on the client side into a single downloadable file of up to 22 seconds.
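The numbers in that pipeline can be made concrete with a short sketch. The Python below is not the app's code (which runs in the browser); it only illustrates the sampling and length limits described above. The function names are hypothetical, and two details are assumptions the description does not confirm: that sampling starts at the first frame, and that the output follows the video's length up to the 22-second cap.

```python
from typing import List

# Figures reported in the article.
FRAME_COUNT = 4          # frames extracted per video
FRAME_INTERVAL_S = 1.0   # seconds between extracted frames
MAX_OUTPUT_S = 22.0      # longest combined clip the app produces


def frame_timestamps(video_duration_s: float) -> List[float]:
    """Return the times, in seconds, at which frames would be sampled.

    Assumption: sampling starts at t = 0, and timestamps at or beyond
    the end of a short clip are skipped.
    """
    if video_duration_s <= 0:
        raise ValueError("video duration must be positive")
    times = [i * FRAME_INTERVAL_S for i in range(FRAME_COUNT)]
    return [t for t in times if t < video_duration_s]


def output_duration(video_duration_s: float) -> float:
    """Return the length of the combined file, in seconds.

    Assumption: the output matches the video's length, capped at 22 s.
    """
    if video_duration_s <= 0:
        raise ValueError("video duration must be positive")
    return min(video_duration_s, MAX_OUTPUT_S)
```

Under these assumptions, a 10-second clip would be sampled at 0, 1, 2 and 3 seconds, and a 30-second clip would be trimmed to a 22-second result.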
“We see this as a proof of concept for what users can achieve with our SFX API,” says Ammaar Reshi, ElevenLabs’ design lead. “AI video creators often seek the perfect sound effect, and we aim to streamline that process by analyzing video frames and suggesting optimal outputs.” He emphasizes the potential for dynamic experiences, particularly in immersive video games, where sound effects can evolve based on player interactions.
The API enables developers to create tailored AI sound effects from brief text descriptions. ElevenLabs charges based on usage, measured in characters: 100 characters per generation when the duration is set automatically, or 25 characters per second when a duration is specified.
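To see how the two pricing modes compare, the arithmetic can be sketched as follows. This is an illustration built only on the two rates quoted above; the function name is hypothetical, and rounding fractional seconds up to the next whole character is an assumption, not a documented billing rule.

```python
import math
from typing import Optional

# Rates quoted in the article.
AUTO_DURATION_COST = 100   # characters per generation, automatic duration
COST_PER_SECOND = 25       # characters per second, set duration


def generation_cost(duration_s: Optional[float] = None) -> int:
    """Return the character cost of one sound effect generation.

    Pass no duration for the automatic mode, or a duration in seconds
    for the set-duration mode. Assumption: fractional results round up.
    """
    if duration_s is None:
        return AUTO_DURATION_COST
    if duration_s <= 0:
        raise ValueError("duration must be positive")
    return math.ceil(duration_s * COST_PER_SECOND)
```

By these rates, a set duration of four seconds costs the same 100 characters as an automatic generation, while a 22-second effect would cost 550 characters.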
In a quick test, the video-to-sound effects app was straightforward to use. After importing a silent clip of a vehicle in an all-terrain environment, ElevenLabs’ AI generated four sound options, all resembling a car navigating a gravel road. While adding sound effects to clips can be entertaining, the true potential lies in integrating this capability into broader systems for greater impact.
As the AI video generation landscape evolves, ElevenLabs aims to remain at the forefront by innovating audio solutions that meet the needs of developers, filmmakers, and content creators.