Stability AI is advancing its vision for generative AI with the launch of the Stable Audio 2.0 model.
While the company is widely recognized for its text-to-image Stable Diffusion models, it’s expanding its portfolio. Stable Audio initially debuted in September 2023, allowing users to create short audio clips based on text prompts. With Stable Audio 2.0, users can now generate high-quality audio tracks of up to three minutes—double the length of the original 90 seconds.
In addition to text-to-audio generation, Stable Audio 2.0 introduces audio-to-audio capabilities, enabling users to upload samples and use them as prompts. The model is currently available for limited free use on the Stable Audio website, with API access coming soon for developers looking to build innovative services.
The release of Stable Audio 2.0 marks Stability AI's first major update since the abrupt resignation of former CEO and founder Emad Mostaque in March. The company has framed the launch as a signal that its business is operating as usual despite the leadership change.
Improvements from Stable Audio 1.0 to 2.0
The development of Stable Audio 2.0 has drawn valuable insights from its predecessor, Stable Audio 1.0. Zach Evans, head of audio research at Stability AI, noted that the focus during the initial release was to launch a groundbreaking model with superior audio fidelity and meaningful output duration.
“Since then, we’ve focused on enhancing musicality, extending output duration, and improving responsiveness to detailed prompts,” Evans stated. “These enhancements aim to make the technology more applicable in real-world scenarios.”
Stable Audio 2.0 can now produce full musical tracks featuring coherent structures. Utilizing latent diffusion technology, the model can generate compositions lasting up to three minutes, complete with distinct intro, development, and outro sections—a significant upgrade from its earlier ability to create only short loops or fragments.
The Technology Behind Stable Audio 2.0
Stable Audio 2.0 continues to leverage a latent diffusion model (LDM). Following the December 2023 beta release of Stable Audio 1.1, the model incorporated a transformer backbone, resulting in a “diffusion transformer” architecture.
“We enhanced the data compression applied to audio during training, allowing us to scale outputs up to three minutes and beyond while maintaining efficient inference times,” Evans added.
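The mechanics described above can be illustrated with a toy sketch: noise in a compressed latent space is iteratively denoised, and only afterward would a decoder turn the latents back into a waveform. This is purely conceptual; Stable Audio 2.0's weights are not public, so the denoiser, compression factor, and latent dimensions below are all illustrative assumptions, not the real model.

```python
import numpy as np

# Conceptual sketch of latent-diffusion sampling over compressed audio latents.
# All constants and the "denoiser" are hypothetical stand-ins, not Stable Audio's.

SAMPLE_RATE = 44_100
COMPRESSION = 2048            # assumed latent downsampling factor (illustrative)
SECONDS = 180                 # up to three minutes of audio
LATENT_LEN = SAMPLE_RATE * SECONDS // COMPRESSION
LATENT_DIM = 64               # assumed latent channel count (illustrative)

def toy_denoiser(z, t, prompt_embedding):
    """Stand-in for the transformer backbone: predicts noise to subtract.
    A real diffusion transformer would attend over the latent sequence
    and condition on the text-prompt embedding."""
    return z - 0.1 * prompt_embedding

def sample(prompt_embedding, steps=50, seed=0):
    rng = np.random.default_rng(seed)
    z = rng.standard_normal((LATENT_LEN, LATENT_DIM))  # start from pure noise
    for t in range(steps, 0, -1):
        eps = toy_denoiser(z, t, prompt_embedding)
        z = z - eps / steps        # one denoising step in latent space
    return z  # a decoder (not shown) would map latents back to a waveform

latents = sample(prompt_embedding=np.ones(LATENT_DIM))
print(latents.shape)
```

The key point the sketch captures is the efficiency claim in the quote: the model never diffuses over 7.9 million raw samples per channel, only over a latent sequence thousands of times shorter, which is what keeps three-minute generations tractable.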
Enhanced Creative Capabilities
With Stable Audio 2.0, users can generate audio not only from text prompts but also from uploaded audio samples. Natural language instructions can be used to creatively transform these sounds, enabling iterative refinement and editing.
The model also broadens the spectrum of sound effects and textures it can produce. Users can now prompt it to create immersive soundscapes, ambient noise, crowds, cityscapes, and more. Additionally, it allows modifications to the style and tone of both generated and uploaded audio.
Addressing Copyright Concerns in Generative AI Audio
Copyright considerations remain a significant issue in the generative AI space. Stability AI is committed to upholding intellectual property rights with its new audio model. To alleviate copyright concerns, Stable Audio 2.0 has been exclusively trained on licensed data from AudioSparx, and it respects opt-out requests. Content recognition technology monitors audio uploads to prevent the processing of copyrighted material.
Safeguarding copyright is essential for Stability AI to successfully commercialize Stable Audio and ensure safe usage for organizations. Currently, Stable Audio generates revenue through subscriptions to its web application, with an API set to launch soon.
However, Stable Audio is not an open model at this time. “The weights for Stable Audio 2.0 will not be available for download, but we are developing open audio models for release later this year,” Evans confirmed.