Meta's Voicebox AI: The DALL-E of Text-to-Speech Technology


Updated on November 8 2024

Today marks a significant step toward a future where celebrity voices could be immortalized in technology. Meta has launched Voicebox, a generative text-to-speech model designed to transform audio generation much as ChatGPT and DALL-E transformed text and image creation. Rather than producing text or images, Voicebox generates high-quality audio clips.

Meta describes Voicebox as “a non-autoregressive flow-matching model trained to infill speech, given audio context and text.” It has been trained on over 50,000 hours of unfiltered audio data, including recordings and transcripts from public domain audiobooks in English, French, Spanish, German, Polish, and Portuguese. This diverse dataset helps the model produce natural-sounding, conversational speech in each of these languages.

Meta's results indicate that speech recognition models trained on Voicebox-generated synthetic speech perform nearly as well as those trained on real speech. Their error rate degrades by only 1 percent, compared with the 45 to 70 percent degradation seen when training on synthetic speech from existing text-to-speech (TTS) models.

Voicebox excels at predicting and infilling speech segments based on the surrounding audio context and transcript. This lets it regenerate a portion of an existing recording without requiring the whole clip to be re-recorded. It can also edit audio clips by removing background noise and correcting mispronounced words. Users can identify and crop a noisy segment, then instruct the model to regenerate that part, much as photo-editing software retouches part of an image.
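
The infilling idea described above can be sketched in a few lines of Python. This is a minimal illustration and not Meta's implementation: the audio is modeled as a plain list of numbers, `infill_segment` and `interpolate_placeholder` are hypothetical names, and the placeholder generator simply draws a straight line between the neighboring samples where a real system would call a trained speech model conditioned on the surrounding audio and the transcript.

```python
from typing import Callable, List, Sequence

# Hypothetical generator signature (an assumption for this sketch):
# (audio before the gap, audio after the gap, frames to generate) -> new frames
Generator = Callable[[Sequence[float], Sequence[float], int], List[float]]


def interpolate_placeholder(
    before: Sequence[float], after: Sequence[float], n: int
) -> List[float]:
    """Stand-in for a speech model: a linear ramp between the neighbors."""
    left = before[-1] if before else (after[0] if after else 0.0)
    right = after[0] if after else left
    return [left + (right - left) * k / (n + 1) for k in range(1, n + 1)]


def infill_segment(
    frames: Sequence[float], start: int, end: int, generate: Generator
) -> List[float]:
    """Regenerate frames[start:end] from the surrounding context only."""
    if not 0 <= start <= end <= len(frames):
        raise ValueError("start and end must satisfy 0 <= start <= end <= len(frames)")
    before, after = frames[:start], frames[end:]
    new_frames = generate(before, after, end - start)
    if len(new_frames) != end - start:
        raise ValueError("generator must return exactly end - start frames")
    # The untouched context is copied through; only the cropped span changes.
    return list(before) + list(new_frames) + list(after)


noisy = [0.0, 1.0, 9.0, 9.0, 4.0, 5.0]  # frames 2 and 3 are the "noisy" span
repaired = infill_segment(noisy, 2, 4, interpolate_placeholder)
print(repaired)  # [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
```

The point of the sketch is the shape of the operation: the clip keeps its length, the context on both sides is left untouched, and only the cropped span is replaced.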

Text-to-speech generators have existed for some time, powering things like GPS navigation voices, but modern solutions such as Speechify and ElevenLabs' Prime Voice AI typically demand extensive source material for accurate voice mimicry. Voicebox's zero-shot text-to-speech capability, built on a training method known as Flow Matching, sets it apart by removing the need for a large dataset for each voice.

In benchmarks, Voicebox outperforms current industry standards in intelligibility (a word error rate of 1.9 percent versus 5.9 percent) and audio similarity (a composite score of 0.681 versus 0.580). It also operates up to 20 times faster than today's leading TTS systems.
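
For readers unfamiliar with the intelligibility metric, word error rate (WER) is conventionally the number of word substitutions, deletions, and insertions needed to turn a transcript of the generated audio into the reference text, divided by the number of reference words. The following is a minimal Python sketch of that standard definition. The article does not describe Meta's exact evaluation pipeline, so the lowercasing and whitespace tokenization here are assumptions, and `word_error_rate` is an illustrative name.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


# One wrong word out of six reference words.
example = word_error_rate("the cat sat on the mat", "the cat sat on a mat")
print(round(example, 3))  # 0.167
```

On this definition, moving from 5.9 percent to 1.9 percent means roughly two-thirds fewer word errors, since (5.9 - 1.9) / 5.9 is about 0.68.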

However, Meta is not releasing the Voicebox app or its source code to the public at this time. The company cites concerns over potential misuse, while recognizing the promising applications of generative speech models.
