Unlocking the Future: MyShell Launches OpenVoice, the New Open Source AI Voice Cloning Model


Updated on October 30, 2024

Startups such as ElevenLabs have invested millions of dollars in proprietary algorithms and AI software for voice cloning, which produces audio that mimics a user's voice.

Now, researchers from the Massachusetts Institute of Technology (MIT), Tsinghua University in Beijing, and the AI startup MyShell have introduced OpenVoice, an open-source voice cloning model that delivers nearly instantaneous results and offers granular controls not found on other platforms.

“Clone voices with unparalleled precision, adjusting tone, emotion, accent, rhythm, pauses, and intonation from just a small audio clip,” states MyShell in their recent post on X.

The company shared a link to its research paper describing how OpenVoice was developed, along with two ways to try it: the MyShell web app (a user account is required) and HuggingFace (public access, no account needed).

In an email, lead researcher Zengyi Qin from MIT and MyShell emphasized the project's goal: "MyShell aims to benefit the research community. OpenVoice is just the beginning. In the future, we will provide grants, datasets, and computing power to support open-source research. Our core mission is ‘AI for All.’”

Regarding the motivation behind OpenVoice, Qin explained: “Language, vision, and voice are three key modalities for future Artificial General Intelligence (AGI). While there are various open-source models for language and vision, a powerful, instant voice cloning model for customization was lacking, which is why we undertook this project.”

Using OpenVoice

In informal tests on HuggingFace, I quickly generated a convincing, if somewhat robotic, replica of my voice from a short recording of unscripted speech. Unlike other voice cloning applications, OpenVoice did not require me to read a specific script. Within seconds, the clone was accurately reading back my text prompt.

Additionally, I could adjust the "style" of the clone among different emotional presets, such as cheerful, sad, or angry, effectively changing the tone.

Here’s a sample of my voice clone using OpenVoice set to a "friendly" tone.

How OpenVoice Was Created

The creators of OpenVoice—Qin, Wenliang Zhao and Xumin Yu from Tsinghua University, and Xin Sun from MyShell—outlined their method in their research paper. OpenVoice consists of two key AI models: a text-to-speech (TTS) model and a tone converter.

The TTS model handles style parameters and languages. It was trained on 30,000 sentences from two English speakers (one with an American accent and one with a British accent), one Chinese speaker, and one Japanese speaker, with each sentence labeled by emotion. From this data it learned nuances such as intonation, rhythm, and pauses.

The tone converter was trained on more than 300,000 audio samples from over 20,000 speakers. Spoken audio is converted into phonemes (the distinct sounds that distinguish one word from another), which are then represented as vector embeddings.
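As a rough illustration of that last step, the sketch below maps a sequence of phonemes to vectors with a simple lookup table. The phoneme symbols, the three-dimensional vectors, and the names `PHONEME_EMBEDDINGS` and `embed_phonemes` are all hypothetical. Real speech models learn much larger embeddings during training, and this is not OpenVoice's code.

```python
# Toy illustration only: a hypothetical phoneme inventory with
# hand-picked 3-dimensional vectors. Real speech models learn these
# embeddings during training and use far more dimensions.
PHONEME_EMBEDDINGS: dict[str, list[float]] = {
    "HH": [0.1, 0.0, 0.3],
    "EH": [0.7, 0.2, 0.1],
    "L": [0.2, 0.5, 0.0],
    "OW": [0.6, 0.1, 0.4],
}

# Hypothetical fallback vector for phonemes missing from the table.
UNKNOWN: list[float] = [0.0, 0.0, 0.0]


def embed_phonemes(phonemes: list[str]) -> list[list[float]]:
    """Return one embedding vector per phoneme, preserving order.

    Each vector is copied so callers cannot mutate the shared table.
    """
    return [list(PHONEME_EMBEDDINGS.get(p, UNKNOWN)) for p in phonemes]


# "hello" written as simplified ARPAbet-style phonemes (illustrative).
vectors = embed_phonemes(["HH", "EH", "L", "OW"])
for vec in vectors:
    print(vec)
```

A plain dictionary lookup is used here only because it makes the idea of "one vector per sound" concrete; in a trained model the table is a learned parameter matrix indexed by phoneme ID.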

By combining a "base speaker" voice produced by the TTS model with tone information extracted from the user's recording, the two models can replicate the user's voice and adjust its emotional expression. A diagram in the OpenVoice research paper illustrates how the models fit together.
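The decoupling can be pictured with a deliberately simplified sketch. Here a hypothetical base TTS stage fixes the content and the style label, and a separate converter stage swaps only the speaker's tone, which is modeled as a single "tone" vector per speaker. Everything in it, including `base_tts`, `convert_tone`, the per-character frames, and the idea of shifting frames by the difference between two tone vectors, is an assumption made for illustration. OpenVoice's actual tone converter is a trained neural network, not this arithmetic.

```python
from dataclasses import dataclass

Vector = list[float]


@dataclass
class Utterance:
    frames: list[Vector]  # toy acoustic frames, one vector per time step
    style: str            # style label chosen in the base TTS stage


def base_tts(text: str, style: str, base_tone: Vector) -> Utterance:
    """Hypothetical stage 1: render text in the base speaker's voice.

    Each character becomes one frame: the base speaker's tone vector plus
    a small offset derived from the character, standing in for content.
    """
    frames = []
    for ch in text:
        offset = (ord(ch) % 10) / 10
        frames.append([t + offset for t in base_tone])
    return Utterance(frames=frames, style=style)


def convert_tone(utt: Utterance, source_tone: Vector, target_tone: Vector) -> Utterance:
    """Hypothetical stage 2: replace the base tone with the target tone.

    Shifting every frame by (target - source) changes only the speaker
    component; content offsets and the style label pass through untouched.
    """
    shift = [tt - st for st, tt in zip(source_tone, target_tone)]
    new_frames = [[f + s for f, s in zip(frame, shift)] for frame in utt.frames]
    return Utterance(frames=new_frames, style=utt.style)


base_tone = [1.0, 1.0]
user_tone = [3.0, 0.5]  # in a real system, estimated from the user's reference clip
spoken = base_tts("hi", style="cheerful", base_tone=base_tone)
cloned = convert_tone(spoken, source_tone=base_tone, target_tone=user_tone)
print(cloned.style, cloned.frames)
```

The point the sketch tries to capture is that style decisions are made entirely in the first stage and never touch the converter, so either stage could, in principle, be changed without retraining the other.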

OpenVoice's approach is conceptually simple, yet it is efficient and requires significantly fewer computing resources than competitors such as Meta's Voicebox.

Qin shared, “We aimed to develop the most flexible instant voice cloning model. This flexibility means control over styles, emotions, accents, and adaptability to any language. Previously, such comprehensive functionality was unattainable due to its complexity. Through a decoupled pipeline process, we achieved effective outcomes with simplicity.”

Behind OpenVoice

MyShell was founded in 2023 with a $5.6 million seed round led by INCE Capital, with participation from Folius Ventures, Hashkey Capital, SevenX Ventures, TSVC, and OP Crypto. According to The SaaS News, it has already attracted more than 400,000 users. While researching this article, I counted more than 61,000 members on its Discord server.

MyShell describes itself as a “decentralized and comprehensive platform for discovering, creating, and staking AI-native applications.” Besides OpenVoice, their web app features various text-based AI characters and bots with distinct personalities, akin to Character.AI, and includes tools such as an animated GIF maker and user-generated RPGs based on popular franchises.

As for monetization, MyShell charges a monthly subscription to web app users and to third-party bot creators who want to promote their products within the app. It also charges for AI training data.

Correction: Thursday, January 4, 2024 – The piece was updated to clarify that MyShell is not based in Calgary, AB, Canada.
