OpenAI Introduces Voice Cloning AI Model, Currently Available Only to Select Partners

OpenAI is expanding its reach beyond text, image, and video generation with a significant advancement in audio technology: voice cloning. Today, the company announced its latest AI model, the “Voice Engine.” In development since 2022, this model powers OpenAI's text-to-speech API, as well as the new ChatGPT Voice and Read Aloud features introduced earlier this month.

How Voice Cloning Works

The Voice Engine creates a realistic voice clone from a single 15-second audio clip, which a human speaker records through a phone or computer microphone. The model then generates natural-sounding speech that closely resembles the original speaker, so any typed text can be converted into spoken words in that voice.

Major Implications for the Spoken Audio Market

This technology holds enormous potential for people whose work depends on spoken audio, including podcasters, voice-over artists, audiobook narrators, gamers, and customer service representatives. It also puts pressure on competing companies in this space, such as ElevenLabs, Captions, Meta, WellSaid Labs, and MyShell.

OpenAI also emphasizes Voice Engine's ability to assist non-verbal individuals by providing unique, non-robotic voices, which can be instrumental in therapeutic and educational settings for those with speech impairments or learning challenges.

Initial Use Cases

In its announcement, OpenAI noted that Voice Engine is currently accessible to a small group of trusted partners, including:

- Age of Learning: Uses Voice Engine and GPT-4 to create personalized voice content for diverse student audiences.

- HeyGen: Employs the technology for video translation, creating custom avatars with real-sounding multilingual voices to enhance global communication.

- Dimagi: Integrates Voice Engine to deliver interactive, multilingual feedback for community health workers, improving service delivery in remote areas.

- Livox: Enhances its augmentative and alternative communication (AAC) app with Voice Engine, providing unique voices for individuals with speech and hearing disabilities.

- Norman Prince Neurosciences Institute at Lifespan: Uses the technology to assist patients with speech impairments, notably helping to restore the voice of a brain tumor patient based on a prior audio sample.

OpenAI has provided audio samples demonstrating the technology’s capabilities, including a comparison between a patient’s original voice and the cloned version using the Voice Engine.

Limited Access and Cautious Deployment

For now, the Voice Engine is not available to the general public. OpenAI is sharing insights and results from a small-scale preview strictly with its trusted partners. The company stated, “We are taking a cautious and informed approach to a broader release due to the potential for synthetic voice misuse.” OpenAI aims to initiate discussions on the responsible use of synthetic voices and assess how society can adapt to these advancements.

OpenAI’s approach to releasing the Voice Engine is consistent with recent calls for regulations on AI voice impersonation. To ensure ethical use, partners testing the technology must adhere to strict policies prohibiting unauthorized impersonation and requiring informed consent from voice donors. Additionally, OpenAI is implementing safety measures, including watermarking and proactive monitoring, to promote responsible technology usage.
