OpenAI Hints at Exciting New Voice Engine While Delaying Full Release

OpenAI has previewed Voice Engine, its text-to-audio AI model, but has not yet released it to the public. The model converts text into natural, expressive speech and can produce audio in a language different from the input, for example turning English text into Spanish audio. In a recent blog post, OpenAI showcased 15-second audio samples that highlight the model’s ability to generate emotive, lifelike voices.

Development of Voice Engine began in late 2022, and the model already powers the preset voices offered through OpenAI’s text-to-speech API and the voice feature of ChatGPT. However, the company is holding back a wider release, citing concerns over potential misuse of synthetic voice technology. In a statement, OpenAI emphasized the importance of starting a discussion about the responsible deployment of synthetic voices and how society can adapt to these emerging capabilities. The company plans to decide on wider deployment based on feedback from these discussions and the results of limited test applications.

Last week, OpenAI hinted at Voice Engine by filing a trademark application for a service mark, signaling its intention to offer voice-related services. A select group of partner companies is currently testing Voice Engine, using it for applications such as reading assistance for children and non-readers, translation of video and podcast content, and voices for avatars in sales demonstrations. These partners are subject to strict usage guidelines: they may not use Voice Engine to impersonate individuals or create custom voices. OpenAI has also instituted a “no-go voice list” to prevent the generation of audio that closely resembles the voices of notable figures.

Ahead of any broader rollout, OpenAI has advised financial institutions to reconsider voice-based security authentication, since voice AI systems have been used to bypass such measures. Notably, in 2021, scammers tricked an Emirati bank manager into transferring $35 million by cloning customer voices. To combat misuse, OpenAI has implemented watermarking that identifies audio generated by Voice Engine, and it aims to develop further ways to trace the origin of audio content.

The company acknowledged the serious risks of generating speech that mimics individuals’ voices, particularly in sensitive contexts such as election years. OpenAI is engaging with a broad range of stakeholders, including representatives from government, media, entertainment, education, and civil society, to gather feedback as it further develops the technology. A dedicated team is responsible for vetting AI models for safety before deployment, and the company’s board retains the authority to reverse decisions that significantly affect safety.
