OpenAI Launches Hyperrealistic Voice Feature for Select Paying ChatGPT Users

OpenAI has begun rolling out ChatGPT's Advanced Voice Mode, giving users their first glimpse of GPT-4o's hyperrealistic audio capabilities. The alpha version is currently available to a small group of ChatGPT Plus users, and OpenAI expects to make it gradually accessible to all Plus subscribers by fall 2024.

When OpenAI first unveiled GPT-4o's voice in May, audiences were captivated by its rapid responses and striking resemblance to a human voice, in particular that of actress Scarlett Johansson, to whom the demo voice named Sky was widely compared. Johansson said she had turned down multiple requests from CEO Sam Altman to lend her voice, and after the demonstration she engaged legal counsel to protect her likeness. Although OpenAI denied using her voice, it subsequently removed Sky from the demo. In June, the company announced it would delay the release of Advanced Voice Mode to strengthen its safety measures.

Fast forward a month, and the wait for some of those features is over, at least partially. OpenAI clarified that the video and screen-sharing functions showcased in its Spring Update are not included in this alpha and will launch at a later time. So while the full GPT-4o demo remains just that, a demo, certain premium users can now experience ChatGPT's voice feature firsthand.

ChatGPT Can Now Talk and Listen

If you have previously tried Voice Mode in ChatGPT, note that Advanced Voice Mode operates differently. The earlier implementation chained three distinct models: one to convert the user's speech to text, GPT-4 to process the prompt, and a third to turn ChatGPT's text response back into audio. GPT-4o, in contrast, is a single multimodal model that handles all of these tasks itself, which yields notably lower latency in conversation. OpenAI also claims that GPT-4o can detect emotional intonation in a user's voice, such as sadness, excitement, or singing.
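To make the architectural difference concrete, here is a minimal sketch of the kind of three-stage pipeline the earlier Voice Mode relies on, built from OpenAI's public transcription, chat, and text-to-speech endpoints. ChatGPT's internal plumbing is not public, so treat this as an illustration under assumptions, not the actual implementation: the model choices (`whisper-1`, `gpt-4`, `tts-1`), voice name, and file names are stand-ins.

```python
# Sketch of a three-model voice pipeline, analogous to the earlier Voice Mode.
# Assumes OpenAI's public Python SDK (openai>=1.0) and an OPENAI_API_KEY in
# the environment; these public endpoints stand in for ChatGPT's internal,
# non-public components.
from openai import OpenAI

client = OpenAI()

def voice_round_trip(audio_path: str) -> bytes:
    # Stage 1: speech-to-text (one network round trip).
    with open(audio_path, "rb") as f:
        transcript = client.audio.transcriptions.create(
            model="whisper-1", file=f
        )

    # Stage 2: a text-only model processes the prompt (a second round trip).
    reply = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": transcript.text}],
    )
    reply_text = reply.choices[0].message.content

    # Stage 3: text-to-speech on the response (a third round trip).
    speech = client.audio.speech.create(
        model="tts-1", voice="alloy", input=reply_text
    )
    return speech.content  # MP3 bytes of the spoken reply

if __name__ == "__main__":
    audio = voice_round_trip("question.wav")  # hypothetical input file
    with open("reply.mp3", "wb") as out:
        out.write(audio)
```

Each stage is a separate model call, and everything between the user's voice and the reply passes through plain text, which is where both the added latency and the loss of vocal nuance come from. A single multimodal model like GPT-4o removes the intermediate text hops, which is why it can respond faster and pick up on tone rather than just words.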

OpenAI is releasing ChatGPT's new voice gradually so it can monitor usage closely. Members of the alpha group will receive an alert in the ChatGPT app, followed by an email explaining how to use the feature.

Since the May demo, OpenAI has tested GPT-4o's voice capabilities with more than 100 external testers who collectively speak 45 different languages. A report summarizing these safety efforts is expected in early August.

Advanced Voice Mode will be limited to four preset voices (Juniper, Breeze, Cove, and Ember) developed in collaboration with professional voice actors. The Sky voice from the May demo is no longer available. OpenAI spokesperson Lindsay McCallum confirmed: "ChatGPT cannot impersonate the voices of individuals, including public figures, and will block any output that does not conform to these preset voices."

OpenAI is staying vigilant in an effort to avoid the deepfake controversies that have dogged the industry. Earlier this year, AI startup ElevenLabs faced backlash when its voice-cloning technology was used to impersonate President Biden, misleading primary voters in New Hampshire.

OpenAI has also implemented new filters to block requests to generate music or other copyrighted audio. Over the past year, AI companies have faced a wave of copyright-infringement claims, and audio models like GPT-4o open a new front for complaints, particularly from record labels, which have a long history of litigation and have already sued the AI song generators Suno and Udio.
