OpenAI is rolling out its much-anticipated "ChatGPT Advanced Voice Mode," a humanlike conversational voice interface, expanding access beyond its initial testing group and waitlist. This feature will be available to all paying subscribers of OpenAI's ChatGPT Plus and Team plans, with gradual access starting in the U.S. over the next few days. Subscribers to the Edu and Enterprise plans can expect availability next week.
In addition to the voice interface, OpenAI is introducing the ability to store "custom instructions" and "memory" for personalized interactions, mirroring features previously released for ChatGPT's text interface. Users will also gain five new voice styles—Arbor, Maple, Sol, Spruce, and Vale—complementing the existing four: Breeze, Juniper, Cove, and Ember.
This enhancement allows ChatGPT users to engage with the chatbot through voice instead of typing. A popup notification in the app will confirm when users have entered Advanced Voice Mode. Since the alpha version, OpenAI has refined accents in popular non-English languages and improved conversational fluidity. Users will also notice a redesigned interface featuring an animated blue sphere.
These updates are exclusive to the GPT-4o model, excluding the newer o1 preview model. Custom instructions and memory capabilities will further personalize user interactions during voice chats.
As AI voice assistants like Apple's Siri and Amazon's Alexa gain traction, developers are striving to create more humanlike conversational experiences. ChatGPT already offers voice output through its Read-Aloud feature; Advanced Voice Mode, however, aims for a more engaging and authentic two-way interaction.
Among competitors, Hume AI recently launched its Empathic Voice Interface, which detects emotions through voice patterns, and Kyutai unveiled its open-source AI voice assistant, Moshi. Google has added voices to its Gemini chatbot, while Meta is developing voices mimicking popular actors for its AI platform. OpenAI claims it is making AI voice technology more accessible than its competitors.
Despite the excitement, the integration of AI voices hasn't been without controversy. Concerns emerged over the similarity of one of ChatGPT's voices, Sky, to actress Scarlett Johansson's, particularly after CEO Sam Altman posted "her," an apparent nod to Johansson's role as an AI assistant in the film Her. OpenAI maintains that it does not intend to replicate the voices of well-known individuals and that users will have access only to the nine distinct voices it provides.
The rollout was initially delayed from a projected late-June launch to "late July or early August," in part due to a commitment to safety testing. OpenAI conducted extensive evaluations with external red teamers fluent in 45 languages across 29 regions. The decision to expand access now suggests OpenAI is confident in the safety measures implemented, consistent with its cautious approach of collaborating with the U.S. and U.K. governments and providing previews of new models prior to release.