OpenAI Introduces Voice Feature for ChatGPT: Enhancing Conversational AI Interaction

ChatGPT is becoming more than a text-only chatbot: OpenAI has announced new voice and image capabilities for it. The widely used generative AI assistant has quickly become one of the standout technology achievements of recent months, letting users produce essays, poems, and summaries from simple text prompts. Now users will also be able to hold spoken conversations with the chatbot.

The announcement coincides with Amazon's decision to invest up to $4 billion in OpenAI competitor Anthropic, a sign of intensifying competition in the generative AI sector. Google is working to catch up with its Bard chatbot, Meta is pursuing an open-source approach, and Microsoft continues to deepen its partnership with OpenAI.

A New Era of Interactivity

The launch marks a significant step for generative AI, as OpenAI combines the convenience of voice assistants with its large language models (LLMs). Users can, for example, ask ChatGPT out loud to make up a bedtime story on the spot, or simply ask a question and hear a spoken answer.

ChatGPT users can also ask questions about images: they can upload a picture and ask the AI to explain what it shows or how to accomplish a specific task.

The voice feature relies on a new text-to-speech model that can generate lifelike voices from just a few seconds of sample speech. OpenAI worked with professional voice actors to create five distinct voices, and it uses its open-source Whisper speech recognition system to transcribe spoken input into text.

Spotify is a launch partner, unveiling a feature that uses a sample of a podcaster's voice to translate shows from English into Spanish, French, or German while preserving the host's original voice. The rollout is deliberately cautious: the feature is initially limited to select podcasters, including Dax Shepard, Monica Padman, Lex Fridman, Bill Simmons, and Steven Bartlett, to help ensure responsible use.

“The new voice technology — which can generate realistic synthetic voices from minimal authentic speech — opens up numerous creative and accessibility-focused possibilities,” the company shared in a blog post. “Yet, these advancements also pose new risks, including the potential for malicious impersonation of public figures or fraud.”

Over the next two weeks, the new features will roll out gradually to paying Plus and Enterprise subscribers. To enable voice, users open the "settings" menu in the app, select "new features," and opt in to voice conversations. They can then tap the headphone button in the top-right corner to choose a voice.

Initially, voice will be available only in the ChatGPT apps for Android and iOS as an opt-in beta, while image input will be enabled by default on all platforms.