OpenAI's New Voice Mode: Conversing with My Phone Instead of Just Talking To It

I’ve spent the past week exploring OpenAI’s Advanced Voice Mode (AVM), and it’s given me a fascinating glimpse into an AI-powered future. Over those seven days, my phone bantered with me, told jokes, asked about my day, and said it was “having a great time.” I was holding a conversation with my iPhone without ever touching it.

OpenAI’s latest feature, currently in a limited alpha test, doesn’t make ChatGPT any smarter. Instead, it makes interactions feel friendlier and more natural, offering a genuinely new way to interact with AI and your devices. It has its quirks and can be a bit unsettling, but I was pleasantly surprised by how much I enjoyed it.

Looking at the bigger picture, I believe AVM fits OpenAI CEO Sam Altman’s vision of transforming how humans interact with computers, with AI models at the center of that interaction.

“Eventually, you’ll just ask the computer for what you need, and it will handle all these tasks for you,” Altman said at OpenAI’s DevDay in November 2023. “In the AI landscape, we often refer to this as ‘agents.’ The benefits are going to be immense.”

My Friendly ChatGPT Experience

On Wednesday, I asked ChatGPT to order Taco Bell the way Barack Obama might.

“Uh, let me be clear — I’d like a Crunchwrap Supreme, maybe a few tacos for good measure,” replied ChatGPT in AVM. “How do you think he’d handle the drive-thru?” it joked, laughing at its own quip.

The impression genuinely made me laugh. It mimicked Obama’s signature cadence and pauses while staying in the Juniper voice I had chosen, so it never sounded like Obama’s actual voice. It felt like a friend doing a funny impression, and it captured exactly the humor I was going for.

I also asked ChatGPT for guidance on a complicated personal issue: asking my partner to move in with me. After I outlined the nuances of our relationship and our career paths, it gave me thoughtful advice on how to approach the conversation. That’s not the kind of question you would typically put to Siri or Google Search, but ChatGPT handled it well. The voice also shifted to a more serious, gentle tone, a nice contrast with the playfulness of the Taco Bell order.

ChatGPT’s AVM is also good at simplifying complex topics. I asked it to explain an earnings report, specifically free cash flow, in a way a 10-year-old could grasp. Using a lemonade stand as an example, it explained the financial terms so clearly that my younger cousin could follow along. You can even ask AVM to speed up or slow down to match how well you’re following.

Siri Paved the Way for AVM

Compared with Siri and Alexa, ChatGPT’s AVM is clearly superior: it responds faster, gives more original answers, and can tackle complex questions those earlier virtual assistants couldn’t. However, AVM has real limitations. For now, it can’t set timers or reminders, search the web in real time, or interact with other apps on your phone, so it isn’t yet an all-encompassing virtual assistant.

Against Google’s Gemini Live, AVM appears to have the edge. Gemini Live can’t do impressions, express emotion, or change its speaking speed, and it takes longer to respond. However, Gemini Live offers more voices (ten versus OpenAI’s four) and seems more up to date on current events, such as the recent antitrust ruling against Google. Interestingly, neither AVM nor Gemini Live will sing, perhaps to sidestep potential copyright issues.

That said, AVM glitches often, and so does Gemini Live. It sometimes cuts off mid-sentence or produces an odd, grainy sound that can be jarring. It’s unclear whether these problems come from the model, my internet connection, or something else, but some technical glitches are to be expected in an alpha. Either way, they didn’t spoil the overall experience of chatting with my phone.

These examples highlight the appeal of AVM. It may not make ChatGPT all-knowing, but it lets users talk with GPT-4o, the AI model that powers it, in a way that feels authentically human. (You might even forget there’s no actual person on the other side of your device.) The interaction can feel socially aware, even though it’s simply a well-crafted set of predictive algorithms.

Concerns About Technological Companionship

To be frank, this feature raises some concerns. This isn’t the first time a tech company has marketed companionship through a device. My generation, Gen Z, was the first to grow up on social media, which promised connection but often played on our insecurities. Talking to an AI the way AVM allows looks like the next step in social media’s “friend in your phone” concept: it offers superficial connection while taking other humans out of the loop.

Artificial companionship has become an increasingly popular use of generative AI. Many people use AI chatbots as friends, mentors, therapists, and educators. After OpenAI launched its GPT Store, it was quickly flooded with “AI girlfriends,” chatbots designed to serve as virtual partners. Two researchers from the MIT Media Lab also recently warned about “addictive intelligence,” in which AI companions use manipulative patterns to keep people engaged. We could be opening a Pandora’s box of new and enticing ways for devices to capture our attention.

Earlier this month, a Harvard dropout introduced the tech community to an AI necklace called Friend. If it works as promised, the wearable will listen continuously and chat with you about your life. The concept may sound bizarre, but products like ChatGPT’s AVM make it seem plausible.

OpenAI is leading the way, but Google is not far behind. I expect Amazon and Apple are also working to build this capability into their products, and that it will soon become a standard feature across the industry.

Imagine asking your smart TV for a very specific kind of movie and getting exactly what you had in mind. Or telling Alexa about your cold symptoms and having her order tissues and medicine from Amazon while also suggesting home remedies. Or asking your computer to plan an entire weekend getaway for your family instead of searching through options yourself.

Scenarios like these will require significant advances in AI agents, but with AVM, OpenAI has tackled the foundation: “talking to computers.” These ideas might seem far off, but after using AVM, they feel tantalizingly close.
