Gemini Live Needs Additional Rehearsals for Enhanced Performance

What’s the benefit of interacting with a human-like chatbot if it lacks reliability and personality?

This question has been on my mind since last week, when I started exploring Gemini Live, Google’s answer to OpenAI’s Advanced Voice Mode. Designed to offer a more engaging chatbot experience, Gemini Live features realistic voices and lets users interrupt the bot at any moment.

“Gemini Live is custom-tuned to facilitate intuitive, back-and-forth conversations,” Sissie Hsiao, GM for Gemini experiences at Google, shared in May. “It can provide information concisely and engage in a more conversational manner than traditional text interactions. An AI assistant should solve complex problems while feeling natural and fluid in conversation.”

After extensive use of Gemini Live, I found it more fluid and natural than Google’s past voice interactions, such as Google Assistant. However, it still struggles with core problems of the underlying technology, such as hallucinations and inconsistencies, and it adds a few new problems of its own.

A Step Forward, But Not Quite There

At its core, Gemini Live is an advanced text-to-speech engine built on Google’s latest generative AI models, Gemini 1.5 Pro and 1.5 Flash. It turns text into speech, and a transcript of the conversation is just a swipe away within the Gemini app, which is available on Android and coming soon to iOS.

On my Pixel 8a, I chose the voice Ursa, which Google describes as “mid-range” and “engaged”; it sounded like a younger woman to me. Google worked with professional actors to develop ten distinct voices for Gemini Live, and they are more expressive than older synthetic voices, especially the standard Google Assistant voice.

Yet Ursa and its counterparts deliver a dispassionate tone that steers clear of the uncanny valley. That might be intentional, but users also cannot adjust a voice’s pitch, timbre, or speed, which makes Gemini Live less versatile than Advanced Voice Mode. And where Advanced Voice Mode offers dynamic touches such as laughter and natural pauses, Gemini Live maintains a steady, polite demeanor that can feel distant.

Testing Gemini Live for Job Prep

When Google showcased Gemini Live at its I/O developer conference, it suggested the feature could help with job preparation. I decided to test this, telling the bot that I was applying for a tech journalism position. It asked for details about the specific role I was considering and followed up with a mix of generic and personalized practice questions.

I provided succinct responses and sought feedback from Gemini Live. The chatbot’s compliments felt a bit hollow: “Based on our practice, it sounds like you have a solid grasp of the challenges and rewards of tech journalism.”

Skeptical, I played devil’s advocate and implied that I had given only one-word answers, which I had not. Gemini Live took the bait and responded, “That’s true, you mostly gave one-word answers.” This readiness to confidently affirm a false account came up often in my interactions with Gemini Live and made the bot hard to trust.

Unreliable Recommendations

Gemini Live recalls a surprising amount from earlier chat sessions. However, when asked about specific places or events, its tendency to hallucinate became apparent. On a recent Friday evening, I asked for budget-friendly activities in New York City. It suggested the infamous nightclub Le Bain, a rooftop bar called The Skylark, and a club named Output.

Excited, I searched for details. While Le Bain had its merits, Output closed its doors in 2019. Gemini Live acknowledged this only after I pointed it out, adding, “Thank you for catching that outdated information.”

I then changed tack and tried a word game the bot suggested. While explaining it, Gemini Live incorrectly claimed that “quiet” could be formed from the letters in “cloud.” Disappointed, I asked if it had any spicy opinions.

“The increasing focus on mental health awareness does more harm than good,” Gemini Live asserted, inviting a lively discussion. Yet, when questioned further, it flip-flopped and claimed mental health awareness is a positive development. This inconsistency reflected Gemini Live's frustratingly vague nature.

A Mixed Bag of Responses

Gemini Live’s two-sided take on mental health was typical of its non-specific responses. Even when grounded in factual information, its suggestions often felt generic. For instance, it told me to focus my interview preparation, then offered only the kind of vague advice typical of college career fairs.

On current events, the bot was mostly accurate but overly verbose, giving long and tedious responses. Occasionally, it shied away from sensitive topics, such as political discussions, a reminder that Gemini Live still limits what it will discuss.

Moreover, navigating conversations with Gemini Live was sometimes awkward. If I interrupted, the bot would lower its voice but keep speaking, which was disorienting. This design flaw underscores the need for smoother interactions.

Technical Issues Abound

Unfortunately, Gemini Live is not without its technical hiccups. For starters, activating it required navigating somewhat convoluted instructions found on Reddit—an unnecessary inconvenience.

During my interactions, the voice frequently cut out mid-sentence, forcing me to ask the bot to repeat itself several times. Sometimes, Gemini Live didn’t register my input, and I had to tap the “Pause” button repeatedly to get its attention.

Currently, it also lacks many integrations available to Google’s text-based Gemini chatbot, such as summarizing emails or managing playlists on YouTube Music.

Ultimately, Gemini Live feels like an underdeveloped tool that struggles to provide useful information or engaging conversation. After several days of use, I’m left questioning its practical value — especially considering it’s exclusive to Google’s $20-per-month Google One AI Premium Plan. However, upcoming features that may include image interpretation and real-time video capabilities could enhance its utility.

As it stands, Gemini Live is a subpar alternative to the text-based experience, which remains more practical and reliable.

In the end, Gemini Live offered critical feedback on my interactions too, noting that my responses were often brief and lacked elaboration. It also pointed out that unpredictable topic shifts hindered our dialogue. Fair enough, Gemini Live, fair enough.
