Gemini Live: Google's Response to ChatGPT's Advanced Voice Mode Now Available

Gemini Live Launches: Google’s New Voice Chat AI Experience

Gemini Live, Google's answer to OpenAI's recently released Advanced Voice Mode for ChatGPT, is set to launch on Tuesday. The feature was first unveiled at Google's I/O 2024 developer conference and highlighted again at the Made by Google 2024 event.

With Gemini Live, users can hold “in-depth” voice conversations with Gemini, Google’s generative AI chatbot, directly from their smartphones. Powered by an advanced speech engine, Gemini Live promises more consistent, emotionally expressive, and realistic multi-turn dialogue. Users can even interrupt the chatbot mid-sentence to ask follow-up questions, and Gemini will adapt to their speech patterns in real time.

Google describes Gemini Live in a blog post as follows: “With Gemini Live [through the Gemini app], you can converse with Gemini and choose from [10 new] natural-sounding voices. You have the flexibility to speak at your own pace or interject with clarifying questions, just as you would in a normal conversation.”

This hands-free feature allows users to keep talking to the Gemini app even when it’s running in the background or their phone is locked. Users can pause and resume conversations effortlessly at any time.

But how can Gemini Live be practically applied? Google suggests it can be beneficial for rehearsing job interviews. Gemini Live can simulate a hiring manager, offering speaking tips and suggesting skills you may want to emphasize during your conversation.

One potential advantage of Gemini Live over ChatGPT's Advanced Voice Mode is a longer memory. The AI models behind Gemini Live, Gemini 1.5 Pro and Gemini 1.5 Flash, have longer-than-average "context windows," meaning they can take in and reason over large amounts of data, potentially hours of dialogue, before delivering a response.

“Live utilizes our Gemini Advanced models tailored for conversational use,” a Google representative noted. “The model’s extensive context window is beneficial when users engage in lengthy conversations with Live.”

Of course, how well this works in practice remains to be seen. OpenAI's challenges with Advanced Voice Mode show that demos don't always translate smoothly into real-world performance.

Gemini Live doesn't yet include the multimodal capabilities that Google previewed at I/O. In May, Google showcased pre-recorded videos demonstrating Gemini Live's ability to perceive and respond to users’ surroundings through photos and videos captured with phone cameras, for example by identifying a bicycle part or explaining a segment of code on a computer screen. Google has said that multimodal input will arrive "later this year" but has not shared specifics. Gemini Live is available in English only for now, but it will soon expand to more languages and to iOS through the Google app.

Similar to OpenAI's offering, Gemini Live isn't free; it requires access to the Gemini Advanced version, available through the Google One AI Premium Plan priced at $20 per month. However, other new features from Gemini will be available at no cost.

In the coming weeks, Android users will be able to bring up Gemini's overlay on top of any app to ask questions about what's on their screens (for instance, a YouTube video) by pressing their phone's power button or saying, “Hey Google.” Gemini will also be able to generate images from the overlay, which can then be shared in apps like Gmail and Google Messages (though not images of people yet).

Additionally, Gemini will introduce new integrations with Google services—referred to as “extensions.” These enhancements will allow Gemini to interact with Google Calendar, Keep, Tasks, YouTube Music, and more on both mobile and web platforms. Google suggests several potential applications:

- Ask Gemini to create a playlist of songs reminiscent of the late ’90s.

- Snap a photo of a concert flyer to check your availability, then set a reminder to purchase tickets.

- Request Gemini to retrieve a recipe from Gmail and add the necessary ingredients to your shopping list in Keep.

Lastly, Gemini will launch on Android tablets later this week.
