Google's Most Impressive Gemini Demo Revealed as Fake

Google's new Gemini AI model is receiving mixed reviews following its highly publicized launch yesterday. Many users are expressing skepticism about the company's technology and ethical standards after discovering that the most captivating demonstration of Gemini was essentially staged.

The video titled “Hands-on with Gemini: Interacting with multimodal AI” has garnered over a million views in just one day, and it’s easy to see why. This impressive demo showcases what Google describes as various interactions with Gemini, which is designed to understand and integrate both language and visual inputs seamlessly.

To kick things off, the video shows Gemini narrating the transformation of a simple squiggle into a complete drawing of a duck. It humorously remarks that the duck is an unrealistic color and reacts with surprise (“What the quack!”) upon seeing a toy blue duck. It then answers voice queries about the toy before moving on to other impressive feats, such as tracking a ball in a cup-switching game and recognizing shadow puppet gestures.

The responsiveness appears noteworthy, although the video wisely cautions that “latency has been reduced and Gemini outputs have been shortened.” In other words, hesitations and lengthy responses were deliberately cut, which inevitably raises questions about the model's true performance. Even so, watching the demo shook my instinctive disbelief that Google could produce a real contender.

However, there is one significant issue: the video is misleading. Bloomberg's Parmy Olson was the first to report that the footage was captured to test Gemini's abilities on a range of challenges, and that the team then prompted the model with still images and text rather than the real-time interactions the video implies.

While Gemini can evidently perform some of the tasks depicted in the video, it falls short of performing them live, as the footage suggests. Instead, the interactions were assembled from carefully selected text prompts paired with still images, a clear misrepresentation. A related blog post outlines some of the actual prompts and responses; to be fair, a link to it is included in the video description, albeit in a less visible spot.
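To make the distinction concrete, here is a minimal sketch of that reported workflow: one still frame plus one written prompt, sent to the model in a single shot. It assumes the google-generativeai Python SDK, and “frame.jpg” is a hypothetical still pulled from the footage; this illustrates the general pattern, not Google's actual test harness.

```python
# Minimal sketch: prompting a multimodal model with a still image plus text,
# as the demo was reportedly produced, rather than streaming live video.
# Assumes the google-generativeai SDK; "frame.jpg" is a hypothetical still.
import google.generativeai as genai
from PIL import Image

genai.configure(api_key="YOUR_API_KEY")

# The vision-capable Gemini model available at launch.
model = genai.GenerativeModel("gemini-pro-vision")

# One carefully worded text prompt paired with one still image.
frame = Image.open("frame.jpg")
response = model.generate_content(
    ["Describe what is happening in this image.", frame]
)
print(response.text)
```

Each answer in the video could be produced this way offline, then paired with live-action footage and a voiceover to create the feel of a fluid conversation.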

On one level, it's true that Gemini generated the responses presented in the video. But the portrayal of speed, accuracy, and the basic nature of the interaction is misleading. For example, at the 2:45 mark, a hand silently makes a series of gestures, and Gemini quickly claims, “I know what you’re doing! You’re playing Rock, Paper, Scissors!” Yet the documented prompts indicate that Gemini cannot deduce the game from gestures shown one at a time; it must be presented with all three gestures at once, alongside an explicit text prompt.

This is a clear disparity in how the interaction is framed: one version suggests spontaneous, intuitive understanding, while the other reveals a methodical process constrained by the model's limitations. Gemini did the latter, not the former; the kind of interaction depicted in the video never genuinely occurred.

Later, three sticky notes featuring doodles of the Sun, Saturn, and Earth are placed on a surface. When asked, “Is this the correct order?” Gemini responds correctly, “No, the correct order is Sun, Earth, Saturn.” However, the actual written prompt was more involved, requesting an explanation based on the distance from the Sun.

Did Gemini truly work out the planetary order on its own, or did it need the hint about distance to reach the correct answer? Similarly, in the video a paper ball is cleverly swapped under a cup and Gemini appears to follow it intuitively; in reality, the model had to be given explicit instruction to perform the task, highlighting a further disconnect.

While these examples might seem minor, they raise valid concerns about the model's effectiveness. Quickly recognizing a sequence of hand gestures as a game is indeed impressive, as is guessing what a half-finished drawing depicts. Yet, given the lack of transparency, doubts linger about whether even the duck sequence unfolded as shown.

If the video had clarified at the outset that it was a stylized representation of interactions tested by researchers, viewers might have accepted it without question. Instead, titled “Hands-on with Gemini” and describing various “favorite interactions,” it gives the impression that the presented interactions were genuine. They were not; many were altered, overly simplified, or entirely fictitious.

Should we have assumed Google was merely giving us a taste of its capabilities? Perhaps, but that raises the concern that every demonstration of Google's AI is exaggerated for effect. When I initially labeled the video as “faked,” I questioned whether that description was justified. A Google spokesperson urged me to reconsider, and while the video contains genuine elements, it ultimately fails to convey the reality of the interactions: it is misleading.

Google asserts that the video represents real outputs from Gemini, which is partly true. But the claim that the team merely made a few edits and was transparent about them doesn't hold up. It wasn't a genuine demo, and the depicted interactions differ significantly from the real ones that informed the video.

Update: Following the publication of this article, Oriol Vinyals, VP of Research at Google DeepMind, elaborated on how “Gemini was used to create” the video. He said it illustrates what multimodal user experiences built with Gemini could look like and was meant to inspire developers. Interestingly, he also shared a pre-prompting sequence that lets Gemini answer the planets question without the hint.
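For the curious, that kind of pre-prompting is essentially few-shot prompting: worked examples precede the real question so the model can answer without an explicit hint. The sketch below illustrates the general pattern under the same assumed SDK; the example wording and “sticky_notes.jpg” are invented for illustration and are not Vinyals' actual sequence.

```python
# Illustrative few-shot pre-prompt (not Vinyals' actual sequence): a worked
# example establishes an "order these and explain why" pattern, so the final
# planets question needs no hint about distance from the Sun.
import google.generativeai as genai
from PIL import Image

genai.configure(api_key="YOUR_API_KEY")
model = genai.GenerativeModel("gemini-pro-vision")

pre_prompt = (
    "Q: Is this the right order: cherry, watermelon, grape?\n"
    "A: No. Ordered by size it is grape, cherry, watermelon, because a grape "
    "is smaller than a cherry and a cherry is smaller than a watermelon.\n"
)

response = model.generate_content([
    pre_prompt,
    Image.open("sticky_notes.jpg"),  # hypothetical photo of the three doodles
    "Q: Is this the right order?\nA:",
])
print(response.text)
```

Whether that counts as the model figuring it out on its own is exactly the judgment call the video glosses over.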

Perhaps I’ll need to reassess my stance when AI Studio with Gemini Pro becomes available for public exploration next week. Gemini may indeed evolve into a robust AI platform that competes with OpenAI and others. But Google's approach here has damaged its credibility: after this incident, how can users trust the company's claims about the model's capabilities? The misstep leaves Google even further behind its competitors.
