Why Google’s Gemini Falls Short of Expectations as a Generative AI Model

Google has officially launched its much-anticipated next-generation generative AI model, Gemini, albeit in a limited form. This week, the company introduced Gemini Pro, a lighter version of the flagship Gemini Ultra model, which is expected to debut sometime next year.

In a virtual press briefing, the Google DeepMind team gave an overview of Gemini, technically referred to as "Gemini 1.0." Gemini is not just a single model; it consists of three variants:

1. Gemini Ultra: The flagship model that promises comprehensive capabilities.

2. Gemini Pro: A streamlined version aimed at broader user accessibility.

3. Gemini Nano: Designed for mobile devices like the Pixel 8 Pro, with two sizes available—Nano-1 (1.8 billion parameters) and Nano-2 (3.25 billion parameters)—to cater to varying memory capacities.
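
To get a rough sense of why the two Nano sizes map to different memory tiers, the back-of-envelope sketch below estimates on-device weight storage. The parameter counts come from the list above; the 4-bit (0.5 bytes per parameter) quantization is an illustrative assumption, not a confirmed spec:

```python
# Rough on-device weight footprint for the two Gemini Nano sizes.
# Parameter counts are from Google's announcement; the 4-bit
# quantization (0.5 bytes/parameter) is an assumption for illustration.
NANO_PARAMS = {
    "Nano-1": 1.8e9,   # 1.8 billion parameters
    "Nano-2": 3.25e9,  # 3.25 billion parameters
}

BYTES_PER_PARAM = 0.5  # assumed 4-bit quantized weights

def weight_footprint_gb(params: float,
                        bytes_per_param: float = BYTES_PER_PARAM) -> float:
    """Approximate weight storage in gigabytes (1 GB = 1e9 bytes)."""
    return params * bytes_per_param / 1e9

for name, n in NANO_PARAMS.items():
    print(f"{name}: ~{weight_footprint_gb(n):.2f} GB of weights")
```

Under those assumptions, Nano-1's weights fit in well under a gigabyte while Nano-2's need roughly 60% more, which is consistent with targeting phones with different amounts of RAM.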

The most accessible way to experience Gemini Pro is through Bard, Google’s alternative to ChatGPT. As of today, Bard in the U.S. (in English only) runs on a fine-tuned version of Gemini Pro, which Google says improves reasoning, planning, and comprehension compared with Bard's previous underlying model. However, these improvements cannot yet be independently verified, as Google did not give journalists advance access to the model for testing.

Starting December 13, Gemini Pro will launch for enterprise users via Vertex AI, Google’s managed machine-learning platform, before becoming available in Google’s Generative AI Studio for developers. Users have already noticed different versions of the Gemini model appearing in Vertex AI's model garden, further indicating its rollout. In the coming months, Gemini will also be integrated into products like Duet AI, Chrome, Ads, and Google Search as part of the Search Generative Experience.

Gemini Nano will soon be available in preview through AICore, Google's recently released Android system service, initially exclusive to Android 14 on the Pixel 8 Pro. Developers interested in integrating the model can sign up for early access. Gemini Nano will power features in the Pixel 8 Pro and future Android devices, such as summarizing content in the Recorder app and generating suggested replies in supported messaging apps like WhatsApp.

Natively Multimodal

While Gemini Pro shows promise, its capabilities are modest at best. According to Sissie Hsiao, GM of Google Assistant and Bard, Gemini Pro outperforms OpenAI’s GPT-3.5 across six benchmarks, including GSM8K, which assesses grade-school math reasoning. However, GPT-3.5 is now over a year old, making it a less-than-daunting opponent.

What about the more advanced Gemini Ultra? This model is "natively multimodal," having been pre-trained and fine-tuned on a diverse range of data types, including code, text, audio, images, and videos. Eli Collins, VP of product at Google DeepMind, asserts that Gemini Ultra understands complex information across these modalities and can tackle subjects like math and physics effectively.

Compared to OpenAI's multimodal model, GPT-4 with Vision, Gemini Ultra has additional capabilities, such as transcribing speech and answering questions about audio and video content. Collins explained that while many multimodal models are built by training separate components for each data type and stitching them together, Gemini was designed from the start to process multiple modalities in a single model.

Unfortunately, specifics regarding the training datasets remain elusive. Google has been tight-lipped about how it collected the data for Gemini, raising questions about whether creators whose work contributed to the model will have any rights to opt out or receive compensation. Collins disclosed that part of the training data was sourced from public websites and filtered for quality, but didn’t address the crucial issues of copyright and creator compensation.

The Rushed Launch

The launch of Gemini appears somewhat rushed. Google previously boasted that Gemini would deliver “impressive multimodal capabilities” not seen in prior models and improve efficiency in integrating tools and APIs. Yet the recent briefing offered little convincing evidence for these claims.

Since the launch of Bard in February, which garnered criticism for its inaccuracies, Google has been playing catch-up in the generative AI landscape. Bard has seen improvements, but Gemini's development has faced challenges, particularly with non-English queries. Gemini Ultra is set to be available first to select customers and developers, with broader access anticipated next year.

Despite challenges, Google remains committed to enhancing its products with AI-driven features, leveraging homegrown models like PaLM 2 and Imagen. However, whether Gemini Ultra can fulfill its ambitious promises—or if it too will face scrutiny—remains uncertain.

In this context, it's possible that lofty marketing expectations have outrun Gemini's actual capabilities, suggesting that producing state-of-the-art generative AI models is every bit as challenging as it seems, no matter how one reorganizes one's team.

Keywords: AI, Gemini, Generative AI, Google, DeepMind, multimodal models, machine learning.
