Google Enhances Gemini Series: New Large and Small Model Updates Unveiled at Google I/O 2024

Google has introduced a broad set of updates to its flagship Gemini family, spanning new models from small to large. Gemini was first introduced at last year’s I/O event as the foundation model meant to power Google’s services. In a significant update this year, the latest iteration, Gemini 1.5 Pro, is now available to developers worldwide, expanding access beyond its previously limited availability.

During the announcement, Google CEO Sundar Pichai said that Gemini 1.5 Pro incorporates improvements in translation, coding, and reasoning, driven by feedback from its initial rollout. The model supports 35 languages and is multimodal, so it can interpret text alongside visual prompts such as images. Notably, Gemini 1.5 Pro offers a context window of up to 1 million tokens, enough to process roughly 1,500 pages of text. Looking ahead, Google revealed plans to extend the context window to 2 million tokens, equivalent to around 1.5 million words. For comparison, OpenAI's GPT-4 Turbo offers a 128,000-token context window.
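
To give a rough sense of what a 1 million-token window means in practice, here is a minimal sketch assuming the google-generativeai Python SDK; the API key, file name, and prompt are placeholders for illustration, not details from the announcement.

```python
# Minimal sketch, assuming the google-generativeai Python SDK.
# The API key, file name, and prompt are placeholders, not from the article.
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")
model = genai.GenerativeModel("gemini-1.5-pro")

# Load a long document, e.g. a lengthy research paper exported as plain text.
with open("research_paper.txt", encoding="utf-8") as f:
    paper = f.read()

# count_tokens reports how much of the 1M-token window the document would use.
print(model.count_tokens(paper))

# If it fits, the whole paper can be passed as context in a single request.
response = model.generate_content(["Summarize the key findings of this paper:", paper])
print(response.text)
```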

At the I/O event, developers described their experiences using Gemini 1.5 Pro with lengthy academic research papers, highlighting the model's ability to handle substantial input seamlessly. However, the 2-million-token version is currently available only to select developers in a private preview. Pichai expressed excitement about the rapid progress made in just a few months, stating, “This represents the next step on our journey towards the ultimate goal of infinite context. [Multimodality and long context] is powerful on its own, but together they unlock deeper capabilities and more intelligence.”

In addition to the flagship model, Google announced a more compact version, Gemini 1.5 Flash. This lightweight model is built for low-latency applications that demand quick responses, such as IoT devices and industrial robotics. Sir Demis Hassabis, CEO of Google DeepMind, unveiled Gemini 1.5 Flash at the I/O event, emphasizing its fast, cost-efficient design that retains Gemini’s long context handling and multimodal reasoning capabilities.

Gemini 1.5 Flash will be accessible through Google’s AI Studio and Vertex AI alongside Gemini 1.5 Pro, with both set to launch in June.
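
As a hedged illustration of what that developer access might look like, the sketch below calls Gemini 1.5 Flash through the google-generativeai Python SDK with an API key from AI Studio; the prompt is an assumed example of a latency-sensitive task, not something from the announcement.

```python
# Minimal sketch, assuming the google-generativeai Python SDK and an
# AI Studio API key. The prompt is a placeholder for a latency-sensitive task.
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")

# Flash targets fast, cost-efficient responses while keeping long-context, multimodal input.
flash = genai.GenerativeModel("gemini-1.5-flash")

response = flash.generate_content(
    "Classify this sensor reading as normal or anomalous: temp=87C, vibration=0.9g"
)
print(response.text)
```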

Further announcements included PaliGemma, an open vision-language model designed for generating image captions and labels. This nimble model processes both image and text inputs and produces detailed responses about visual content. Google also introduced Gemma 2, the latest generation of its small language models, launching in June. Designed for developers and businesses with limited infrastructure, Gemma 2 runs efficiently on a single TPU (Google's custom AI chip) while coming in at 27 billion parameters, a significant step up from the 2 billion and 7 billion parameter versions of the original Gemma. Remarkably, the new version outperforms models more than twice its size, underscoring Google's focus on efficiency in AI development.
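
As one hypothetical way a developer might try PaliGemma for captioning, the sketch below uses the Hugging Face transformers library; the article does not specify how the model is distributed, so treat the checkpoint name, task prefix, and image path as illustrative assumptions.

```python
# Hypothetical sketch: image captioning with PaliGemma via Hugging Face transformers.
# The checkpoint name, task prefix, and image path are assumptions, not from the article.
import torch
from PIL import Image
from transformers import AutoProcessor, PaliGemmaForConditionalGeneration

model_id = "google/paligemma-3b-mix-224"
processor = AutoProcessor.from_pretrained(model_id)
model = PaliGemmaForConditionalGeneration.from_pretrained(model_id)

image = Image.open("photo.jpg")   # any local image
prompt = "caption en"             # task-prefix style prompt for English captioning

inputs = processor(text=prompt, images=image, return_tensors="pt")
with torch.no_grad():
    output = model.generate(**inputs, max_new_tokens=30)

# Strip the prompt tokens and decode only the generated caption.
caption = processor.decode(output[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True)
print(caption)
```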
