Google Gemini 1.5 Pro Now Available for Public Preview on Vertex AI


Updated on October 23, 2024

Google Unveils Gemini 1.5 Pro: A Major Leap for Generative AI on Vertex AI

Google has launched Gemini 1.5 Pro in public preview on Vertex AI, Google Cloud's platform for enterprise-focused AI development. The company made the announcement at its Cloud Next conference in Las Vegas this week.

First introduced in February, Gemini 1.5 Pro is part of Google's Gemini family of generative AI models. Its standout feature is the amount of context it can process, which ranges from 128,000 tokens up to 1 million tokens. "Tokens" are subdivided pieces of raw data; for instance, the word "fantastic" might be split into the tokens "fan," "tas," and "tic."
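To make the idea of subword tokens concrete, here is a minimal Python sketch of greedy longest-match tokenization. The tiny vocabulary is hypothetical and chosen only to reproduce the "fantastic" example; Gemini's actual tokenizer uses its own learned vocabulary and will not necessarily split words this way.

```python
from typing import List, Set

# Hypothetical toy vocabulary chosen only to reproduce the article's example;
# real model tokenizers learn far larger vocabularies from training data.
VOCAB: Set[str] = {"fan", "tas", "tic"}


def tokenize(word: str, vocab: Set[str]) -> List[str]:
    """Split `word` by repeatedly taking the longest vocabulary entry that
    matches at the current position, falling back to a single character."""
    tokens: List[str] = []
    i = 0
    while i < len(word):
        # Try the longest possible piece first, shrinking until one matches.
        for end in range(len(word), i, -1):
            piece = word[i:end]
            if piece in vocab or end == i + 1:
                tokens.append(piece)
                i = end
                break
    return tokens


print(tokenize("fantastic", VOCAB))  # ['fan', 'tas', 'tic']
```

The single-character fallback guarantees that any word can be tokenized, even one made entirely of pieces missing from the vocabulary.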

To put 1 million tokens into perspective, it equates to roughly 700,000 words or about 30,000 lines of code. That is about five times the 200,000-token context window of Anthropic's flagship Claude 3 models and about eight times the 128,000-token window of OpenAI's GPT-4 Turbo.
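These comparisons are simple ratios of window sizes. The Python sketch below computes them, assuming context limits of 200,000 tokens for Claude 3 and 128,000 tokens for GPT-4 Turbo; the words and lines-of-code figures are the article's own rough equivalences, not exact conversion rates.

```python
GEMINI_15_PRO_MAX = 1_000_000  # tokens, upper end of Gemini 1.5 Pro's window
CLAUDE_3 = 200_000             # tokens (assumed published limit)
GPT_4_TURBO = 128_000          # tokens (assumed published limit)

# Rough equivalences quoted in the article for 1 million tokens.
WORDS_PER_MILLION = 700_000
LINES_OF_CODE_PER_MILLION = 30_000


def ratio(larger: int, smaller: int) -> float:
    """How many times the smaller window fits into the larger one."""
    return larger / smaller


print(f"vs Claude 3:     {ratio(GEMINI_15_PRO_MAX, CLAUDE_3):.1f}x")     # 5.0x
print(f"vs GPT-4 Turbo:  {ratio(GEMINI_15_PRO_MAX, GPT_4_TURBO):.1f}x")  # 7.8x
print(f"words per token: {WORDS_PER_MILLION / GEMINI_15_PRO_MAX:.2f}")   # 0.70
print(f"tokens per line: {GEMINI_15_PRO_MAX / LINES_OF_CODE_PER_MILLION:.1f}")  # 33.3
```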

In AI terminology, a model’s context, or context window, is the input data the model considers before generating output. That input can be a simple question, like “Who won the 2020 U.S. presidential election?”, or longer content such as a movie script, an email, an essay, or an entire e-book.

Models with small context windows tend to "forget" earlier parts of a conversation, which can cause them to veer off topic. Models with large context windows, such as Gemini 1.5 Pro, can keep more of the conversation in view, so they are better positioned to stay coherent and produce more contextually rich responses. In principle, a large context window may also reduce the need for fine-tuning and for grounding answers in external facts.
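Why a small window makes a chatbot "forget" can be illustrated with a minimal Python sketch, assuming a chat application that keeps only the most recent turns that fit a fixed token budget. Counting one token per whitespace-separated word is a deliberate simplification, and the helper names `count_tokens` and `fit_to_window` are hypothetical.

```python
from typing import List


def count_tokens(text: str) -> int:
    # Crude stand-in: one token per whitespace-separated word.
    # Real tokenizers split text into subword pieces instead.
    return len(text.split())


def fit_to_window(turns: List[str], budget: int) -> List[str]:
    """Keep the most recent turns whose combined token count fits `budget`."""
    kept: List[str] = []
    used = 0
    # Walk backwards from the newest turn; stop at the first one that overflows.
    for turn in reversed(turns):
        cost = count_tokens(turn)
        if used + cost > budget:
            break
        kept.append(turn)
        used += cost
    kept.reverse()  # restore chronological order
    return kept


turns = ["My name is Ada.", "I like chess.", "What is my name?"]
print(fit_to_window(turns, budget=8))    # ['I like chess.', 'What is my name?']
print(fit_to_window(turns, budget=100))  # all three turns
```

With a budget of 8, the turn containing the user's name is dropped, so a model would have nothing to answer the final question from; a larger budget keeps the whole conversation.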

So, what can users do with a 1-million-token context window? Google says the possibilities are broad, ranging from analyzing large code libraries and reasoning across lengthy documents to holding long conversations with a chatbot.

Gemini 1.5 Pro is not just multilingual; it is also multimodal, able to understand images, videos, and, as of Tuesday, audio streams in addition to text. This lets the model analyze and compare content across media formats and languages, such as TV shows, movies, radio broadcasts, and conference call recordings. For scale, 1 million tokens corresponds to about an hour of video or roughly 11 hours of audio.
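Those two scale figures imply rough per-second token rates. This back-of-the-envelope Python calculation uses only the article's approximate numbers; Google's actual per-second rates for video and audio may differ and are not stated in the article.

```python
TOKENS = 1_000_000
VIDEO_SECONDS = 60 * 60        # "about an hour of video"
AUDIO_SECONDS = 11 * 60 * 60   # "roughly 11 hours of audio"

# Implied average token cost per second of media, from the article's figures.
video_rate = TOKENS / VIDEO_SECONDS
audio_rate = TOKENS / AUDIO_SECONDS

print(f"video: ~{video_rate:.0f} tokens per second")  # ~278
print(f"audio: ~{audio_rate:.0f} tokens per second")  # ~25
```

By these figures, a second of video costs roughly ten times as many tokens as a second of audio, which is why the same window holds far more audio than video.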

Thanks to its audio support, Gemini 1.5 Pro can also transcribe video clips, although the quality of those transcriptions has yet to be fully evaluated.

In a recorded demo earlier this year, Google showed Gemini 1.5 Pro searching a 400-page transcript of the Apollo 11 moon landing telecast for humorous quotes and finding a scene in movie footage that resembled a pencil sketch.

Early adopters of Gemini 1.5 Pro, including United Wholesale Mortgage, TBS, and Replit, are using its large context window for tasks such as mortgage underwriting, automating metadata tagging for media archives, and generating, explaining, and transforming code.

However, processing 1 million tokens isn't instantaneous. In the demos, each search took between 20 seconds and a minute to complete, considerably longer than a typical ChatGPT query. Google has said that latency is an area of focus and that it is working to optimize Gemini 1.5 Pro over time.

Gemini 1.5 Pro is also gradually making its way into other Google products. On Tuesday, the company announced that the model (in private preview) will power new features in Code Assist, Google's generative AI coding tool. These features let developers make "large-scale" changes across codebases, such as updating cross-file dependencies, and perform in-depth code reviews.

