Google's Generative AI Can Now Analyze Hours of Video Content

Updated on October 23 2024

Gemini, Google’s suite of generative AI models, has expanded its capabilities and can now analyze longer documents, codebases, videos, and audio recordings than before.

During a keynote at the Google I/O 2024 developer conference on Tuesday, Google introduced a private preview of Gemini 1.5 Pro, its flagship model, upgraded to take in up to 2 million tokens, double the previous limit.

At 2 million tokens, Gemini 1.5 Pro now supports the largest input of any commercially available generative AI model. Anthropic’s Claude 3 comes in second, with a maximum of 1 million tokens. In the context of AI, "tokens" are subdivided pieces of raw data, such as the syllables “fan,” “tas,” and “tic” in the word “fantastic.” To illustrate, 2 million tokens is equivalent to roughly 1.4 million words, two hours of video, or 22 hours of audio.
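The equivalences above imply rough conversion rates, which can be worked out directly. The following is a minimal sketch that uses only the approximate figures quoted in this article; the function names are hypothetical, and treating the rates as linear and additive across media types is a simplifying assumption, not a description of how Gemini actually counts tokens.

```python
# Approximate figures quoted in the article for a 2-million-token context.
CONTEXT_TOKENS = 2_000_000
WORDS = 1_400_000
VIDEO_HOURS = 2
AUDIO_HOURS = 22

# Implied conversion rates (rough, derived only from the figures above).
WORDS_PER_TOKEN = WORDS / CONTEXT_TOKENS                         # about 0.7
VIDEO_TOKENS_PER_SECOND = CONTEXT_TOKENS / (VIDEO_HOURS * 3600)  # about 278
AUDIO_TOKENS_PER_SECOND = CONTEXT_TOKENS / (AUDIO_HOURS * 3600)  # about 25


def estimate_tokens(words=0, video_seconds=0.0, audio_seconds=0.0):
    """Estimate token usage, assuming the rates are linear and additive."""
    return (
        words / WORDS_PER_TOKEN
        + video_seconds * VIDEO_TOKENS_PER_SECOND
        + audio_seconds * AUDIO_TOKENS_PER_SECOND
    )


def fits_in_context(limit=CONTEXT_TOKENS, **media):
    """Return True if the estimated token count is within the limit."""
    return estimate_tokens(**media) <= limit


if __name__ == "__main__":
    print(f"words per token:       {WORDS_PER_TOKEN:.2f}")
    print(f"video tokens / second: {VIDEO_TOKENS_PER_SECOND:.0f}")
    print(f"audio tokens / second: {AUDIO_TOKENS_PER_SECOND:.1f}")
    print("1 hour of video fits: ", fits_in_context(video_seconds=3600))
    print("3 hours of video fits:", fits_in_context(video_seconds=3 * 3600))
```

By these estimates, video consumes tokens roughly eleven times faster than audio, which is why the same window holds 22 hours of audio but only two hours of video.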

Beyond handling larger files, models that accept larger token inputs can also perform better. Unlike models with small context windows, the 2-million-token Gemini 1.5 Pro is less likely to lose track of recent conversation content and drift off topic. Large-context models can also better follow the flow of the data they take in, which can lead to richer and more relevant responses.

Developers eager to try out Gemini 1.5 Pro's enhanced 2-million-token context can join a waitlist via Google AI Studio, the platform for Google’s generative AI development tools. (A version with a 1-million-token context is expected to be broadly available across Google's developer services in the coming month.)

Alongside the expanded context window, Google has announced several algorithmic enhancements that improve Gemini 1.5 Pro at code generation, logical reasoning, multi-turn conversation, and audio and image understanding. Gemini 1.5 Pro can also now reason across audio as well as images and video, and developers can steer its behavior through a feature known as system instructions.

For developers with less demanding needs, Google is introducing Gemini 1.5 Flash, a streamlined model specifically designed for high-frequency generative AI tasks. Available in public preview, Flash also supports a 2-million-token context window but focuses on faster, text-only output from multimodal inputs like audio, video, and images.

“While Gemini Pro is tailored for complex, multi-step reasoning tasks, Flash is ideal for situations where rapid model output is essential,” explained Josh Woodward, VP of Google Labs, during a media briefing. He added that Flash is particularly beneficial for summarizing, chat applications, captioning images and videos, and extracting data from extensive documents and tables.

Flash appears to be Google's answer to smaller, budget-friendly models like Anthropic’s Claude 3 Haiku. Both Gemini 1.5 Pro and Flash are now available in more than 200 countries and territories, including the European Economic Area, the U.K., and Switzerland. The 2-million-token context version, however, remains gated behind a waitlist.

In another move aimed at cost-conscious developers, all Gemini models, not just Flash, will soon support a feature called context caching. It will let developers store large amounts of information, such as a knowledge base or a database of research papers, in a cache that Gemini models can access quickly and relatively cheaply.
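The idea behind context caching can be illustrated with a toy model: the large context is uploaded once and later requests refer to it by a handle instead of resending it. The sketch below is not the Gemini API. `ContextCache`, `count_tokens`, and the two accounting functions are hypothetical names, the cache is a local dictionary standing in for server-side storage, and the whitespace-based token count is a crude assumption.

```python
import hashlib


class ContextCache:
    """Toy in-memory stand-in for a server-side context cache (hypothetical)."""

    def __init__(self):
        self._store = {}

    def put(self, content: str) -> str:
        # Key the cached content by a short hash so requests can refer to it.
        key = hashlib.sha256(content.encode("utf-8")).hexdigest()[:16]
        self._store[key] = content
        return key

    def get(self, key: str) -> str:
        return self._store[key]


def count_tokens(text: str) -> int:
    # Crude assumption: one token per whitespace-separated word.
    return len(text.split())


def tokens_sent_without_cache(context: str, questions: list) -> int:
    # Every request resends the full context along with the question.
    return sum(count_tokens(context) + count_tokens(q) for q in questions)


def tokens_sent_with_cache(cache: ContextCache, context: str, questions: list) -> int:
    # The context is uploaded once; later requests carry only the question
    # plus the cache key (the key itself is not counted here).
    cache.put(context)
    return count_tokens(context) + sum(count_tokens(q) for q in questions)


if __name__ == "__main__":
    knowledge_base = "fact " * 1000
    questions = ["what is x", "summarize this"]
    print(tokens_sent_without_cache(knowledge_base, questions))
    print(tokens_sent_with_cache(ContextCache(), knowledge_base, questions))
```

The sketch only counts tokens transmitted; how cached tokens are actually stored and billed is not covered by this article and is not modeled here.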

A complementary Batch API, currently in public preview on Vertex AI (Google’s enterprise-focused generative AI development platform), allows multiple prompts to be sent to Gemini models in a single request. It offers a more cost-effective way to handle workloads such as classification, sentiment analysis, data extraction, and description generation.

Another feature set to launch later this month in preview on Vertex is controlled generation, which could provide additional cost savings by allowing users to specify output formats or schemas (such as JSON or XML) for the Gemini models.
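To make the notion of a specified output schema concrete, here is a minimal sketch of what conforming output looks like from the consumer's side. It does not use or represent the Vertex AI feature itself: the `SCHEMA` format (field name mapped to a Python type) and the `conforms` function are hypothetical, and the example strings are invented for illustration.

```python
import json

# Hypothetical schema: each required field mapped to its expected Python type.
SCHEMA = {"title": str, "sentiment": str, "score": float}


def conforms(raw_output: str, schema=SCHEMA) -> bool:
    """Return True if raw_output is a JSON object matching the schema exactly."""
    try:
        data = json.loads(raw_output)
    except json.JSONDecodeError:
        return False  # free-form text that is not valid JSON
    if not isinstance(data, dict) or set(data) != set(schema):
        return False  # missing or unexpected fields
    return all(isinstance(data[name], kind) for name, kind in schema.items())


if __name__ == "__main__":
    structured = '{"title": "Launch recap", "sentiment": "positive", "score": 0.9}'
    free_form = "Sure! The review is mostly positive, I'd say about 0.9."
    print(conforms(structured))
    print(conforms(free_form))
```

When output is guaranteed to match a schema like this, downstream code can parse it directly rather than re-prompting or repairing free-form text, which is one plausible source of the savings mentioned above.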

“You’ll be able to send all your files to the model at once, eliminating the need to resend them repeatedly,” Woodward noted. “This will enhance the utility of the long context while also making it more affordable.”
