Hugging Face launched its Idefics visual language model in 2023, building on the Flamingo architecture originally developed by DeepMind. The upgraded version, Idefics2, is now available on the Hugging Face Hub and features a much smaller parameter count, an open license, and improved Optical Character Recognition (OCR) capabilities.
Idefics, which stands for Image-aware Decoder Enhanced à la Flamingo with Interleaved Cross-attentions, is a versatile multimodal model that accepts both text and image prompts. While the original Idefics weighed in at 80 billion parameters, Idefics2 has been slimmed down to 8 billion, putting it in the same size class as models such as DeepSeek-VL and LLaVA-NeXT-Mistral-7B.
Key improvements in Idefics2 include better image handling: images are processed at their native resolution (up to 980 x 980 pixels) and native aspect ratio, removing the need to resize them to a fixed-size square, a long-standing convention in computer vision pipelines.
The model's OCR capabilities have also been strengthened by incorporating training data transcribed from text in images and documents, and the Hugging Face team has improved Idefics2's ability to answer questions about charts, figures, and documents.
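For readers who want to try these capabilities, the sketch below shows how one might ask Idefics2 a question about a document image through the transformers library. It is a minimal example assuming the HuggingFaceM4/idefics2-8b checkpoint and the library's standard vision-to-sequence classes; the image URL and the question are placeholders, not values from this article.

```python
# Minimal sketch: querying Idefics2 about a document image via transformers.
# Assumes the HuggingFaceM4/idefics2-8b checkpoint; the URL below is a placeholder.
import requests
import torch
from PIL import Image
from transformers import AutoModelForVision2Seq, AutoProcessor

processor = AutoProcessor.from_pretrained("HuggingFaceM4/idefics2-8b")
model = AutoModelForVision2Seq.from_pretrained(
    "HuggingFaceM4/idefics2-8b", torch_dtype=torch.float16, device_map="auto"
)

# Load an image at its native resolution and aspect ratio (placeholder URL).
image = Image.open(requests.get("https://example.com/invoice.png", stream=True).raw)

# Build a chat-style prompt containing one image slot and one question.
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image"},
            {"type": "text", "text": "What is the total amount on this invoice?"},
        ],
    }
]
prompt = processor.apply_chat_template(messages, add_generation_prompt=True)
inputs = processor(text=prompt, images=[image], return_tensors="pt").to(model.device)

generated_ids = model.generate(**inputs, max_new_tokens=64)
print(processor.batch_decode(generated_ids, skip_special_tokens=True)[0])
```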
Moreover, the architecture of Idefics2 has been simplified by moving away from the gated cross-attention mechanisms used in its predecessor. According to Hugging Face, “The images are fed into the vision encoder, followed by learned Perceiver pooling and a Multilayer Perceptron modality projection. This pooled sequence is concatenated with the text embeddings to create an interleaved sequence of images and text.”
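The toy PyTorch sketch below illustrates that flow end to end. The module names, feature dimensions, and number of latent tokens are illustrative assumptions for clarity, not the actual Idefics2 implementation.

```python
# Toy sketch of the fusion flow quoted above: vision features -> learned Perceiver
# pooling -> MLP modality projection -> concatenation with text embeddings.
# Dimensions and hyperparameters are illustrative, not Idefics2's real values.
import torch
import torch.nn as nn


class PerceiverPooler(nn.Module):
    """Compress a variable-length sequence of image patch features into a
    fixed number of learned latent tokens via cross-attention."""

    def __init__(self, dim: int, num_latents: int = 64, num_heads: int = 8):
        super().__init__()
        self.latents = nn.Parameter(torch.randn(num_latents, dim))
        self.cross_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, image_features: torch.Tensor) -> torch.Tensor:
        # image_features: (batch, num_patches, dim)
        batch = image_features.size(0)
        latents = self.latents.unsqueeze(0).expand(batch, -1, -1)
        pooled, _ = self.cross_attn(latents, image_features, image_features)
        return pooled  # (batch, num_latents, dim)


class ToyMultimodalFusion(nn.Module):
    """Vision encoder output -> Perceiver pooling -> MLP projection ->
    concatenation with text embeddings."""

    def __init__(self, vision_dim: int = 768, text_dim: int = 4096):
        super().__init__()
        self.pooler = PerceiverPooler(vision_dim)
        # MLP modality projection into the language model's embedding space.
        self.projection = nn.Sequential(
            nn.Linear(vision_dim, text_dim),
            nn.GELU(),
            nn.Linear(text_dim, text_dim),
        )

    def forward(self, image_features: torch.Tensor, text_embeddings: torch.Tensor) -> torch.Tensor:
        pooled = self.pooler(image_features)   # (batch, 64, vision_dim)
        projected = self.projection(pooled)    # (batch, 64, text_dim)
        # Projected image tokens are concatenated with the text embeddings to
        # form one interleaved sequence for the language model.
        return torch.cat([projected, text_embeddings], dim=1)


# Dummy tensors: one image encoded as 1024 patch features, 16 text tokens.
fusion = ToyMultimodalFusion()
image_features = torch.randn(1, 1024, 768)
text_embeddings = torch.randn(1, 16, 4096)
print(fusion(image_features, text_embeddings).shape)  # torch.Size([1, 80, 4096])
```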
Idefics2 is built on two open models, the Mistral-7B-v0.1 language model and the siglip-so400m-patch14-384 vision encoder, and was trained on a mixture of publicly available data, including web documents, image-caption pairs, OCR data, and image-to-code resources.
The release of Idefics2 comes amid a surge of multimodal models in the AI landscape, including Reka’s Core model, xAI’s Grok-1.5V, and Google’s Imagen 2.