Meta Launches Its First Open-Source AI Model That Can Process Images

Just two months after its last major AI model release, Meta has unveiled a significant update: Llama 3.2, its first open-source model that processes both images and text. This new model empowers developers to create more sophisticated AI applications, such as augmented reality experiences that offer real-time video analysis, content-based visual search engines, and document summarization tools.

Meta says integrating the model's multimodal capabilities will require minimal effort from developers. "Developers can easily showcase Llama's ability to process images and communicate," said Ahmad Al-Dahle, Vice President of Generative AI at Meta.

Other AI companies, including OpenAI and Google, launched multimodal models last year, so Meta is catching up; the capability is especially important as the company builds AI features into hardware such as its Ray-Ban Meta smart glasses.

Llama 3.2 includes two vision models, with 11 billion and 90 billion parameters, as well as two lightweight text-only models with 1 billion and 3 billion parameters. The smaller models are optimized for Qualcomm, MediaTek, and other Arm hardware, with an eye toward practical use on mobile devices.
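
As a rough illustration of what that "minimal effort" integration could look like, the sketch below asks the smaller vision model to summarize a scanned document. It assumes the weights are distributed through the Hugging Face transformers library (version 4.45 or later) under the id meta-llama/Llama-3.2-11B-Vision-Instruct; that model id, the MllamaForConditionalGeneration class, and the invoice.png input are illustrative assumptions rather than details confirmed in Meta's announcement.

```python
# Hypothetical sketch: prompting a Llama 3.2 vision model with an image.
# Assumes the weights are available via Hugging Face transformers (>= 4.45)
# and that access to the gated model repository has been granted.
import torch
from PIL import Image
from transformers import AutoProcessor, MllamaForConditionalGeneration

model_id = "meta-llama/Llama-3.2-11B-Vision-Instruct"  # assumed repo id

model = MllamaForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)
processor = AutoProcessor.from_pretrained(model_id)

image = Image.open("invoice.png")  # placeholder: any local document image

# Chat-style prompt that pairs the image with a text instruction.
messages = [
    {"role": "user", "content": [
        {"type": "image"},
        {"type": "text", "text": "Summarize this document in two sentences."},
    ]}
]
prompt = processor.apply_chat_template(messages, add_generation_prompt=True)
inputs = processor(image, prompt, return_tensors="pt").to(model.device)

output = model.generate(**inputs, max_new_tokens=128)
print(processor.decode(output[0], skip_special_tokens=True))
```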

The earlier Llama 3.1, released in July, remains relevant: it includes a much larger 405-billion-parameter model that is expected to stay stronger at pure text generation.
