Meta Launches Its First Open AI Model Capable of Image Processing

Just two months after launching its latest AI model, Meta has unveiled a significant update: its first open-source model that can process both images and text. The new Llama 3.2 model empowers developers to build advanced AI applications, such as augmented reality apps that provide real-time video analysis, visual search engines that categorize images by content, and document analysis tools that summarize lengthy texts.

Meta emphasizes that integrating Llama 3.2 will be straightforward for developers. As Ahmad Al-Dahle, Meta's vice president of generative AI, noted, developers simply need to incorporate this “new multimodality” to allow Llama to interact with images.
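To give a sense of what that integration can look like in practice, here is a minimal sketch of sending an image plus a text prompt to the 11-billion-parameter vision model. It assumes the Hugging Face transformers release of Llama 3.2; the model ID, class names, and chat format below come from that community integration, not from Meta's announcement, and may differ depending on your setup.

```python
# Minimal sketch: image + text prompt to Llama 3.2 11B Vision,
# assuming the Hugging Face transformers integration (model ID and
# class names are from that release, not from this article).
import torch
from PIL import Image
from transformers import AutoProcessor, MllamaForConditionalGeneration

model_id = "meta-llama/Llama-3.2-11B-Vision-Instruct"

model = MllamaForConditionalGeneration.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
processor = AutoProcessor.from_pretrained(model_id)

# Any local image works here; the path is a placeholder.
image = Image.open("photo.jpg")

# The chat template interleaves an image slot with the text question.
messages = [
    {"role": "user", "content": [
        {"type": "image"},
        {"type": "text", "text": "Describe what is in this photo."},
    ]}
]
prompt = processor.apply_chat_template(messages, add_generation_prompt=True)
inputs = processor(image, prompt, return_tensors="pt").to(model.device)

output = model.generate(**inputs, max_new_tokens=100)
print(processor.decode(output[0], skip_special_tokens=True))
```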

With competitors like OpenAI and Google already releasing multimodal models, Meta is catching up in this arena. The addition of vision support is crucial as Meta expands its AI capabilities, particularly on devices like its Ray-Ban Meta glasses.

Llama 3.2 features two vision models (11 billion and 90 billion parameters) alongside two lightweight text-only models (1 billion and 3 billion parameters). The smaller models are optimized to run on Qualcomm and MediaTek chips and other Arm-based hardware, reflecting Meta's push to bring its models to mobile devices.

There is still a role for the previous Llama 3.1 model, released in July, which includes a 405-billion-parameter version that likely remains the stronger choice for pure text generation.
