Meta’s Llama AI Models Expand to Image Support: A New Era of Multimodal Intelligence

Benjamin Franklin famously stated that nothing is certain except death and taxes. I propose we update that saying for our era of rapid technological change: nothing is certain except death, taxes, and new AI models arriving at an astonishing rate.

This week, Google launched enhanced Gemini models, and earlier this month, OpenAI revealed its o1 model. On Wednesday, it was Meta's turn to showcase its latest advancements at the annual Meta Connect 2024 developer conference held in Menlo Park.

Llama Goes Multimodal

Meta's Llama model family has officially advanced to version 3.2, and several variants are now multimodal. Specifically, Llama 3.2 11B, a compact model, and Llama 3.2 90B, a larger and more capable one, can interpret charts and graphs, caption images, and identify objects in pictures from simple descriptions.

For instance, given a park map, the Llama 3.2 models can answer questions such as "When does the terrain become steeper?" and "How long is this path?" Given a chart of a company's revenue over the year, they can quickly pick out the best-performing months.
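For developers who want to try this kind of chart question-answering, here is a minimal sketch of querying the 11B vision model through Hugging Face's transformers library. The checkpoint name, the chart URL, and the preprocessing calls are assumptions based on how comparable vision-language models are typically served, not an official Meta recipe, and details may differ by library version.

```python
import requests
import torch
from PIL import Image
from transformers import AutoProcessor, MllamaForConditionalGeneration

# Assumed Hugging Face checkpoint name for the 11B multimodal model.
model_id = "meta-llama/Llama-3.2-11B-Vision-Instruct"

model = MllamaForConditionalGeneration.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # half-precision to fit on a single large GPU
    device_map="auto",
)
processor = AutoProcessor.from_pretrained(model_id)

# Placeholder URL; substitute any chart image you want analyzed.
url = "https://example.com/annual_revenue_chart.png"
image = Image.open(requests.get(url, stream=True).raw)

# Chat-style prompt pairing the image with a question, as in the revenue-chart example above.
messages = [
    {"role": "user", "content": [
        {"type": "image"},
        {"type": "text", "text": "Which months had the highest revenue?"},
    ]}
]
prompt = processor.apply_chat_template(messages, add_generation_prompt=True)
inputs = processor(image, prompt, add_special_tokens=False, return_tensors="pt").to(model.device)

output = model.generate(**inputs, max_new_tokens=120)
print(processor.decode(output[0], skip_special_tokens=True))
```

As with earlier Llama releases, pulling the weights from Hugging Face typically requires accepting Meta's license on the model page first.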

For developers focused solely on text applications, Meta states that Llama 3.2 11B and 90B serve as "drop-in" replacements for version 3.1. These models can operate with or without the new safety tool, Llama Guard Vision, which detects potentially harmful (biased or toxic) text and images processed through the models.
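Conceptually, the guard model is just another vision-language classifier that you run over a user's request before (or after) the main model answers. Below is a rough sketch of that pattern; the checkpoint name meta-llama/Llama-Guard-3-11B-Vision, the assumption that its chat template injects Meta's safety policy automatically, and the moderate() helper are all illustrative, not confirmed details from Meta.

```python
import torch
from PIL import Image
from transformers import AutoProcessor, MllamaForConditionalGeneration

# Assumed checkpoint name for the vision-capable safety classifier.
guard_id = "meta-llama/Llama-Guard-3-11B-Vision"

guard_processor = AutoProcessor.from_pretrained(guard_id)
guard_model = MllamaForConditionalGeneration.from_pretrained(
    guard_id, torch_dtype=torch.bfloat16, device_map="auto"
)

def moderate(image: Image.Image, user_text: str) -> str:
    """Return the guard model's verdict, expected to start with 'safe' or 'unsafe'."""
    conversation = [
        {"role": "user", "content": [
            {"type": "image"},
            {"type": "text", "text": user_text},
        ]}
    ]
    prompt = guard_processor.apply_chat_template(conversation, add_generation_prompt=True)
    inputs = guard_processor(image, prompt, return_tensors="pt").to(guard_model.device)
    out = guard_model.generate(**inputs, max_new_tokens=30)
    # Decode only the newly generated tokens (the verdict).
    return guard_processor.decode(
        out[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True
    )

# Only hand the request to Llama 3.2 11B/90B if the guard says it is safe:
# if moderate(user_image, "Describe this picture.").strip().startswith("safe"): ...
```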

Developers worldwide can download and use the multimodal Llama models from a range of platforms, including Hugging Face, Microsoft Azure, Google Cloud, and AWS. Meta is also hosting them on its official site, Llama.com, and using them to power its AI assistant, Meta AI, across WhatsApp, Instagram, and Facebook.

Image Access Restrictions in Europe

However, Llama 3.2 11B and 90B are not available in Europe, and as a result several AI features, such as image analysis, are disabled for European users. Meta cited the "unpredictable" nature of the EU's regulatory environment as the primary obstacle.

Notably, Meta has expressed apprehensions regarding the AI Act, a European law that outlines legal and regulatory requirements for AI technologies. This law mandates that companies developing AI in the EU assess whether their models are likely to be utilized in "high-risk" contexts, such as law enforcement. Meta is concerned that the open nature of its models—providing little insight into their usage—could create compliance challenges.

Additionally, the General Data Protection Regulation (GDPR) poses a hurdle for Meta. The company currently trains its models on publicly available data from Instagram and Facebook users who haven’t opted out, although this data falls under GDPR protections in Europe. Earlier this year, EU regulators requested that Meta halt its training on European user data until compliance assessments were completed.

Meta complied, but it also signed an open letter calling for a "modern interpretation" of GDPR that accommodates technological progress. The company recently said it would resume training on U.K. user data after revising its opt-out process to incorporate regulatory feedback, but it has yet to share an update on training with data from users elsewhere in Europe.

New Lightweight Models

Meta did introduce other new Llama models on Wednesday that were not trained on European user data, and these are available in Europe as well as the rest of the world.

Llama 3.2 1B and 3B, two lightweight, text-only models, are designed to run on smartphones and other edge devices. They can handle tasks such as summarizing and rewriting text, including email content. Optimized for Arm hardware from Qualcomm and MediaTek, the models can also interact with applications such as calendar tools, allowing them to take actions on a user's behalf when properly configured.
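To make that concrete, here is a small sketch of email summarization with the 1B instruct variant through the transformers pipeline API. The checkpoint name and the email text are placeholders, and on an actual phone these models would more likely run through a mobile runtime tuned for Qualcomm or MediaTek chips than through PyTorch.

```python
import torch
from transformers import pipeline

# Assumed checkpoint name for the lightweight 1B instruction-tuned model.
generator = pipeline(
    "text-generation",
    model="meta-llama/Llama-3.2-1B-Instruct",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

email = (
    "Hi team, the quarterly review has moved from Tuesday to Thursday at 3 pm. "
    "Please update your slides by Wednesday noon and add any blockers to the shared doc."
)

messages = [
    {"role": "system", "content": "You summarize emails in one short sentence."},
    {"role": "user", "content": f"Summarize this email:\n\n{email}"},
]

result = generator(messages, max_new_tokens=60)
# With chat-style input, the pipeline returns the full message list;
# the last entry is the assistant's reply.
print(result[0]["generated_text"][-1]["content"])
```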

A follow-up to the flagship Llama 3.1 405B model, released in August, has yet to arrive, likely because of the enormous time and resources that model took to train. We've reached out to Meta to ask whether other factors are at play.

To support developers, Meta has released the Llama Stack, a suite of development tools for customizing all of the Llama 3.2 models: 1B, 3B, 11B, and 90B. However they are customized, the models can process up to roughly 100,000 words at once.

A Strategic Play for Dominance

Meta CEO Mark Zuckerberg has emphasized his commitment to ensuring that everyone benefits from the opportunities AI presents. That ambition, though, also serves a vested interest in establishing Meta as a leading provider in the AI landscape.

Giving capable models away forces competitors such as OpenAI and Anthropic to lower their prices, spreads Meta's flavor of AI widely, and lets the company fold improvements from the open-source community back into its own offerings. Meta says its Llama models have surpassed 350 million downloads and are in use at major enterprises such as Zoom, AT&T, and Goldman Sachs.

For many developers and organizations, the less-than-open licensing of the Llama models is only a minor concern. Still, Meta's license restricts how certain developers can use them: platforms with more than 700 million monthly users must request a special license, which Meta grants at its sole discretion.

Admittedly, few platforms of that size lack capable models of their own, but Meta is not transparent about its approval process. Asked this month whether it had granted any such Llama licenses, a Meta spokesperson said the company had no further information to share.

In summary, Meta is strategically positioning itself in the competitive AI landscape. The company is investing millions in lobbying to shape regulatory perspectives, while also committing substantial funds to develop servers, data centers, and network infrastructure to advance future AI initiatives.

While the new Llama 3.2 models don’t address significant concerns surrounding current AI technologies—such as their propensity to generate misleading information or reflect problematic training data—they do propel Meta closer to its objective of becoming synonymous with AI, particularly in the realm of generative applications.
