Qualcomm Aims to Enhance Your Android Phone with Innovative AI Tools

At Mobile World Congress 2024, Qualcomm is expanding its portfolio of AI capabilities powered by the Snapdragon series for Android devices. The company has already unveiled remarkable AI features for the Snapdragon 8 Gen 3 flagship, including voice-activated media editing, on-device image generation with Stable Diffusion, and an advanced virtual assistant leveraging large language models from Meta.

Today, Qualcomm introduced enhancements to these AI functionalities. A key feature is a version of the Large Language and Vision Assistant (LLaVA) running on a smartphone. The tool functions like a chatbot, akin to ChatGPT, but adds image understanding similar to Google Lens. As a result, Qualcomm's solution can process both text inputs and images.

For instance, you can upload a photo of a charcuterie board and ask about its contents. The AI assistant, built on a large multimodal model (LMM) with over 7 billion parameters, will identify the various fruits, cheeses, meats, and nuts in the image. It can also handle follow-up questions, enabling a natural conversational flow. ChatGPT and similar products have also added multimodal capabilities, but they rely on cloud-based architecture, in which remote servers process the data. Qualcomm's approach runs the model on the device itself, which avoids the round trip to a server and keeps user data on the phone, reducing the risk of data intrusion. Qualcomm emphasizes, “This LMM runs at a responsive token rate on device, leading to increased privacy, reliability, personalization, and cost efficiency.” Qualcomm has not confirmed whether its LLaVA-based virtual assistant will launch as a standalone app or whether it will carry a fee.

The next significant announcement from Qualcomm concerns image generation and manipulation. Recently, Qualcomm demonstrated what it described as the world’s fastest text-to-image generation on a smartphone, using Stable Diffusion. Today, the company provided a preview of LoRA-driven image generation.

LoRA, or Low-Rank Adaptation, is a fine-tuning technique introduced by Microsoft researchers. It is not a separate way of generating images like DALL·E, but a cheaper way to adapt an existing generative model to a new style or subject. Fully retraining an AI model can be costly, slow, and hardware-intensive. LoRA addresses these challenges by freezing the original model weights and training only small low-rank matrices added to selected layers, which sharply limits the number of parameters updated during training. This results in lower memory requirements and faster training, and the resulting adapters are small enough to store and swap easily, streamlining the adaptation of text-to-image models.
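To make the low-rank idea concrete, here is a minimal sketch in plain Python. It is an illustration of the general technique, not Qualcomm's implementation: the helper names (`matmul`, `lora_param_counts`, `lora_forward`), the layer sizes, the rank, and the `alpha` scaling value are all assumptions chosen for the example. A frozen weight matrix W is left untouched, and two small matrices, A and B, supply a trainable update so that the layer computes W x + (alpha / r) * B (A x).

```python
import random


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def lora_param_counts(d_out, d_in, rank):
    """Return (full fine-tuning params, LoRA trainable params) for one weight matrix."""
    full = d_out * d_in            # every entry of W would be trained
    lora = rank * (d_out + d_in)   # only B (d_out x rank) and A (rank x d_in) are trained
    return full, lora


def lora_forward(x, W, A, B, alpha):
    """Compute y = W x + (alpha / rank) * B (A x) for a single input vector x."""
    rank = len(A)
    col = [[v] for v in x]                   # treat x as a column vector
    base = matmul(W, col)                    # output of the frozen layer
    update = matmul(B, matmul(A, col))       # low-rank correction
    scale = alpha / rank
    return [b[0] + scale * u[0] for b, u in zip(base, update)]


random.seed(0)
d_out, d_in, rank = 4, 6, 2  # hypothetical toy sizes

# Frozen pretrained weights (random stand-ins here).
W = [[random.gauss(0, 1) for _ in range(d_in)] for _ in range(d_out)]
# Trainable adapter: A starts small and random, B starts at zero,
# so the adapted layer initially behaves exactly like the original.
A = [[random.gauss(0, 0.01) for _ in range(d_in)] for _ in range(rank)]
B = [[0.0] * rank for _ in range(d_out)]

x = [1.0] * d_in
y_base = [row[0] for row in matmul(W, [[v] for v in x])]
y_lora = lora_forward(x, W, A, B, alpha=4)

# Parameter savings for a hypothetical 768 x 768 layer at rank 4.
full, lora = lora_param_counts(768, 768, 4)
print(f"full fine-tuning: {full} params, LoRA: {lora} params")
```

For the hypothetical 768 x 768 layer, the adapter trains 6,144 values instead of 589,824, roughly a 96-fold reduction for that layer. Savings of this kind are what make the approach attractive on memory-constrained hardware such as phones.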

LoRA has been applied effectively to the Stable Diffusion model for generating images from text prompts. Thanks to its efficiency and the small size of its adapters, LoRA is considered well-suited for smartphones. Qualcomm is betting on its potential, and competitor MediaTek is adopting the same approach for generative AI features on its flagship Dimensity 9300 chip.

At MWC 2024, Qualcomm is also showcasing a variety of other AI features, some of which are already available on the Samsung Galaxy S24 Ultra. These include expanding an image’s canvas with generative fill and, more ambitiously, AI-powered video generation. It remains to be seen how well Qualcomm can implement these technologies on smartphones.
