The Year of Major Model Applications: Following the emergence of ChatGPT, 2023 saw an unprecedented surge of interest in large AI models. Industry insiders widely predict that 2024 will bring a wave of real-world applications for these models, marking a turning point for AI technology in industrial use.
Interviews with several Chinese large model companies highlight key developments in the sector. In mid-May, OpenAI unveiled its flagship model, GPT-4o. Shortly after, at the World Artificial Intelligence Conference, SenseTime launched its "Riri Xin 5o," directly competing with GPT-4o. This model is noteworthy as China's first "streaming interactive" multimodal model, capable of processing diverse data types, including text, images, audio, and video. Some experts believe that multimodality has now become a standard feature among large models.
Lin Dahua, co-founder of SenseTime, said in an interview that he expects inference costs to fall significantly in the second half of the year, potentially by an order of magnitude or more, and that this will enable new real-time interaction experiences. The model was put to the test in a live demonstration at the recent AI conference: "Riri Xin 5o" quickly read the text on a staff member's badge, used it to confirm the event's context, and encouraged attendees to "learn well." When the staff member casually opened a book, the model not only recognized the text but also gave an easy-to-understand summary, showcasing its real-time interactive capabilities.
The base model "Riri Xin 5.5" offers numerous upgrades over its predecessor, "Riri Xin 5.0," which was the first Chinese large model to benchmark itself against GPT-4 Turbo. The new version delivers an average performance improvement of 30%, with interaction capabilities comparable to those of GPT-4o.
Despite these advances, Lin Dahua acknowledged that reasoning remains a challenge even for leading models such as ChatGPT. He noted that while current models are impressive in their breadth of knowledge, reasoning is still a significant hurdle, because inaccuracies and hallucinations can lead to errors during complex reasoning processes. Improving large models' abilities therefore requires developing scalable cognitive chaining techniques that strengthen their foundational reasoning skills.
Industry thought leaders emphasize the necessity of delivering "real value" for large model applications. While interest in AI is high, it has not yet reached a "super moment" in vertical industry applications, according to Xu Li, CEO of SenseTime. Achieving seamless interaction is a critical breakthrough point. Lin Dahua stressed the importance of providing genuine value to users, such as emotional support or tools that boost efficiency.
Looking ahead, large models are poised to evolve into comprehensive assistants that cover a range of user needs, such as document management, scheduling, and knowledge retrieval. This holistic approach will enhance the user experience and provide significant value. As information becomes better integrated, these assistant tools could converge into a single application or model, ultimately delivering better services and problem-solving capabilities.
Additionally, the concept of "general-special integration" is gaining traction, combining general AI models with specialized capabilities to create systems that balance versatility and expertise. Lin Dahua supports this direction, highlighting the need for differentiated capabilities in specialized fields to enhance user value and competitiveness.
In summary, as large models continue to develop, their potential for transformative applications across industries appears boundless. The future of AI lies in creating cohesive systems that blend various functionalities into a single, efficient solution for users and organizations alike.