In the era of artificial intelligence (AI), the emphasis is shifting from tracking daily active users (DAU) to building "super-capable" applications, according to industry leaders at the World Artificial Intelligence Conference held on July 4, 2024. They agreed unanimously on the need to identify relevant application scenarios and to avoid the pitfalls of a traffic-driven mindset.
Li Yanhong, CEO of Baidu, said that applications built on foundational models are steadily penetrating a range of sectors. Daily calls to the Wenxin large model recently rose from 200 million to 500 million in just two months. He said this sharp rise reflects genuine demand, with companies seeing real benefits from the capabilities of large models. In the logistics sector, for instance, integrating large model capabilities has streamlined order processing to the point where a simple visual and a spoken instruction suffice, cutting processing time from over three minutes to just 19 seconds. Over 90% of post-sale issues can now be handled by large models, a significant gain in efficiency.
As for the future of AI applications, Li is particularly optimistic about intelligent agents. He believes that as foundational models become more robust, developing applications will become increasingly easy. The simplest route is to create an intelligent agent: the process only requires describing a workflow in everyday language and supplementing it with a specialized knowledge base. That makes building an intelligent agent much simpler than creating a web page was in the internet era.
Jing Xiandong, CEO of Ant Group, agrees, asserting that specialized intelligent agents can address the critical challenges general large models face in more rigorous industrial applications. The industry acknowledges three key limitations of general large models in strict industrial contexts: insufficient domain knowledge, difficulty handling complex decision-making, and the fact that conversational interaction does not equate to effective collaboration. Jing emphasizes that future intelligent user experiences will not rely on a single large model; instead, they will require deep collaboration across entire industries, with numerous specialized agents each playing their respective roles.
However, the commercialization of large models faces considerable hurdles. Li warns against falling into the "traffic trap" that many Chinese companies have encountered, in which headline metrics such as a billion DAUs are mistaken for success. That mentality, he says, belongs to the mobile era. The commercial model for large-scale AI deployment also poses difficulties for enterprises. Fu Sheng, CEO of Cheetah Mobile, noted that large enterprises often struggle with obstacles such as feeding in internal data and debugging interfaces. Wang Jian, an academician at the Chinese Academy of Engineering and founder of Alibaba Cloud, echoed these sentiments, asserting that a significant and often overlooked factor in AI adoption is people. He points out that systemic issues make it hard for large organizations to fully embrace AI. While large companies view AI as a revolutionary tool, smaller companies see it as a transformative force, and if large firms can adopt the latter mindset, AI could achieve groundbreaking impact.
Despite the excitement generated by applications like ChatGPT and by platforms offering video capabilities, industry leaders such as Xu Li, Chairman and CEO of SenseTime, argue that we have not yet reached the "super moment" of AI. AI urgently needs to penetrate vertical applications before it can drive widespread change. He believes that for AI to have its "iPhone moment," it must get three core elements right: intelligence, user experience, and controllability. Xu explains that for large models to develop deep cognitive capabilities, they need access to advanced data tied to practical applications, which in turn fosters the generation of high-quality data. Smooth real-time interaction, meanwhile, depends on optimizing computing resources across both frontend and cloud systems to create a natural interaction model. Finally, controllability remains crucial: without boundaries on generated content, whether text, images, or videos, large models cannot achieve sustainable and manageable AI development.