The competition among large AI models is intensifying, generating numerous debates within the industry. On May 16, reports surfaced about a Huawei Ascend event at which the code "Time.sleep(6)" appeared on screen during a demonstration of the model's image generation capabilities. This prompted speculation among industry observers that the image results had not been produced live by the model but had been prepared in advance, leading to claims that the demonstration had flopped.
In response, the Ascend community clarified that at the Kunpeng Ascend Developer Conference on May 10, they had demonstrated real-time image generation with the mxRAG SDK, using an open-source large model. The "Time.sleep(6)" call simply waited for the external model to finish generating the image; it did not fetch a pre-existing one. They added: "All displayed code is authentic and will be made available to developers on the Ascend community platform for feedback."
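As an illustration only: the pattern the Ascend community describes, blocking for a fixed interval while an external model finishes generating, is a common shortcut in live demos. The sketch below is hypothetical; `submit_generation` and `fetch_result` are illustrative stand-ins, not real mxRAG SDK calls.

```python
import time

def submit_generation(prompt):
    # Stand-in for an asynchronous request to an external image model;
    # returns a job handle immediately rather than the finished image.
    return {"prompt": prompt, "submitted_at": time.time()}

def fetch_result(job):
    # Stand-in for retrieving the finished image once generation completes.
    return f"image for: {job['prompt']}"

def demo_generate(prompt, wait_seconds=6):
    job = submit_generation(prompt)
    # Instead of polling the service, the demo blocks for the model's
    # typical generation latency; this is where a line like
    # time.sleep(6) can come from without any pre-baked image.
    time.sleep(wait_seconds)
    return fetch_result(job)
```

A more robust client would poll the job or await a callback instead of sleeping for a fixed interval, but a hard-coded wait is simpler to show on stage.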
On May 10, the "Ascend AI Developer Summit," themed "Together We Rise, Pursuing Future Dreams," took place, featuring a keynote by Zhang Dihuan, President of Huawei's Ascend Computing Business. Ascend is Huawei's line of AI computing chips and the ecosystem built around them.
The Ascend ecosystem encompasses a complete AI computing infrastructure, based on the Ascend series processors and supporting software. This includes various processors, hardware, the Compute Architecture for Neural Networks (CANN), AI computation frameworks, application enablement tools, development toolchains, and industry-specific applications—forming an extensive supply chain.
In the midst of fierce competition, Chinese firms are continuously rolling out new large models. On May 15, ByteDance introduced "Doubao," a self-developed large language model capable of processing 120 billion text tokens daily and generating 30 million images.
Additionally, on May 9, Alibaba Cloud launched Tongyi Qianwen 2.5, which improved comprehension, logical reasoning, instruction following, and coding capability by 9%, 16%, 19%, and 10%, respectively. In Chinese-language evaluations, the model outperformed GPT-4 across several dimensions, including text comprehension, text generation, knowledge assessment, and casual conversation.
Globally, on May 15, Google announced a series of product updates at its 2024 I/O Developer Conference. Notable releases included the lightweight Gemini 1.5 Flash, the AI agent Project Astra, enhanced AI search features, a new video generation model called Veo, and the sixth-generation TPU, Trillium; Google also expanded Gemini 1.5 Pro's context window from one million to two million tokens.
On May 14, OpenAI debuted GPT-4o, a faster and more affordable multimodal large model than GPT-4 Turbo, featuring improved user interaction. OpenAI claims GPT-4o can recognize user emotions and engage in conversations at a pace that closely mirrors human dialogue, with response times averaging 320 milliseconds.
According to a research report from Huatai Securities, the commercial landscape for large AI models could trend toward a winner-takes-all or oligopoly outcome. The development path is clear: from foundational large models to industry-specific applications. The substantial costs and technical barriers along that path suggest the leading players will likely remain established tech giants.