Zhang Bo Addresses Key Issues in the AI Industry: The Lack of Theoretical Foundations Behind Models and Algorithms

On August 1, Zhang Bo, an academician of the Chinese Academy of Sciences and Honorary Dean of the Tsinghua University Artificial Intelligence Research Institute, delivered a keynote speech at the 12th Internet Security Conference (ISC.AI 2024). He emphasized that current artificial intelligence (AI) lacks a foundational theory, operating instead on models and algorithms designed for specific fields. This limitation has resulted in a fragmented market and an underdeveloped AI industry.

At 89 years old, Zhang Bo has spent decades mentoring AI talent at Tsinghua University and is considered one of the pioneers of AI in China. Many prominent "Tsinghua-based" AI companies, such as Shenshu Technology, Zhipu AI, Mianbi Intelligence, and Kimi, have benefited from the technical groundwork laid during his tenure, with key personnel either directly or indirectly trained by him.

During his speech, Zhang identified the current shortcomings of AI technology and proposed directions for improvement. Turning to foundation models, he said it is essential to consider three key capabilities and one significant limitation. Because of theoretical constraints, he added, AI in its initial phase is expected to be combined with specific application fields, resulting in what is known as "narrow" AI.

He stated that existing foundation models have achieved generality in language processing and named three main strengths: powerful language generation, effective human-computer interaction, and a notable capacity for drawing inferences. Zhang remarked on the impressive versatility of large language models (LLMs) in generating diverse, comprehensible outputs. He pointed out that seamless natural dialogue between humans and machines had long seemed a distant goal, yet it was unexpectedly realized by 2020.

However, Zhang also acknowledged a critical flaw in these models, referred to as "hallucination." It occurs because the same diversity that makes the output useful inevitably produces errors, and those errors are hard to control. Unlike predictable machine errors, these mistakes stem from the nature of the model itself, which presents challenges for future applications.

In discussing suitable application scenarios for large models, he noted that tolerance for errors is crucial. The application curve for large models resembles a "U" shape: early stages focus on diverse planning and design, while later stages require a similar variety but with a higher tolerance for mistakes. Despite the existing problems, Zhang argued that large models are indispensable, asserting that building on them dramatically improves efficiency and quality.

Zhang analyzed the root cause of hallucinations, explaining that current models primarily operate under human direction rather than autonomously. Their output is heavily influenced by user prompts, which sets them apart from human cognitive processes.

Looking ahead, Zhang outlined four key development directions for future large models: alignment with human values, multimodal generation, the integration of AI agents with their environments, and embodied intelligence through robotics. He stressed the importance of aligning AI models with human judgment and of fostering multimodal capabilities that generate not only text but also images, sounds, videos, and code.

Regarding AI agents, he emphasized their need for interaction with virtual environments, which can provide feedback to correct mistakes. Finally, he discussed embodied intelligence, suggesting a vision for versatile robotics that extends beyond human-like forms.

Zhang concluded that to advance the third generation of AI, a solid theoretical framework must be established. The absence of such a theory leads to confusion and misunderstanding, especially as AI scales. Until we develop safe, controllable, trustworthy, and scalable AI technologies, the safety concerns surrounding artificial intelligence will remain an ongoing issue.
