The recent surge in AI agents began with the launch of the AutoGPT framework in March 2023. The project uses large language models (LLMs) to build agents that automatically decompose tasks and invoke tools. By combining perception and action, it applies the core capabilities of large models to real-world problems and demonstrates end-to-end problem solving. This sparked global enthusiasm for AI agents, with tech giants, startups, and investors investing heavily in related frameworks, platforms, and applications as they compete for technological leadership.
OpenAI's release of the Assistants API and GPTs further heightened interest in AI agents, making them more capable and broadening their applications to fields like gaming and programming. While the concept of AI agents is gaining traction, definitions and applications are still evolving. The industry consensus is that an effective agent should demonstrate environmental awareness, autonomous decision-making, and the ability to tackle complex tasks. Today's AI agents rely primarily on large language models combined with memory, planning, and tool-use capabilities. They show great potential but often fall short of practical objectives, and many existing products provide only partial functionality or reflect early-stage concepts of agency.
In China, the AI agent landscape has advanced notably, with numerous companies and research institutions actively developing and launching products. An initial industry structure is starting to emerge, along with related supply chains and ecosystems. However, AI agents still face several challenges: they need to become more intelligent, safer, more controllable, and easier to deploy in practical applications.
The rapid growth and adoption of large language models have intensified interest in designing and implementing agent frameworks. These frameworks aim to give developers efficient and flexible ways to build feature-rich agents. While LLM capabilities continue to grow, the forms and uses of agents are still being explored. Many existing frameworks follow the structure "Agent = LLM + Memory + Planning + Tools," but most projects remain at the proof-of-concept and demonstration stage and suffer from problems such as incomplete documentation and poor reusability.
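The "Agent = LLM + Memory + Planning + Tools" structure can be illustrated with a minimal Python sketch. It is not taken from any particular framework: the `Agent` class, the `stub_llm` function (a stand-in that returns a fixed plan instead of calling a real model), and the two toy tools are all hypothetical names introduced here for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


def stub_llm(prompt: str) -> str:
    """Hypothetical stand-in for a real LLM call; always returns a fixed plan."""
    return "calculator: 2 + 3\necho: done"


def calculator(expr: str) -> str:
    """Toy tool: only handles 'a + b' with integer operands."""
    left, right = expr.split("+")
    return str(int(left) + int(right))


def echo(text: str) -> str:
    """Toy tool: returns its argument unchanged."""
    return text


@dataclass
class Agent:
    llm: Callable[[str], str]                      # LLM: produces the plan
    tools: Dict[str, Callable[[str], str]]         # Tools: callable by name
    memory: List[str] = field(default_factory=list)  # Memory: log of past steps

    def plan(self, task: str) -> List[str]:
        # Planning: ask the LLM to break the task into "tool: argument" lines.
        reply = self.llm(f"Decompose into tool calls: {task}")
        return [line.strip() for line in reply.splitlines() if line.strip()]

    def run(self, task: str) -> List[str]:
        results = []
        for step in self.plan(task):
            name, _, arg = step.partition(":")
            tool = self.tools.get(name.strip())
            if tool is None:
                result = f"unknown tool: {name.strip()}"
            else:
                result = tool(arg.strip())
            # Record each step and its outcome so later steps could consult it.
            self.memory.append(f"{step} -> {result}")
            results.append(result)
        return results


agent = Agent(llm=stub_llm, tools={"calculator": calculator, "echo": echo})
outputs = agent.run("add two numbers and report")
print(outputs)
print(agent.memory)
```

In a real framework the stub would be replaced by a model call and the plan would typically be revised after each tool result. The sketch only shows how the four components relate to one another.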
Recent trends in agent frameworks show a shift toward multi-agent systems that overcome the limitations of single-agent models. These systems can process workflows in parallel and provide more reliable reasoning while handling multimodal data. The AutoGen project stands out for its comprehensive documentation, versatility, and ease of reuse. Enterprise Robotic Process Automation (RPA) is also beginning to adopt agent architectures, which shows how broadly agents apply to corporate automation. User interface agent frameworks are gaining traction as well: here the agent operates the interface itself, simulating human interactions, invoking applications, and completing tasks. Projects like Tencent's AppAgent and Alibaba's MobileAgent exemplify significant progress in this area.
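The idea of several agents working on one workflow in parallel can be sketched as follows. This is an illustrative assumption, not AutoGen's API or that of any other framework named above: the three agent functions are hypothetical stubs that return fixed strings, and a standard-library thread pool stands in for whatever scheduling a real multi-agent system would use.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict


# Hypothetical specialist agents; each would wrap its own LLM call in practice.
def research_agent(task: str) -> str:
    return f"research notes for {task}"


def coding_agent(task: str) -> str:
    return f"code draft for {task}"


def review_agent(task: str) -> str:
    return f"review checklist for {task}"


def run_parallel(agents: Dict[str, Callable[[str], str]], task: str) -> Dict[str, str]:
    """Run every agent on the same task concurrently and collect the results."""
    with ThreadPoolExecutor(max_workers=len(agents)) as pool:
        futures = {name: pool.submit(fn, task) for name, fn in agents.items()}
        return {name: fut.result() for name, fut in futures.items()}


def aggregate(results: Dict[str, str]) -> str:
    """Combine per-agent outputs into one report, ordered by agent name."""
    return "\n".join(f"[{name}] {results[name]}" for name in sorted(results))


results = run_parallel(
    {"researcher": research_agent, "coder": coding_agent, "reviewer": review_agent},
    "login page",
)
report = aggregate(results)
print(report)
```

Real multi-agent frameworks usually add message passing between agents so that, for example, a reviewer can respond to a coder's output. The sketch covers only the parallel fan-out and the final aggregation.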
In summary, LLM-based agent frameworks are continuously evolving and hold vast application potential. With ongoing technological advancements and market expansion, we expect the emergence of even more exceptional agent frameworks that will enhance our daily lives and work environments.
While the construction and development of AI agent platforms are still in their infancy, progress is evident in creating agents within defined workflows or standardized operating procedures. However, effective solutions for scenarios that require autonomous decision-making and workflow orchestration remain scarce. Current platforms often lack robust API ecosystems, workflow framework reuse, and comprehensive integration support.
Agent platforms generally fall into two categories: chatbots grounded in knowledge bases and databases, of which OpenAI's GPTs are an example, and development platforms aimed at solving complex problems and supporting multi-workflow orchestration. Depending on user needs, process complexity, and coding skills, platforms can be further divided into three groups: no-code or low-code platforms for the general public, such as ByteDance's Kouzi platform (the Chinese version of Coze), which makes it easy to build basic agents; developer-focused platforms that offer model hosting and API support, such as the overseas version of Coze and Baidu's Lingjing Matrix; and enterprise-level development platforms that enhance intelligent workflows, such as Shizai Intelligent's TARS-RPA-Agent and Yita Technology's CubeAgent, which streamline corporate operations.
Despite the nascent state of agent platforms, we can anticipate the development of enhanced platforms as technology progresses and market demand rises, bringing innovation and convenience to our daily lives.
AI agent applications are expanding rapidly. "GPT-like" applications, with their low entry barriers and diverse use cases, represent an early form of agent. OpenAI's GPTs encompass around 3 million applications across domains including content creation, academic support, and game development. Popular applications span design, academic writing, multimedia generation, and more, a pattern also reflected in trends on platforms like Coze. The company E2B has collected AI agent projects on GitHub into a representative, though non-exhaustive, list. Programming agents make up 45% of the listed projects, with general-purpose agents and daily productivity agents each accounting for 18%. Use cases have also emerged in data analysis, business intelligence, marketing, and research.
Many enterprise agent solutions come from RPA vendors and startups, some of which use LLM capabilities to cover the entire process from proof of concept to finished product. Established case studies exist in areas such as HR recruitment and process automation with RPA agents. For example, Shizai Intelligent, a prominent player in China's AI landscape and a leader in the RPA sector, combines domestically developed AI technology with RPA products to support the digital transformation of governments and businesses. Shizai Intelligent specializes in integrating RPA agents, AGI models, and hyper-automation technology, with an emphasis on accessibility in the era of human-machine collaboration. Its agents let users automate processes with simple commands, effectively turning the agents into digital workers.
Shizai Intelligent has reportedly aided over 2,000 central enterprises, state-owned companies, Fortune 500 firms, and government agencies in their digital transformation endeavors, deploying various "digital employees."
Industry Structure and Trend Analysis
The agent industry can be divided into four layers, with the operational layer at the core. This layer encompasses agent component manufacturers and operational integration platforms. On the component side, the intelligence module relies primarily on large language models and the memory module on vector database vendors, supported by plugins, tools, security, and communication protocols. On the platform side, the operational layer includes AI model hosting and agent framework deployment platforms, alongside emerging integration platforms and dedicated cloud services.
Companies such as NexusGPT and Relevance AI are focusing on digital employee agents that can be integrated into corporate workflows. To develop further, agents must address several significant issues, including enriching workflows, accumulating proprietary data and knowledge, strengthening platform capabilities, and establishing sustainable business models. As technology advances and application scenarios broaden, the range of agent applications in gaming, coding, and standardized operational workflows will expand. We look forward to agents bringing greater convenience and innovation to our lives and work.