Researchers at the University of California, Berkeley, have developed an innovative control system for humanoid robots, enabling them to skillfully navigate diverse terrains and obstacles. This AI-driven system draws inspiration from the deep learning architectures that have transformed large language models (LLMs). Its core principle is straightforward: by analyzing a history of recent observations and actions, the model predicts what should come next.
Trained entirely in simulation, the system exhibits strong performance in unpredictable real-world conditions. By evaluating past interactions, it dynamically adjusts its behavior to effectively manage new scenarios that it hasn't encountered during its training.
A Robot for All Terrains
Humanoid robots, designed to resemble humans, have the potential to become invaluable assistants capable of performing various physical and cognitive tasks. However, creating versatile humanoid robots poses significant challenges, particularly in developing a flexible control system.
Traditional robotic control systems are typically designed for specific tasks and struggle with the unpredictability of real-world terrains and visual conditions. This rigidity confines their utility to controlled environments.
Consequently, there is an increasing focus on learning-based methods for robotic control. Such systems can adjust their behavior based on data collected from simulations or direct environmental interactions.
The control system from U.C. Berkeley promises to adeptly guide humanoid robots through a range of scenarios. Deployed on Digit, a full-sized, general-purpose humanoid robot, this system demonstrates exceptional outdoor walking capabilities, reliably navigating everyday human environments like walkways, sidewalks, tracks, and open fields. The robot skillfully traverses various surfaces—including concrete, rubber, and grass—without falling.
The researchers report, “We found that our controller was able to walk over all of the tested terrains reliably and were comfortable deploying it without a safety gantry. Over the course of a week of testing in outdoor environments, we did not observe any falls.”
Moreover, the robot has been rigorously tested for resilience against disturbances. It effectively manages unexpected steps, random objects in its path, and even projectiles, maintaining stability when pushed or pulled.
Robot Control with Transformers
While various humanoid robots showcase impressive abilities, this new system stands out for its training and deployment methodology.
The AI control model was trained purely in simulation, across thousands of randomized domains and tens of billions of scenarios, using Isaac Gym, a high-performance physics simulation environment. This simulated experience transfers directly to the real world without any fine-tuning, a process known as zero-shot sim-to-real transfer. Notably, the system exhibited emergent abilities in real-world scenarios, such as navigating steps even though they weren't explicitly covered during training.
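The article doesn't reproduce the team's exact randomization scheme, but training across thousands of varied domains typically relies on domain randomization: each simulated episode draws fresh physical parameters so the policy cannot overfit to any single world. A minimal Python sketch of the idea, with purely illustrative parameter names and ranges (not the Berkeley team's values):

```python
import random
from dataclasses import dataclass

@dataclass
class EpisodeParams:
    """Physics and terrain parameters resampled for every training episode."""
    friction: float      # ground friction coefficient
    slope_deg: float     # terrain slope, in degrees
    payload_kg: float    # extra mass attached to the torso
    push_force_n: float  # magnitude of random external pushes

def sample_domain() -> EpisodeParams:
    # Each episode becomes a slightly different "world"; the ranges below
    # are hypothetical, chosen only to illustrate the technique.
    return EpisodeParams(
        friction=random.uniform(0.4, 1.2),
        slope_deg=random.uniform(-5.0, 5.0),
        payload_kg=random.uniform(0.0, 5.0),
        push_force_n=random.uniform(0.0, 50.0),
    )
```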
Central to this system is a "causal transformer," a deep learning model that processes the history of the robot's proprioceptive observations and actions. From that history, the model learns which information, such as gait patterns and contact states, is relevant to choosing its next action.
Transformers, known for their success in large language models, excel at predicting the next element in long sequences of data. The causal transformer used in this robot learns from sequences of observations and actions, allowing it to anticipate the consequences of its behavior and adapt dynamically to varying, even unfamiliar, landscapes.
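The article doesn't detail the controller's precise architecture or dimensions, but a minimal PyTorch sketch of a causal transformer policy of this kind, with all sizes chosen purely for illustration, might look like this:

```python
import torch
import torch.nn as nn

class CausalTransformerPolicy(nn.Module):
    """Maps a history of (observation, action) pairs to the next action.

    All dimensions are illustrative assumptions, not the paper's values.
    """

    def __init__(self, obs_dim=47, act_dim=12, d_model=128,
                 n_layers=4, n_heads=4):
        super().__init__()
        # Embed each concatenated (observation, action) step into d_model.
        self.embed = nn.Linear(obs_dim + act_dim, d_model)
        layer = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=n_heads, batch_first=True
        )
        self.encoder = nn.TransformerEncoder(layer, num_layers=n_layers)
        self.head = nn.Linear(d_model, act_dim)  # predicts the next action

    def forward(self, obs_hist, act_hist):
        # obs_hist: (batch, T, obs_dim); act_hist: (batch, T, act_dim)
        x = self.embed(torch.cat([obs_hist, act_hist], dim=-1))
        T = x.size(1)
        # Causal mask: each timestep attends only to itself and the past.
        mask = torch.triu(torch.ones(T, T, dtype=torch.bool), diagonal=1)
        h = self.encoder(x, mask=mask)
        return self.head(h[:, -1])  # action for the current timestep
```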
The researchers state, “We hypothesize that the history of observations and actions implicitly encodes the information about the world that a powerful transformer model can use to adapt its behavior dynamically at test time.” This concept, termed “in-context adaptation,” parallels how language models utilize contextual information to learn new tasks and refine outputs during inference.
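In practice, such adaptation means the controller conditions on a rolling window of recent history at every control step. A hypothetical inference loop built on the sketch above, where `env` is a placeholder for the robot or simulator interface and observations are assumed to be tensors:

```python
import collections
import torch

CONTEXT_LEN = 32  # illustrative history length, not the paper's value

policy = CausalTransformerPolicy()
obs_buf = collections.deque(maxlen=CONTEXT_LEN)
act_buf = collections.deque(maxlen=CONTEXT_LEN)

obs = env.reset()      # `env` is a stand-in for the robot/simulator API
act = torch.zeros(12)  # no action taken yet; 12 matches act_dim above

for step in range(1000):
    obs_buf.append(obs)
    act_buf.append(act)
    obs_hist = torch.stack(list(obs_buf)).unsqueeze(0)  # (1, T, obs_dim)
    act_hist = torch.stack(list(act_buf)).unsqueeze(0)  # (1, T, act_dim)
    with torch.no_grad():
        act = policy(obs_hist, act_hist).squeeze(0)     # next action
    obs = env.step(act)
    # As the buffers fill with terrain-specific history, the same fixed
    # weights produce behavior adapted to the current surface.
```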
In the researchers' experiments, the transformer outperformed other sequence models such as temporal convolutional networks (TCNs) and long short-term memory networks (LSTMs). The architecture also scales with additional data and computational resources, and it can be extended to incorporate further input modalities.
In the past year, transformers have emerged as valuable tools within the robotics community, with multiple models leveraging their versatility to enhance robotic capabilities. They offer substantial benefits, including improved encoding of different modalities and translating high-level natural language instructions into specific planning steps for robots.
The researchers conclude, “Analogous to fields like vision and language, we believe that transformers may facilitate our future progress in scaling learning approaches for real-world humanoid locomotion.”