Imperial College London and Google DeepMind Unveil Embodied Agents Capable of Learning with Minimal Data

Embodied AI agents capable of interacting with the physical world possess tremendous potential for a range of applications. However, one primary obstacle remains: the scarcity of training data.

To tackle this challenge, researchers from Imperial College London and Google DeepMind have introduced the Diffusion Augmented Agents (DAAG) framework. This innovative approach harnesses the capabilities of large language models (LLMs), vision language models (VLMs), and diffusion models to boost the learning efficiency and transfer learning abilities of embodied agents.

Why is Data Efficiency Important for Embodied Agents?

Recent advancements in LLMs and VLMs have ignited optimism for their use in robotics and embodied AI. While these models can be trained on extensive text and image datasets collected from the internet, embodied AI systems must learn from physical interaction with their environments.

The real world poses unique challenges for data collection in embodied AI. Physical environments are considerably more complex and unpredictable than digital ones. Additionally, robots and other embodied AI systems rely on physical sensors and actuators, which can be slow, noisy, and prone to failure.

The researchers argue that the key to overcoming these challenges lies in making better use of an agent’s existing data and experiences. They state, “We hypothesize that embodied agents can achieve greater data efficiency by leveraging past experiences to explore effectively and transfer knowledge across tasks.”

What is DAAG?

The Diffusion Augmented Agent (DAAG) framework is designed to enable agents to learn tasks more effectively by utilizing past experiences and generating synthetic data. The researchers aim to help agents autonomously set and evaluate subgoals, even without external rewards, while repurposing their prior experiences to accelerate learning in new tasks.

DAAG operates within a Markov Decision Process (MDP). At the start of each episode, the agent receives task instructions, observes its environment, and takes actions to reach a state that aligns with these instructions. It features two memory buffers: a task-specific buffer for current experiences and an “offline lifelong buffer” for all past experiences, irrespective of their tasks or outcomes.
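To make that setup concrete, here is a minimal Python sketch of the two-buffer design. Everything in it (`Transition`, `Buffers`, the `env` and `agent` interfaces) is an illustrative assumption, not the authors' published code.

```python
from dataclasses import dataclass, field

@dataclass
class Transition:
    observation: object       # e.g. a camera frame (placeholder type)
    action: object
    next_observation: object
    task: str                 # the instruction active when this was collected
    success: bool

@dataclass
class Buffers:
    task_buffer: list = field(default_factory=list)      # current task only
    lifelong_buffer: list = field(default_factory=list)  # all past experience

    def store(self, t: Transition) -> None:
        self.task_buffer.append(t)
        self.lifelong_buffer.append(t)   # kept regardless of task or outcome

    def new_task(self) -> None:
        self.task_buffer.clear()         # the lifelong buffer is never cleared

def run_episode(env, agent, instruction: str, buffers: Buffers,
                max_steps: int = 100) -> bool:
    """One MDP episode: observe, act, and record every transition."""
    obs = env.reset(instruction)
    for _ in range(max_steps):
        action = agent.act(obs, instruction)
        next_obs, done, success = env.step(action)
        buffers.store(Transition(obs, action, next_obs, instruction, success))
        if done:
            return success
        obs = next_obs
    return False
```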

DAAG synergizes the strengths of LLMs, VLMs, and diffusion models to create agents capable of reasoning, environmental analysis, and efficient learning of new objectives by repurposing previous experiences. The LLM acts as the central controller, interpreting new task instructions, breaking them into smaller subgoals, and coordinating with the VLM and diffusion model for goal attainment.
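As an illustration of that controller role, the sketch below prompts an LLM to decompose a task instruction into subgoals. The prompt wording and the `call_llm` callable are assumptions for the sake of example; the article does not describe the actual interfaces.

```python
import json

# Hypothetical decomposition prompt; the real framework's prompts are not public.
DECOMPOSE_PROMPT = """You are a robot task planner.
Break the instruction into a short, ordered list of visual subgoals.
Instruction: {instruction}
Answer with a JSON list of strings only."""

def propose_subgoals(instruction: str, call_llm) -> list[str]:
    """Ask the LLM to split an instruction into subgoals for the VLM/diffusion
    model to work with. `call_llm` is any prompt -> string callable."""
    reply = call_llm(DECOMPOSE_PROMPT.format(instruction=instruction))
    return json.loads(reply)

# Demo with a canned stand-in for the LLM:
fake_llm = lambda prompt: '["reach the red cube", "grasp the red cube", "place it on the tray"]'
print(propose_subgoals("put the red cube on the tray", fake_llm))
```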

To maximize the utility of past experiences, DAAG employs a method called Hindsight Experience Augmentation (HEA). The VLM processes visual observations in the experience buffer and compares them against desired subgoals, enhancing the agent’s memory with relevant observations. If relevant experiences are absent, the diffusion model generates synthetic data to help the agent visualize potential outcomes, allowing for exploration without direct physical interaction.
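Below is a rough sketch of the HEA idea, reusing the `Transition` dataclass from the earlier sketch. Here `vlm_matches` and `diffusion_edit` are hypothetical stand-ins for the VLM's subgoal comparison and the diffusion model's image editing, respectively.

```python
from dataclasses import replace

def hindsight_augment(lifelong_buffer, subgoal, vlm_matches, diffusion_edit):
    """Relabel or synthesize past transitions so they count as successes
    for `subgoal`."""
    augmented = []
    for t in lifelong_buffer:
        if vlm_matches(t.next_observation, subgoal):
            # The VLM judges this past observation to already satisfy the
            # subgoal: relabel it as a success for the new task.
            augmented.append(replace(t, task=subgoal, success=True))
        else:
            # Otherwise, have the diffusion model edit the observation so it
            # depicts the subgoal, yielding a synthetic successful transition.
            synthetic = diffusion_edit(t.next_observation, subgoal)
            augmented.append(replace(t, next_observation=synthetic,
                                     task=subgoal, success=True))
    return augmented
```

In this reading, the returned transitions would be appended to the agent's task buffer as extra successes, which is how HEA would "synthetically increase the number of successful episodes" the agent can learn from.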

“Through HEA, we can synthetically increase the number of successful episodes stored in the agent's buffers, allowing for effective reuse of data and significantly enhancing efficiency, particularly when learning multiple tasks in succession,” the researchers explain.

They describe DAAG and HEA as an autonomous pipeline that operates without human supervision, leveraging geometric and temporal consistency to generate reliable augmented observations.

What are the Benefits of DAAG?

In their evaluations across multiple benchmarks and simulated environments, researchers found that DAAG significantly outperformed traditional reinforcement learning systems in tasks like navigation and object manipulation. Notably, DAAG-enabled agents achieved goals even without explicit rewards, reached objectives faster, and required less interaction with the environment compared to non-DAAG agents.

The framework excels in reusing data from prior tasks, thereby facilitating the rapid learning of new objectives. The ability to transfer knowledge between tasks is vital for creating agents capable of continuous learning and adaptation. DAAG’s efficacy in optimizing transfer learning paves the way for more resilient and flexible robots and embodied AI systems.

“This work suggests promising avenues for addressing data scarcity in robotic learning and for developing more broadly capable agents,” the researchers conclude.
