Sure, AI can craft sonnets and even perform a quirky Nirvana cover as Homer Simpson. But if we're going to welcome intelligent machines into our lives, they need practical capabilities too. That's precisely why Meta and Nvidia are training their AI models on real-world tasks inside simulated environments.
Today, both tech giants unveiled research aimed at making AI agents more capable in the real world by training them in advanced simulations.
The trouble with the real world is that it's complex, unpredictable, and slow to learn in. Teaching an AI to open a drawer and place an object inside, for instance, typically takes countless repetitions that can stretch over days. In simulation, an agent can pack in nearly the same amount of practice in a couple of minutes.
Simulators aren't a new concept, but Nvidia has taken a significant step forward by putting a large language model in the loop to write and refine the reward functions that guide reinforcement learning. The approach is called the Evolution-driven Universal REward Kit for Agents, or EUREKA. The acronym is a stretch, but its purpose is clear.
Imagine teaching an AI to sort objects by color. There are many ways to express that task as a reward, and some work better than others: should the agent minimize its total movement, or finish the task as quickly as possible? Humans are perfectly capable of writing this code, but finding the most effective formulation usually comes down to trial and error. Interestingly, Nvidia's researchers found that a code-trained language model significantly outperformed humans at writing reward functions, and the system even improves itself, iterating on its own code and adapting it to new applications.
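To make that loop concrete, here's a minimal sketch in Python of how such a system could be wired together. To be clear, this is not Nvidia's code: query_llm, train_and_evaluate, and the toy reward function are invented stand-ins, and a real pipeline would call an actual model API and run a full reinforcement-learning job for every candidate.

```python
import random

def query_llm(prompt: str) -> str:
    """Stand-in for a call to a code-generating LLM."""
    # A real system would send the prompt to a model API; here we
    # return a trivial reward function so the loop runs end to end.
    return (
        "def reward(state):\n"
        "    # Penalize distance to goal; lightly penalize effort.\n"
        f"    return -state['dist_to_goal'] - {random.random():.2f} * state['effort']\n"
    )

def compile_reward(source: str):
    """Turn the LLM's source code into a callable reward function."""
    namespace = {}
    exec(source, namespace)
    return namespace["reward"]

def train_and_evaluate(reward_fn) -> float:
    """Stand-in for training a policy in simulation under reward_fn
    and returning a task-success score (higher is better)."""
    state = {"dist_to_goal": random.random(), "effort": random.random()}
    return reward_fn(state)  # placeholder for a full RL training run

prompt = "Write a Python reward function for sorting objects by color."
best_score, best_source = float("-inf"), None

for generation in range(3):  # a few evolutionary rounds
    for source in [query_llm(prompt) for _ in range(4)]:
        score = train_and_evaluate(compile_reward(source))
        if score > best_score:
            best_score, best_source = score, source
    # Feed the best candidate and its score back into the prompt;
    # this is the self-improving iteration described above.
    prompt += f"\nPrevious best (score {best_score:.2f}):\n{best_source}\nImprove it."

print(best_source)
```

The design point worth noticing is that measured task performance, not human judgment, decides which reward definition survives each round.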
The compelling pen-spinning trick shown here is simulated, but producing it with EUREKA took far less human input and expertise than conventional reward engineering. Agents trained this way also excelled at a variety of virtual dexterity and locomotion challenges, including operating scissors. Definitely a plus!
Transferring these learned behaviors from simulation to a physical robot, the sim-to-real side of “embodying” AI, remains its own challenge. Nevertheless, the work underscores Nvidia's genuine commitment to putting generative AI to practical use.
New Developments in Embodied AI: Meta’s Habitat 3.0
Meta is also making strides in the realm of embodied AI, announcing several advancements including a new version of its “Habitat” dataset. Originally released in 2019, Habitat offered nearly photorealistic, meticulously annotated 3D environments for AI agents to explore. While simulated environments are not new, Meta aims to simplify the process of creating and utilizing them.
Habitat 2.0 brought more interactive and physically realistic environments, along with a library of objects with which to populate them, the kind of asset collection AI companies increasingly prize.
Now, with Habitat 3.0, human avatars can share those virtual spaces with AI agents, controlled via VR. This capability is critical: a robot tidying a living room, say, carrying dishes to the kitchen and collecting stray clothes, might form a plan that a person walking through the room immediately disrupts. With a human or human-like agent sharing the space, the robot can practice the task over and over in a matter of seconds, learning to adapt to that kind of interference.
This training approach, termed “social rearrangement,” is paired with another task called “social navigation.” Here, the robot learns to unobtrusively follow a person, such as a small bot accompanying someone in a hospital for safety.
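A reward for “follow this person, but not too closely” is easy to state and fiddly to tune, exactly the kind of thing reward search helps with. As a purely illustrative sketch, assuming nothing about Meta's actual implementation, a hand-written social-navigation reward might look like this:

```python
import math

# Illustrative reward for "social navigation": stay within a comfort
# band behind a person. The band limits and penalty weights are
# made-up tuning constants, not values from Meta's work.
FOLLOW_MIN, FOLLOW_MAX = 1.0, 2.0  # meters: closer is intrusive, farther loses the person
COLLISION_PENALTY = 10.0

def social_nav_reward(robot_xy, person_xy, collided: bool) -> float:
    if collided:
        return -COLLISION_PENALTY
    dist = math.dist(robot_xy, person_xy)
    if FOLLOW_MIN <= dist <= FOLLOW_MAX:
        return 1.0  # inside the comfort band
    # Outside the band: penalty grows with distance from the nearest edge.
    nearest_edge = FOLLOW_MIN if dist < FOLLOW_MIN else FOLLOW_MAX
    return -abs(dist - nearest_edge)

print(social_nav_reward((0.0, 0.0), (1.5, 0.0), collided=False))  # 1.0, in band
print(social_nav_reward((0.0, 0.0), (4.0, 0.0), collided=False))  # -2.0, too far
```

Even in a toy like this, the constants are judgment calls, which is why letting agents rehearse against human avatars in simulation is so useful for tuning.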
Meta also introduced a new high-fidelity database of 3D interiors called HSSD-200. Their findings suggest that training in about a hundred of these detailed environments yields better results than using thousands of lower-quality scenes.
Furthermore, Meta discussed a new robotics simulation stack, HomeRobot, designed for Boston Dynamics’ Spot and Hello Robot’s Stretch. By establishing a standardized framework for basic navigation and manipulation tasks, they hope to empower researchers to focus on higher-level innovations.
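To see why a standardized stack matters, consider that high-level task code can stop caring which robot executes it. The interface below is an invented illustration of that idea, not HomeRobot's actual API; every class and method name here is hypothetical.

```python
from abc import ABC, abstractmethod

# Hypothetical interface, *not* HomeRobot's real API: navigation and
# manipulation get one shared contract, with per-robot drivers hidden
# behind it, so the same task code could drive a Spot or a Stretch.
class MobileManipulator(ABC):
    @abstractmethod
    def navigate_to(self, x: float, y: float) -> bool: ...

    @abstractmethod
    def pick(self, object_id: str) -> bool: ...

    @abstractmethod
    def place(self, surface_id: str) -> bool: ...

def fetch(robot: MobileManipulator, object_id: str,
          x: float, y: float, surface_id: str) -> bool:
    """High-level task code written once against the shared interface."""
    return (robot.navigate_to(x, y)
            and robot.pick(object_id)
            and robot.place(surface_id))

# Concrete drivers would subclass MobileManipulator and translate these
# calls into each platform's own SDK, which is the layer a standardized
# stack saves researchers from rewriting.
```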
Habitat and HomeRobot are now available under an MIT license on GitHub, while HSSD-200 is provided under a Creative Commons non-commercial license—making these resources accessible for researchers eager to explore.