Researchers at the University of Tokyo and Alternative Machine have developed a humanoid robot system named Alter3, capable of translating natural language commands directly into robotic actions. Leveraging the extensive knowledge embedded in large language models (LLMs) like GPT-4, Alter3 can perform complex tasks such as taking selfies or simulating being a ghost.
This innovation marks a significant advancement in integrating foundation models with robotic systems. While a scalable commercial solution remains on the horizon, recent progress has energized robotics research and holds considerable promise.
Transforming Language into Robot Actions
Alter3 uses GPT-4 as its core model, processing natural language instructions that describe actions or scenarios for the robot to respond to. The model first acts as a planner, employing an "agentic framework" to devise the sequence of action steps required to achieve the specified goal.
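The paper does not publish the exact prompts, but the planning step can be illustrated with a minimal sketch: a single GPT-4 call that decomposes an instruction into discrete motion steps. The prompt wording and the plan_actions helper below are assumptions for illustration, not the authors' code.

```python
# Illustrative planning step: ask GPT-4 to break a natural-language
# instruction into discrete physical action steps.
# Prompt text and function names are hypothetical.
from openai import OpenAI

client = OpenAI()

def plan_actions(instruction: str) -> list[str]:
    """Decompose an instruction into a short list of motion steps."""
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[
            {"role": "system",
             "content": "You control a humanoid robot. Break the user's "
                        "instruction into a short numbered list of physical "
                        "action steps, one movement per line."},
            {"role": "user", "content": instruction},
        ],
    )
    text = response.choices[0].message.content
    # Return the non-empty lines as individual steps.
    return [line.strip() for line in text.splitlines() if line.strip()]

steps = plan_actions("Take a selfie with your phone.")
```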
Alter3 employs various GPT-4 prompt formats to analyze instructions and map them to robot commands. Since GPT-4 has not been trained on Alter3's programming commands, the researchers rely on its in-context learning to adapt its output to the robot's API: the prompt includes a list of available commands and illustrative examples of their usage, allowing the model to translate each action step into executable API commands for the robot.
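A rough sketch of this in-context learning setup is shown below. The command reference and the set_axis call are placeholders standing in for Alter3's real control API, which is not reproduced here; only the overall pattern (list the commands, show worked examples, ask for a translation) reflects the description above.

```python
# Sketch of translating one action step into robot API calls via
# in-context examples. Command names and axis IDs are hypothetical.
from openai import OpenAI

client = OpenAI()

COMMAND_REFERENCE = """
Available command (placeholder for the real robot API):
  set_axis(axis_id: int, value: float)  # drive one axis, value in [0.0, 1.0]

Example:
  Step: "Raise the right arm to shoulder height"
  Commands:
    set_axis(17, 0.6)
    set_axis(18, 0.4)
"""

def step_to_commands(step: str) -> str:
    """Translate a plain-English step into robot commands."""
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[
            {"role": "system",
             "content": "Translate each action step into robot commands, "
                        "using only the commands listed below.\n"
                        + COMMAND_REFERENCE},
            {"role": "user", "content": f'Step: "{step}"\nCommands:'},
        ],
    )
    return response.choices[0].message.content
```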
“Previously, we manually controlled all 43 axes in a specific order to replicate human poses or simulate actions like serving tea or playing chess,” the researchers note. “With LLMs, we are liberated from this labor-intensive process.”
Incorporating Human Feedback
Given that language can be imprecise for describing physical movements, the action sequences generated by the model may not always produce the intended robotic behavior. To address this, the researchers integrated a feedback mechanism that lets users issue corrections such as “Raise your arm a bit more.” These corrections are processed by another GPT-4 agent, which adjusts the code and returns the revised action sequence for the robot to execute. The refined plans and code are then stored in memory for future use.
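A minimal sketch of such a refinement loop appears below, assuming a simple dictionary as the memory store and a hypothetical refine_commands helper; the actual memory format and prompts used for Alter3 may differ.

```python
# Hedged sketch of the feedback loop: a second GPT-4 call revises the
# command sequence based on a verbal correction, and the result is
# cached for reuse. Memory structure and prompt wording are assumptions.
from openai import OpenAI

client = OpenAI()
motion_memory: dict[str, str] = {}  # instruction -> refined command sequence

def refine_commands(instruction: str, commands: str, correction: str) -> str:
    """Apply a user's correction to an existing command sequence."""
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[
            {"role": "system",
             "content": "You revise robot command sequences. Apply the "
                        "user's correction and return the full updated "
                        "sequence."},
            {"role": "user",
             "content": f"Current commands:\n{commands}\n\n"
                        f"Correction: {correction}"},
        ],
    )
    revised = response.choices[0].message.content
    motion_memory[instruction] = revised  # keep the refined plan for later
    return revised
```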
The incorporation of human feedback and memory significantly boosts Alter3's performance. Researchers have evaluated the robot across various tasks, from simple actions like taking selfies and sipping tea to more complex imitations such as acting like a ghost or a snake. The model has also demonstrated its ability to manage scenarios that necessitate intricate planning.
“The training of the LLM encompasses diverse linguistic representations of movements. GPT-4 accurately translates these into commands for Alter3,” the team explains.
Thanks to GPT-4's vast understanding of human behavior, the system can generate realistic behavior plans for humanoid robots. In experiments, the team also managed to imbue Alter3 with emotional expressions such as embarrassment and joy.
“Even from texts that don’t explicitly mention emotional cues, the LLM can deduce appropriate emotions, reflecting them in Alter3’s physical responses,” the researchers highlight.
Advancements in Robotics Models
The adoption of foundation models in robotics research is rapidly gaining traction. For instance, Figure, valued at $2.6 billion, employs OpenAI models to interpret human commands and execute corresponding real-world actions. With the rise of multi-modal capabilities in foundational models, robotics systems are poised to enhance their environmental reasoning and decision-making.
Alter3 exemplifies a trend where off-the-shelf foundation models serve as reasoning and planning modules within robotic control systems. Importantly, it does not rely on a fine-tuned version of GPT-4, allowing its code to be applicable to other humanoid robots.
Projects such as RT-2-X and OpenVLA use specialized foundation models designed to produce robotics commands directly. While these models often yield more stable results and generalize across diverse tasks and environments, they demand greater technical expertise and higher development costs.
Nonetheless, one critical aspect often overlooked in these initiatives is the foundational challenge of enabling robots to perform basic tasks, including grasping objects, maintaining balance, and navigating environments. "A significant amount of work occurs at a level below what these models address," remarked AI and robotics scientist Chris Paxton in a recent interview. "That’s some of the challenging work, largely due to the lack of existing data."