Humans frequently use expressive behaviors to convey goals and intentions. For example, we nod to greet a coworker, shake our heads to indicate disapproval, or say "excuse me" to navigate through a crowd. To facilitate smoother interactions with humans, mobile robots must also exhibit similar expressive behaviors. However, generating such behaviors remains a significant challenge in robotics, and existing solutions often lack flexibility and adaptability.
In a groundbreaking study, researchers from the University of Toronto, Google DeepMind, and Hoku Labs introduce GenEM, a novel approach harnessing the extensive social context embedded in large language models (LLMs) to enable robots to perform expressive behaviors. By utilizing various prompting methods, GenEM allows robots to interpret their environment and replicate human-like expressions effectively.
Expressive Behaviors in Robotics
Traditionally, creating expressive robot behavior relied on rule- or template-based systems, which demand considerable manual input for each robot and environment. This rigidity means that any changes or adaptations necessitate extensive reprogramming. More modern techniques have leaned towards data-driven approaches that offer greater flexibility, yet these often require specialized datasets tailored to each robot's interactions.
GenEM reshapes this approach by leveraging the rich knowledge within LLMs to generate expressive behaviors dynamically, eliminating the need for traditional model training or convoluted rule sets. For instance, LLMs can recognize the importance of eye contact or nodding in various social contexts.
"Our key insight is to utilize the rich social context from LLMs to create adaptable and composable expressive behaviors,” the researchers explain.
Generative Expressive Motion (GenEM)
GenEM employs a sequence of LLM agents that autonomously generate expressive robot behaviors based on natural language commands. Each agent contributes by reasoning about social contexts and translating these behaviors into actionable API calls for the robot.
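As a minimal sketch of what such an agent chain could look like in practice, the snippet below chains three prompts, each consuming the previous agent's output. It assumes the OpenAI Python client (the evaluation reportedly used GPT-4); the prompt wording and the robot primitive names (tilt_head, set_lights, say) are illustrative assumptions, not the authors' actual prompts or API.

```python
# Minimal sketch of GenEM-style agent chaining: each "agent" is a separate
# prompt applied to the previous agent's output. Uses the OpenAI Python
# client; prompts and robot primitive names are illustrative only.
from openai import OpenAI

client = OpenAI()

def ask(system_prompt: str, user_prompt: str) -> str:
    """Run one agent: a system prompt applied to the previous agent's output."""
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_prompt},
        ],
    )
    return response.choices[0].message.content

instruction = "A person walking by waves at you."

# Agent 1: reason about how a human would react in this social situation.
human_response = ask(
    "Describe, step by step, how a person would expressively react to the situation below.",
    instruction,
)

# Agent 2: map that reaction onto the robot's available capabilities.
robot_procedure = ask(
    "Rewrite these steps using only these robot functions: tilt_head(deg), set_lights(pattern), say(text).",
    human_response,
)

# Agent 3: emit executable calls against the robot's API.
robot_code = ask(
    "Convert this procedure into Python calls to the robot API, one call per line, no commentary.",
    robot_procedure,
)
print(robot_code)
```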
“GenEM can produce multimodal behaviors utilizing the robot's capabilities—such as speech and body movement—to clearly express intent,” the researchers note. "One of the standout features of GenEM is its ability to adapt to live human feedback, allowing for iterative improvements and the generation of new expressive behaviors."
The GenEM workflow begins with a natural language instruction, either specifying an expressive action like “Nod your head” or establishing a social scenario, such as “A person walking by waves at you.” Initially, an LLM employs chain-of-thought reasoning to outline a human's potential response. Another LLM agent then translates this into a step-by-step guide reflective of the robot's available functions, guiding actions such as head tilting or triggering specific light patterns.
Next, the procedural instructions are converted into executable code, relying on the robot's API commands. Optional human feedback can be incorporated to refine the behavior further, all without training the LLMs—only prompt-engineering adjustments are required based on robot specifications.
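To make the output of that last step concrete, here is a hypothetical robot API stub and the kind of script the code-generation stage might produce for "A person walking by waves at you." The primitive names are invented for illustration and stand in for whatever API a given robot exposes.

```python
# Illustrative only: a stub robot API and a behavior of the kind the final
# GenEM stage might emit. The primitives tilt_head/set_lights/say are
# hypothetical stand-ins for a real robot's API.
import time

def tilt_head(degrees: float) -> None:
    print(f"[robot] tilting head to {degrees} degrees")

def set_lights(pattern: str) -> None:
    print(f"[robot] light pattern -> {pattern}")

def say(text: str) -> None:
    print(f"[robot] saying: {text!r}")

def acknowledge_wave() -> None:
    """Generated behavior: acknowledge the wave with a nod, a light pulse, and speech."""
    tilt_head(20)             # dip the head to start a nod
    time.sleep(0.3)
    tilt_head(0)              # return to neutral
    set_lights("soft_pulse")  # brief light pulse as a visual acknowledgment
    say("Hello!")

acknowledge_wave()
```

Human feedback such as "nod twice, more slowly" would simply be appended to the prompt and the script regenerated; the LLM itself is never fine-tuned.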
Testing GenEM
The researchers evaluated behaviors generated by two variations of GenEM—one incorporating user feedback and the other not—against scripted behaviors crafted by a professional animator. Utilizing OpenAI’s GPT-4 for context reasoning and expressive behavior generation, they surveyed user responses on the outcomes. The results indicated that users generally found GenEM-generated behaviors as understandable as those crafted by a professional animator. Furthermore, GenEM's modular, multi-step prompting clearly outperformed a baseline that generated behaviors with a single LLM prompt.
Crucially, GenEM's prompt-based design is adaptable to any robot type without necessitating specialized datasets for training. It effectively employs LLM reasoning to create complex expressive behaviors from simple robotic actions.
“Our framework rapidly generates expressive behaviors through in-context learning and few-shot prompting, significantly reducing the need for curated datasets or elaborate rule-making as seen in earlier methods,” the researchers write.
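As a rough illustration of what few-shot prompting can look like here (an assumed format, not the authors' actual prompt), a single worked example is enough to show the model how to map social steps onto one particular robot's primitives; adapting to a different robot mainly means swapping the listed functions and the example.

```python
# Sketch of a few-shot prompt for translating an instruction into robot
# behavior. The robot primitives and the worked example are hypothetical.
FEW_SHOT_PROMPT = """\
You control a mobile robot with these functions: tilt_head(deg), set_lights(pattern), say(text).

Example
Instruction: Nod your head.
Steps: Dip the head briefly, then return it to neutral.
Code:
tilt_head(20)
tilt_head(0)

Instruction: {instruction}
Steps:"""

prompt = FEW_SHOT_PROMPT.format(instruction="A person walking by waves at you.")
print(prompt)  # this string is what would be sent to the LLM
```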
Still in its early stages, GenEM has so far been tested mainly on single-turn interactions and a limited set of robot actions. Robots with richer repertoires of primitive actions remain to be explored, and continued advances in large language models could extend these capabilities further.
“We believe our approach offers a flexible framework for generating adaptable and composable expressive motion, harnessing the power of large language models,” the researchers conclude.