Stanford and Meta Move Closer to Human-Like AI with Innovative 'CHOIS' Interaction Model

Researchers from Stanford University and Meta’s Facebook AI Research (FAIR) lab have unveiled a groundbreaking AI system capable of generating realistic, synchronized motions between virtual humans and objects using only text descriptions.

The innovative system, named CHOIS (Controllable Human-Object Interaction Synthesis), uses a conditional diffusion model to generate these synchronized interactions. For instance, it can interpret and animate instructions like “lift the table above your head, walk, and put the table down.”

The research, published on arXiv, hints at a future where virtual beings can interpret and act on language commands as fluidly as humans.

“Generating continuous human-object interactions from language descriptions within 3D scenes presents several challenges,” the researchers stated. They prioritized ensuring that movements appeared realistic, that human hands made accurate contact with objects, and that objects moved in response to human actions.

How CHOIS Works

CHOIS excels at creating human-object interactions within a 3D scene. At its core is a conditional diffusion model, a generative framework capable of simulating detailed motion sequences. Given an initial state of human and object positions along with a language description of the desired action, CHOIS generates a sequence of motions that accomplishes the task.
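To make the idea concrete, the Python sketch below shows how a conditional diffusion model of this kind could be sampled: a denoiser network receives the noisy motion, a conditioning vector built from the initial human/object state and a text embedding, and the diffusion step, and the reverse process iteratively refines pure noise into a motion sequence. All names, dimensions, and the simple MLP denoiser here are illustrative assumptions, not the paper's actual implementation.

```python
import torch
import torch.nn as nn

class MotionDenoiser(nn.Module):
    """Illustrative denoiser: predicts the noise in a motion sequence,
    conditioned on a vector packing the initial human/object state and a
    text embedding (names and sizes are assumptions, not from the paper)."""
    def __init__(self, motion_dim=24, cond_dim=512, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(motion_dim + cond_dim + 1, hidden),
            nn.ReLU(),
            nn.Linear(hidden, motion_dim),
        )

    def forward(self, x_t, cond, t):
        # x_t: (frames, motion_dim) noisy motion; cond: (cond_dim,); t: int step
        t_feat = torch.full((x_t.shape[0], 1), float(t))
        c = cond.expand(x_t.shape[0], -1)
        return self.net(torch.cat([x_t, c, t_feat], dim=-1))

@torch.no_grad()
def sample_motion(model, cond, frames=120, motion_dim=24, steps=50):
    """Simplified DDPM-style reverse process: start from Gaussian noise and
    iteratively denoise it into a motion sequence consistent with `cond`."""
    betas = torch.linspace(1e-4, 0.02, steps)
    alphas = 1.0 - betas
    alpha_bar = torch.cumprod(alphas, dim=0)

    x = torch.randn(frames, motion_dim)  # pure noise
    for t in reversed(range(steps)):
        eps = model(x, cond, t)          # predicted noise at step t
        mean = (x - betas[t] / torch.sqrt(1.0 - alpha_bar[t]) * eps) / torch.sqrt(alphas[t])
        noise = torch.randn_like(x) if t > 0 else torch.zeros_like(x)
        x = mean + torch.sqrt(betas[t]) * noise
    return x                             # (frames, motion_dim) motion sequence

# Example: the conditioning vector could combine the initial scene state with a
# text embedding of "move the lamp closer to the sofa" (random placeholder here).
cond = torch.randn(512)
motion = sample_motion(MotionDenoiser(), cond)
```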

For example, if instructed to move a lamp closer to a sofa, CHOIS can generate a lifelike animation of a human avatar picking up the lamp and positioning it next to the sofa.

What sets CHOIS apart is its incorporation of sparse object waypoints and language inputs to guide animations. These waypoints serve as markers for key points in an object's movement, ensuring that the animation is not only realistic but also aligns with the overarching goal described in the language input.
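As a rough illustration of the waypoint idea, the snippet below scores how closely a generated object trajectory passes through a handful of sparse waypoints; a term like this could serve as a training loss or a guidance signal. The function name and data layout are assumptions made for illustration, not CHOIS's exact formulation.

```python
import torch

def waypoint_loss(obj_traj, waypoints):
    """Hypothetical waypoint term: penalize the squared distance between the
    object's predicted position at a few key frames and the sparse waypoints
    supplied as input.

    obj_traj:  (frames, 3) predicted object centroid per frame
    waypoints: dict mapping frame index -> target position of shape (3,)
    """
    loss = obj_traj.new_zeros(())
    for frame, target in waypoints.items():
        loss = loss + torch.sum((obj_traj[frame] - target) ** 2)
    return loss / max(len(waypoints), 1)

# Example: the object should start near (1, 0, 0.5) and reach the sofa by frame 90.
traj = torch.randn(120, 3)
print(waypoint_loss(traj, {0: torch.tensor([1.0, 0.0, 0.5]),
                           90: torch.tensor([0.2, 0.0, 0.5])}))
```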

Additionally, CHOIS integrates language comprehension with physical simulation more effectively than traditional models, which often struggle to correlate language with spatial and physical actions over extended interactions. CHOIS interprets the intent and style behind language descriptions and translates them into a series of physical movements while adhering to the constraints of the human body and the involved objects.

This system ensures accurate representation of contact points, such as hands touching objects, and aligns the object's motion with the forces exerted by the human avatar. By employing specialized loss functions and guidance terms during both training and generation phases, CHOIS reinforces these physical constraints, marking a significant advance in AI's ability to understand and interact with the physical world like humans do.
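The sketch below shows one generic way such constraints can be enforced at generation time: after each denoising step, the sample is nudged down the gradient of a constraint loss (here, a simple hand-object contact term). This is a classifier-guidance-style illustration under assumed data layouts, not the paper's specific loss functions or guidance terms.

```python
import torch

def contact_loss(hand_pos, obj_surface_pts):
    """Hypothetical contact term: hand joints should stay close to the
    object's surface while the object is being carried.

    hand_pos:        (frames, 2, 3) left/right hand positions
    obj_surface_pts: (frames, P, 3) sampled object surface points per frame
    """
    d = torch.cdist(hand_pos, obj_surface_pts)  # (frames, 2, P) pairwise distances
    return d.min(dim=-1).values.mean()          # mean distance to nearest surface point

def guided_step(x, denoise_fn, constraint_fn, step_size=0.1):
    """Apply one denoising step, then correct the sample using the gradient
    of the constraint loss so contacts and object motion stay plausible."""
    x = denoise_fn(x)
    x = x.detach().requires_grad_(True)
    loss = constraint_fn(x)
    (grad,) = torch.autograd.grad(loss, x)
    return (x - step_size * grad).detach()
```

Applying a correction of this kind at sampling time, in addition to any penalties used during training, is a common way to keep generated sequences physically consistent without retraining the model for every new constraint.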

Implications for Computer Graphics, AI, and Robotics

The implications of the CHOIS system for computer graphics are substantial, particularly in animation and virtual reality. By enabling AI to interpret natural language commands for realistic human-object interactions, CHOIS could significantly streamline the animation process, reducing the time and effort traditionally needed for complex scene creation.

Animators could leverage this technology to automate sequences that usually require meticulous keyframe animation. In virtual reality, CHOIS could enable more immersive experiences, where users can direct virtual characters through natural language and observe lifelike task execution, transforming previously scripted interactions into dynamic, responsive environments.

In AI and robotics, CHOIS represents a major leap towards developing autonomous, context-aware systems. Rather than relying on pre-programmed routines, robots could use CHOIS to understand and perform tasks described in human language. This could revolutionize service robots in sectors like healthcare, hospitality, and domestic environments by enhancing their ability to interpret and execute diverse tasks within physical spaces.

Moreover, the capacity to process language and visual input simultaneously allows AI to achieve a level of situational and contextual understanding that has so far been largely exclusive to humans. This advancement could lead to AI systems that function as more capable assistants in complex tasks, comprehending not just the "what" but also the "how" of human instructions and adapting to new challenges with unprecedented flexibility.

Promising Results and Future Outlook

In summary, the collaborative research from Stanford and Meta marks significant progress at the intersection of computer vision, natural language processing (NLP), and robotics. The researchers view this work as a crucial step toward developing sophisticated AI systems that can simulate continuous human behaviors in varying 3D environments. Furthermore, it paves the way for further exploration into synthesizing human-object interactions from 3D scenes and language inputs, potentially leading to even more advanced AI technologies in the future.
