AI in Cooperative Gaming: Teaching Empathy Through Hanabi
AI has proven its prowess in competitive gaming, easily outmatching the best human players. However, real-life scenarios are not zero-sum games like poker or StarCraft; cooperation is key. To improve AI's ability to work alongside humans, a research team from Facebook trained an AI to play Hanabi, a cooperative card game. The aim? To better understand human thinking.
Noam Brown, a Facebook AI researcher, explains that “theory of mind”—the ability to comprehend the intentions of others—is crucial. “AIs have struggled with this for a long time,” he noted. “It’s about putting oneself in the shoes of other players and discerning why they are making certain actions.”
Created by French game designer Antoine Bauza in 2010, Hanabi has two to five players collaborate to build five color-coded stacks of cards, each ordered numerically from 1 to 5. Each player can see every hand but their own, leading to a unique challenge: teammates share limited hints at the cost of precious “information tokens.” This dynamic encourages players to infer the state of the game from their teammates’ actions and choices.
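The core stacking rule described above can be sketched in a few lines. This is an illustrative toy, not the researchers' code; the `Card` type and `is_playable` helper are hypothetical names.

```python
# Toy sketch of Hanabi's core play rule: a card may be added to its
# color's stack only if its rank is exactly one higher than the top.
from dataclasses import dataclass

COLORS = ["red", "yellow", "green", "blue", "white"]

@dataclass(frozen=True)
class Card:
    color: str
    rank: int  # 1..5

def is_playable(card: Card, stacks: dict) -> bool:
    """A card is playable if it extends its color's stack by exactly one."""
    return card.rank == stacks.get(card.color, 0) + 1

stacks = {c: 0 for c in COLORS}   # empty table: every stack at height 0
stacks["red"] = 2                 # red stack currently shows 1, 2

print(is_playable(Card("red", 3), stacks))   # True: extends red to 3
print(is_playable(Card("red", 5), stacks))   # False: skips rank 4
```

Because players cannot see their own cards, deciding whether `is_playable` holds for a card in hand is exactly the inference problem the game is built around.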
Traditionally, successful game-playing AIs, such as those for Go and Dota 2, relied on reinforcement learning. Facebook's team enhanced this approach by integrating a real-time search function, as used in Pluribus, the AI that triumphed over top Texas Hold ’Em players.
“Our search method works alongside a precomputed strategy,” wrote Hengyuan Hu and Jakob Foerster from Facebook. This "blueprint policy" entails agreed-upon strategies, which foster effective teamwork. According to Facebook AI researcher Adam Lerer, players begin with a general strategy and adjust based on current gameplay dynamics.
The Hanabi AI mirrors this by establishing an initial blueprint of possible moves and refining its strategy in real time. It can designate one or more players as “searchers,” who interpret teammates’ actions under the assumption that everyone is following the blueprint policy.
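One common way such a searcher can refine the blueprint is to score each legal action by sampling hands consistent with its current beliefs and rolling the game forward with everyone playing the blueprint. The sketch below is a generic illustration of that idea, not Facebook's implementation; `sample_hand` and `rollout_value` are assumed, caller-supplied functions.

```python
# Generic sketch of search on top of a precomputed blueprint policy:
# estimate each action's value by Monte Carlo rollouts in which all
# players follow the blueprint after the chosen action.

def search_action(legal_actions, sample_hand, rollout_value, n_samples=100):
    """Return the action with the highest estimated value.

    sample_hand(): draws a hand consistent with current beliefs (assumed).
    rollout_value(action, hand): score reached by taking `action` with
        `hand`, with everyone following the blueprint thereafter (assumed).
    """
    best, best_value = None, float("-inf")
    for action in legal_actions:
        value = sum(rollout_value(action, sample_hand())
                    for _ in range(n_samples)) / n_samples
        if value > best_value:
            best, best_value = action, value
    return best
```

The search only improves on the blueprint locally, at the current decision point, which is why it can run alongside a fixed precomputed strategy rather than replacing it.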
In a single-agent search, a searcher maintains a probability distribution over its own cards, updating it after each observed action. Multi-agent search, while more complex, enables each player to assess the strategies previously employed by others. This approach has led the AI to impressive scoring milestones: the current state-of-the-art RL algorithm averages 24.08 points in two-player Hanabi, incorporating single-agent search boosts the average score to 24.21, and multi-agent search elevates it further to 24.61.
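The belief maintenance behind single-agent search is essentially Bayes' rule: each possible identity of the searcher's own hidden card is reweighted by how likely the teammate's observed action would be, under the blueprint, if the searcher held that card. The sketch below is illustrative, not the paper's code, and the likelihood numbers are made up.

```python
# Illustrative Bayesian belief update over the searcher's own hidden card.
# `belief` is a prior over candidate card identities; `action_likelihood`
# gives, for each candidate, the (hypothetical) probability that a
# blueprint-following teammate takes the observed action.

def update_belief(belief, action_likelihood):
    """Return the posterior over candidate cards via Bayes' rule."""
    posterior = {c: belief[c] * action_likelihood.get(c, 0.0) for c in belief}
    total = sum(posterior.values())
    return {c: p / total for c, p in posterior.items()} if total else belief

# Uniform prior over three candidate identities for our hidden card.
belief = {"red 1": 1/3, "red 2": 1/3, "blue 1": 1/3}
# Teammate hinted "red": much likelier under the blueprint if we hold red.
likelihood = {"red 1": 0.8, "red 2": 0.8, "blue 1": 0.1}
belief = update_belief(belief, likelihood)
print(belief)  # red identities now dominate the posterior
```

Multi-agent search compounds this by having each player also reason about the belief updates its teammates have already performed, which is what makes it harder to coordinate but more accurate.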
While achieving high scores in Hanabi is remarkable, Facebook's aspirations extend beyond gaming. “We aim to develop AI that can better reason about cooperative interactions,” Lerer stated. This includes chatbots capable of understanding conversational context without needing explicit details. The research could also apply to self-driving cars, allowing them to anticipate scenarios—like a pedestrian crossing—that they cannot directly observe. In the future, the team plans to explore mixed cooperative-competitive games, such as Bridge, to deepen their insights into AI cooperation.