In 2016, Alphabet's DeepMind introduced AlphaGo, an AI that consistently defeated top human Go players. The following year, DeepMind advanced its technology with AlphaGo Zero, which learned to play Go solely by competing against itself, rather than through observing past games. This was followed by AlphaZero, a universal AI capable of mastering Go, chess, and shogi with a single algorithm.
What these systems had in common is that they were all pre-programmed with the rules of their respective games. DeepMind's most recent creation, MuZero, takes this a step further: it masters multiple games, including Go, chess, shogi, and a range of Atari games, without ever being told the rules. Instead, MuZero learns how each environment works on its own, and in doing so outperforms its predecessors.
The challenge for AI researchers was to create an algorithm that can plan effectively even in environments whose rules it does not know. One key technique here is lookahead search, in which the algorithm simulates future states to devise a strategic course of action, much as a player thinks several moves ahead in chess or StarCraft II, weighing potential opponent reactions to maximize the chance of winning. The catch is that lookahead search traditionally requires a perfect simulator of the environment, which is to say, knowledge of the rules.
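To make "thinking several moves ahead" concrete, here is a minimal sketch of one classic form of lookahead search, depth-limited minimax. The `GameState` interface (`legal_moves`, `apply`, `evaluate`, `is_terminal`) is hypothetical and exists only for illustration; note that it encodes the rules of the game, which is exactly what MuZero cannot assume. (MuZero itself plans with Monte Carlo tree search rather than minimax.)

```python
# Minimal sketch of lookahead search: depth-limited minimax over a *known*
# game model. The GameState methods used here are illustrative assumptions.

def lookahead(state, depth, maximizing=True):
    """Score `state` by thinking `depth` moves ahead, assuming the
    opponent always answers with their strongest reply."""
    if depth == 0 or state.is_terminal():
        return state.evaluate()  # heuristic score of this position
    scores = [lookahead(state.apply(move), depth - 1, not maximizing)
              for move in state.legal_moves()]
    return max(scores) if maximizing else min(scores)

def best_move(state, depth=3):
    """Pick the move whose resulting position looks best after lookahead."""
    return max(state.legal_moves(),
               key=lambda move: lookahead(state.apply(move), depth - 1,
                                          maximizing=False))
```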
However, many real-world scenarios, and even certain games, are governed by rules that are complex or simply unknown. Some researchers therefore tried to sidestep the problem with learned models: train a system to predict how the environment responds to each action, then plan with those predictions. But modeling every variable is incredibly challenging, especially in visually rich environments like most Atari games.
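A rough sketch of that "model everything" idea, with illustrative PyTorch shapes (nothing here is DeepMind's actual code): the learned model has to predict the entire next observation, an output as large as an Atari frame itself.

```python
import torch
import torch.nn as nn

class FullDynamicsModel(nn.Module):
    """Predicts the complete next observation from the current observation
    and an action. The output is as large as the observation itself, which
    is why this approach scales poorly in pixel-rich environments."""
    def __init__(self, obs_dim: int, action_dim: int, hidden: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim + action_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, obs_dim),  # must reconstruct every feature/pixel
        )

    def forward(self, obs: torch.Tensor, action: torch.Tensor) -> torch.Tensor:
        return self.net(torch.cat([obs, action], dim=-1))
```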
MuZero merges the best elements of both approaches: it still plans with lookahead search, but it does so inside a learned model that captures only the factors needed to inform decisions. Much like humans, who gauge the weather without getting bogged down in meteorological intricacies, MuZero identifies only the essential components that influence its choices.
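By contrast with the full dynamics model above, here is a hedged sketch of the latent-model idea: encode each observation into a small hidden state and learn dynamics in that compressed space, so the model only has to track decision-relevant features. Module names and layer sizes are illustrative assumptions, not the published architecture.

```python
import torch
import torch.nn as nn

class LatentModel(nn.Module):
    """Sketch of a learned latent model: a representation function maps
    observations to a compact hidden state, and a dynamics function rolls
    that hidden state forward under an action, without ever reconstructing
    the full observation."""
    def __init__(self, obs_dim: int, action_dim: int, latent_dim: int = 32):
        super().__init__()
        self.represent = nn.Sequential(   # observation -> compact hidden state
            nn.Linear(obs_dim, 128), nn.ReLU(),
            nn.Linear(128, latent_dim))
        self.dynamics = nn.Sequential(    # (hidden state, action) -> next hidden state
            nn.Linear(latent_dim + action_dim, 128), nn.ReLU(),
            nn.Linear(128, latent_dim))

    def forward(self, obs: torch.Tensor, action: torch.Tensor):
        s = self.represent(obs)           # compress: keep only what matters
        s_next = self.dynamics(torch.cat([s, action], dim=-1))
        return s, s_next
```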
When planning, MuZero estimates three key quantities: the reward (how well its previous action paid off), the value (how promising its current position is), and the policy (which action is best to take next). This straightforward yet powerful approach has made MuZero DeepMind's most effective algorithm to date. Testing showed that MuZero matches AlphaZero's performance in chess, Go, and shogi, and surpasses all previous algorithms, including Agent57, in Atari games. Notably, giving MuZero more time to consider each move leads to better outcomes, and even when restricted to a small number of simulations per move in Ms. Pac-Man, it still achieved impressive results.
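Those three quantities can be sketched as prediction heads attached to a latent model like the one above. This is a simplification for brevity (in the published architecture the reward comes out of the dynamics function, while value and policy come from a separate prediction function), and all layer sizes are assumptions.

```python
import torch
import torch.nn as nn

class PredictionHeads(nn.Module):
    """The three planning signals a MuZero-style model predicts from a
    hidden state: reward, value, and policy."""
    def __init__(self, latent_dim: int, num_actions: int):
        super().__init__()
        self.reward = nn.Linear(latent_dim, 1)            # how well the last action paid off
        self.value = nn.Linear(latent_dim, 1)             # how promising this position is
        self.policy = nn.Linear(latent_dim, num_actions)  # preferences over next actions

    def forward(self, hidden_state: torch.Tensor):
        return (self.reward(hidden_state),
                self.value(hidden_state),
                torch.softmax(self.policy(hidden_state), dim=-1))
```

With heads like these attached to the latent model, "thinking longer" simply means running more simulated rollouts in the hidden space before committing to a move, which matches the observation that extra simulations improved MuZero's play.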
While high scores in Atari games demonstrate its capabilities, the real-world implications of MuZero's research could be transformative. Although a fully developed general-purpose algorithm is still a work in progress, MuZero represents a significant step toward addressing complex challenges in fields such as robotics, where straightforward rules are rare.