AI models designed for gaming have been around for decades, typically focused on mastering a single game with the goal of winning. However, Google DeepMind researchers are innovating with a new model that not only plays multiple 3D games like a human but also strives to understand and act on verbal instructions.
While traditional "AI" in games often consists of NPCs (non-player characters) that respond to specific in-game commands, DeepMind’s SIMA (Scalable Instructable Multiworld Agent) takes a different approach. SIMA does not rely on access to a game’s internal code or rules. Instead, it is trained on many hours of video of human gameplay. Through this training, and with guidance from data labelers, the model learns to associate visual patterns with actions, objects, and interactions.
For example, SIMA might interpret specific pixel movements as actions such as “moving forward” or recognize that approaching and using a door-like object means “opening a door.” These are not just simple commands; they involve completing tasks or events that extend beyond mere button presses.
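To make the training setup concrete, here is a minimal sketch of what one labeled training example might look like: a short gameplay clip paired with a labeler's text description and the player's recorded inputs. The names and structure are illustrative assumptions, not SIMA's actual data format.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical data shape for one labeled training example:
# raw pixel frames, the labeler's text description of what is
# happening, and the inputs the human player actually pressed.
@dataclass
class LabeledClip:
    frames: List[bytes]      # raw pixel frames from the recording
    instruction: str         # labeler's description, e.g. "open the door"
    actions: List[str]       # recorded inputs, e.g. ["forward", "interact"]

clip = LabeledClip(
    frames=[b"\x00" * 8, b"\x01" * 8],   # stand-ins for real pixel data
    instruction="open the door",
    actions=["forward", "forward", "interact"],
)
print(f"{clip.instruction}: {len(clip.actions)} recorded inputs")
```

A model trained on enough such examples can learn that a given instruction tends to co-occur with particular visual changes and input sequences.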
The training footage includes a variety of games, from Valheim to Goat Simulator 3, with developers’ consent for this use. A primary objective set by the researchers during a press call was to explore whether training an AI with a set of games could enable it to adapt to others it hasn’t directly encountered—this capability is known as generalization.
The findings are promising, albeit with limitations. Agents trained on a more diverse array of games performed better in games they had never encountered, though unique mechanics or terminology in a specific game can still trip up even a well-prepared agent. According to the researchers, however, nothing stands in the way of the model overcoming these challenges other than a lack of training data.
Interestingly, despite the numerous terms used in gaming, the variety of core actions that impact gameplay is limited. Whether a player is gathering materials to build a shelter or summoning a magical abode, fundamentally, they are still engaged in “building a house.” This realization leads to a compelling framework where SIMA recognizes a wide array of actions, summarized in a list of primitives.
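The idea that many game-specific phrasings collapse onto a small shared vocabulary of actions can be sketched as a simple mapping. The primitive set and phrases below are illustrative assumptions, not SIMA's actual list.

```python
# Illustrative sketch: game-flavored instructions map onto a small,
# shared set of primitive actions, regardless of each game's theme.
PRIMITIVES = {"move", "turn", "jump", "interact", "attack", "use_item"}

# Hypothetical phrasings from different games, each reduced to primitives.
SYNONYMS = {
    "gather materials for a shelter": ["move", "interact"],
    "summon a magical abode": ["use_item", "interact"],
    "build a house": ["move", "interact", "use_item"],
}

def to_primitives(instruction: str) -> list:
    """Reduce a game-specific instruction to shared primitive actions."""
    return SYNONYMS.get(instruction, [])

print(to_primitives("build a house"))
assert set(to_primitives("build a house")) <= PRIMITIVES
```

The payoff of such a reduction is generalization: an agent that has learned the primitives behind "building a shelter" in one game has most of what it needs to follow "summon a magical abode" in another.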
The researchers aim to move beyond conventional agent-based AI to create a more dynamic gaming companion. “Instead of facing a superhuman opponent, you can have SIMA players alongside you, capable of cooperating with you and following your instructions,” explained Tim Harley, one of the project’s leads.
As SIMA operates solely based on the on-screen pixels, it learns mechanics in a manner similar to humans. This provides it with the adaptability to demonstrate emergent behaviors during gameplay.
You might be wondering how this method compares with the traditional simulator approach to creating agent-type AIs. In the simulator model, the AI learns through experimentation in a 3D environment running many times faster than real time, building an intuitive understanding of the rules and developing behaviors on its own, without extensive annotation.
Harley noted that “traditional simulator-based training involves reinforcement learning, requiring a 'reward' signal from the game.” That reward signal might be the win/loss outcome in games like Go or StarCraft, or the score in Atari games. DeepMind’s Agent57, which achieved superhuman performance across a suite of 57 Atari games, exemplified this approach.
“In our current projects, particularly with commercial games from our partners,” he continued, “we lack access to such a reward system. Moreover, we’re focused on creating agents capable of executing a wide range of tasks dictated by open-ended text—something that makes establishing reward signals for every possible goal impractical. Consequently, we leverage imitation learning from human actions, guided by text-based goals.”
This approach avoids the limitations of strict reward structures, which can confine the agent’s ambitions to score maximization. Instead, trained on examples of successful human behavior tied to open-ended text goals, the agent learns to attempt a wide variety of tasks, as long as the training data reflects those possibilities.
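The contrast between the two training signals can be sketched with a toy imitation learner: rather than querying the game for a reward, it simply records what humans did for a given situation and goal, then mimics the most common choice. This is a minimal illustration of imitation learning in general, not DeepMind's implementation.

```python
from collections import Counter, defaultdict

# Toy tabular imitation learner: for each (observation, text goal) pair
# seen in human demonstrations, predict the action humans most often
# took. No reward signal from the game is ever needed.
class ImitationPolicy:
    def __init__(self):
        self.counts = defaultdict(Counter)

    def observe(self, observation, goal, human_action):
        # Record what a human did when pursuing this goal in this situation.
        self.counts[(observation, goal)][human_action] += 1

    def act(self, observation, goal):
        # Mimic the most frequent human action; None if never demonstrated.
        seen = self.counts.get((observation, goal))
        return seen.most_common(1)[0][0] if seen else None

policy = ImitationPolicy()
policy.observe("facing_door", "open the door", "interact")
policy.observe("facing_door", "open the door", "interact")
policy.observe("facing_door", "walk away", "move_backward")
print(policy.act("facing_door", "open the door"))  # -> interact
```

The same demonstrations can serve arbitrarily many text goals, which is what makes the imitation route practical when no single reward function could cover every instruction a player might give.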
Other companies are increasingly interested in open-ended collaboration with AI as well, from conversational AI in NPCs to emerging agent-behavior research that simulates unplanned actions.
Overall, the exploration of AI in gaming is unfolding in exciting directions, including projects like MarioGPT, which uses a language model to generate endless game levels.
AI, Gaming, Google DeepMind