Google DeepMind's Genie: Creating Super Mario-Style Games from Images

DeepMind built its reputation in artificial intelligence by using video games to test and refine its algorithms. Fourteen years on, and following its acquisition by Google, gaming continues to play a pivotal role in the company's research. Enter Genie, a groundbreaking model that lets users transform images into playable video game scenes.

Genie, short for Generative Interactive Environments, has been trained on extensive collections of internet videos, enabling it to craft interactive gameplay environments from a variety of input sources, including images, videos, and even user sketches that it has never encountered before. Imagine uploading a photograph of a clay sculpture; Genie can generate a vibrant 2D representation reminiscent of classic platformers like Super Mario Bros., all from a single image.

While the concept may at first seem like a mere novelty, DeepMind argues that Genie has significant implications for the development of generalist agents, AI systems capable of performing a diverse range of tasks. Genie offers a general method for learning latent actions from video that can transfer to human-designed environments without requiring additional domain expertise.
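The "latent actions from video" idea described above can be illustrated with a toy sketch. Everything in this snippet is invented for illustration (the codebook size, the projection, and the dynamics step are not DeepMind's implementation); the point is only that a discrete action can be inferred from a pair of consecutive frames, VQ-style, and then reused to drive a predictive model:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical codebook of 8 discrete latent actions; the size and the
# encoding below are assumptions made for this sketch, not Genie's design.
CODEBOOK = rng.normal(size=(8, 4))

def encode_latent_action(frame_t, frame_t1):
    """Infer a discrete latent action from two consecutive frames by
    projecting the frame difference and snapping it to the nearest
    codebook entry (a crude VQ-style quantization)."""
    diff = (frame_t1 - frame_t).reshape(-1)
    proj = diff[:4]  # crude projection down to the codebook dimension
    dists = np.linalg.norm(CODEBOOK - proj, axis=1)
    return int(np.argmin(dists))

def predict_next_frame(frame_t, action_id):
    """Toy dynamics model: nudge the current frame by the decoded
    action vector to produce the next frame."""
    delta = np.zeros(frame_t.size)
    delta[:4] = CODEBOOK[action_id]
    return frame_t + delta.reshape(frame_t.shape)
```

In a trained system, both the action encoder and the dynamics model would be learned jointly from video alone, which is what lets the latent actions transfer to new environments without action labels.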

DeepMind explored Genie's capabilities by training it on videos with no action labels at all. Remarkably, the model learned to infer movements and adapt to new environments on its own, without the need for detailed instructions.

The team behind Genie emphasizes that this initiative is only a starting point. Trained on some 200,000 hours of internet videos of 2D platformer games like Super Mario, along with robotics data from RT-1, Genie learned fine-grained controls and recognized diverse actions across the environments it generates. This resembles human observational learning: show Genie an image of a character poised near a ledge, and it can infer that the character would jump, crafting an engaging scene based on that action.

Genie operates with 11 billion parameters, establishing itself as a “foundation world model.” This classification is significant as it indicates a system that learns and understands the intrinsic mechanics of the world. For further insights, interested readers might explore comments from experts like Yann LeCun, Meta's Chief AI Scientist, on the definition of a world model.

Tim Rocktäschel, a research scientist at DeepMind involved in the Genie project, acknowledged the capabilities of Sora, OpenAI's latest video generation model, noting its impressive visual output. However, he reiterated LeCun's point that a world model fundamentally requires the integration of actions to be truly effective.

As of now, there has been no official announcement regarding the availability of the Genie model for public access or its potential integration into future Google offerings. However, a showcase page provides glimpses into the model's exemplary projects, allowing users to see Genie’s creative capabilities in action.
