Facebook's AI Research team has developed an AI system called Vid2Play that extracts playable characters from videos of real people, a modern take on the concept behind '90s full-motion video (FMV) games like Night Trap. The technology uses neural networks to analyze videos of individuals performing specific actions and then recreates those characters and actions in new environments, letting users control them with a joystick.
To accomplish this, the team used two neural networks: Pose2Pose and Pose2Frame. The process begins by feeding a video into a Pose2Pose network trained for a specific type of action (dancing, fencing, or tennis), which locates the subject against the background and isolates their poses. Pose2Frame then takes the isolated person, along with their shadow and any objects they hold, and composites them into a new scene while minimizing visual artifacts. Users can then steer the character's movements, which mirror poses from the original video, with a joystick or keyboard.
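At a high level, this is a pose-prediction stage driven by control input, followed by a rendering stage that composites the predicted pose into an arbitrary background. The sketch below illustrates that two-stage flow in PyTorch; the class names, tensor shapes, and network internals are simplified assumptions for illustration only, not Facebook's actual Vid2Play implementation.

```python
# Minimal sketch of the two-stage pipeline described above.
# All names, shapes, and architectures are illustrative assumptions.

import torch
import torch.nn as nn


class Pose2Pose(nn.Module):
    """Maps the current pose plus a control signal to the next pose.

    Here a 'pose' is a flat vector of 2D joint coordinates and the control
    signal is a 2D joystick direction; the real network is a far richer
    generative model trained per action domain (dance, tennis, fencing).
    """

    def __init__(self, num_joints: int = 17):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(num_joints * 2 + 2, 128),
            nn.ReLU(),
            nn.Linear(128, num_joints * 2),
        )

    def forward(self, pose: torch.Tensor, control: torch.Tensor) -> torch.Tensor:
        return self.net(torch.cat([pose, control], dim=-1))


class Pose2Frame(nn.Module):
    """Renders a pose into a given background frame.

    The real system also carries the character's shadow and held objects
    and blends them into the scene while suppressing artifacts.
    """

    def __init__(self, num_joints: int = 17, frame_pixels: int = 64 * 64 * 3):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(num_joints * 2 + frame_pixels, 256),
            nn.ReLU(),
            nn.Linear(256, frame_pixels),
        )

    def forward(self, pose: torch.Tensor, background: torch.Tensor) -> torch.Tensor:
        return self.net(torch.cat([pose, background.flatten(1)], dim=-1))


# Inference loop: joystick input drives pose updates, which are then
# composited into an arbitrary new background, frame by frame.
num_joints, h, w = 17, 64, 64
pose2pose = Pose2Pose(num_joints)
pose2frame = Pose2Frame(num_joints, h * w * 3)

pose = torch.zeros(1, num_joints * 2)   # initial pose vector
background = torch.rand(1, 3, h, w)     # new scene to place the character in

for joystick in [torch.tensor([[1.0, 0.0]]), torch.tensor([[0.0, 1.0]])]:
    pose = pose2pose(pose, joystick)       # step the character's pose
    frame = pose2frame(pose, background)   # render it into the scene
    print(frame.shape)                     # flattened output frame
```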
The system needs only brief video clips of each activity to train effectively, and it shows an impressive ability to filter out other people in the frame and compensate for varying camera angles. The research bears similarities to Adobe's content-aware fill, which uses AI to remove unwanted elements from videos, and aligns with work from companies like NVIDIA on transforming real-life footage into virtual game landscapes.
While the characters exhibit some motion issues—such as appearing to glide across the ground, known as "foot slide" in 3D animation—they appear more realistic against backgrounds compared to earlier character extraction attempts. As this research is in its early stages, there is potential for future improvements in motion fluidity.
Vid2Play has promising implications for personalized gaming experiences, allowing players to insert themselves or their favorite YouTube personalities into games. The team asserts, "It addresses a computational problem not previously fully met, paving the way for the creation of video games with realistic graphics. Additionally, controllable characters extracted from YouTube-like videos can seamlessly integrate into virtual worlds and augmented realities."