Attendees of this year’s NeurIPS AI conference in Montreal had the unique opportunity to explore a virtual city created by NVIDIA. What makes the simulation significant is the technology behind it: using machine learning and a powerful supercomputer, NVIDIA has developed a method by which an AI analyzes existing video and uses the objects and scenery it depicts to construct interactive environments.
NVIDIA's groundbreaking research not only represents a major technical achievement but also aims to simplify the process for artists and developers in crafting realistic virtual worlds. Instead of painstakingly creating objects and characters polygon by polygon, creators can now employ machine learning tools to define these entities and let NVIDIA’s neural network do the rest. "Neural networks — specifically generative models — will transform graphics creation," said Bryan Catanzaro, NVIDIA's Vice President of Applied Deep Learning Research. "This will empower developers, especially in gaming and automotive, to produce scenes at a fraction of the traditional cost."
Here's a closer look at how it works. According to Catanzaro, researchers trained a neural network on dashcam footage gathered during self-driving car trials, running it for about a week on one of NVIDIA’s DGX-1 supercomputers, a machine Catanzaro likened to "250 servers in a box." In parallel, the team used Unreal Engine 4 to create a "semantic map" of the scene, labeling every pixel as "car," "tree," and so on. From these labels, Unreal Engine generated a basic "sketch" of each frame, which NVIDIA's neural network then filled in with the corresponding objects, rendering the virtual environment in real time at 25 frames per second.
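To make the pipeline concrete, here is a minimal sketch of the core idea in PyTorch: a per-pixel semantic label map, standing in for the one produced by Unreal Engine, is one-hot encoded and passed through a small convolutional generator that paints in an RGB frame. The class count, layer sizes, and generator itself are illustrative assumptions, not NVIDIA's vid2vid architecture, which is a far larger, adversarially trained model.

```python
# Toy label-map-to-frame generator (PyTorch). Illustrative only:
# class count and layer sizes are assumptions, not NVIDIA's model.
import torch
import torch.nn as nn
import torch.nn.functional as F

NUM_CLASSES = 8  # e.g. road, car, tree, building, sky... (assumed)

class ToyGenerator(nn.Module):
    """Maps a one-hot semantic "sketch" to an RGB frame."""
    def __init__(self, num_classes=NUM_CLASSES):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(num_classes, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 3, 3, padding=1), nn.Tanh(),  # RGB in [-1, 1]
        )

    def forward(self, labels):
        # labels: (batch, H, W) integer class IDs from the engine's label map
        one_hot = F.one_hot(labels, NUM_CLASSES).permute(0, 3, 1, 2).float()
        return self.net(one_hot)

# A random 256x512 label map stands in for an Unreal Engine frame.
labels = torch.randint(0, NUM_CLASSES, (1, 256, 512))
frame = ToyGenerator()(labels)
print(frame.shape)  # torch.Size([1, 3, 256, 512])
```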
NVIDIA’s team also applied its video-to-video synthesis technique in a lighter context: an AI turned a team member's dance moves into a performance reminiscent of PSY. The approach was much the same, with the AI extracting the dancer's poses and overlaying a different person's appearance on them.
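The dance demo swaps the semantic map for a pose representation. As a rough illustration (the joint names and coordinates below are made up), the detected joints can be rasterized into a sparse "pose map" that plays the same role the label map plays above, with a generator trained on footage of the target person filling in their appearance:

```python
# Hypothetical pose-map construction for the pose-transfer demo.
import torch

H, W = 256, 128
# Made-up 2D joint coordinates (y, x) from some pose estimator.
joints = {"head": (40, 64), "l_hand": (110, 20), "r_hand": (110, 108)}

pose_map = torch.zeros(1, 1, H, W)
for (y, x) in joints.values():
    pose_map[0, 0, y, x] = 1.0  # mark each joint as a bright pixel

# pose_map now plays the role of the semantic "sketch": a generator
# trained on the target person would render their body at these joints.
print(int(pose_map.sum()))  # 3
```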
While the results showcase potential, they are not as visually detailed as scenes from AAA video games. NVIDIA's sample videos reveal digital cities with objects that seem somewhat realistic, though not fully convincing. The playful Gangnam Style experiment yielded slightly better results.
NVIDIA has open-sourced the underlying code, though it may take time before developers can put these tools to effective use in virtual reality. The company was also upfront about the neural network's limitations. While the simulated vehicles appeared lifelike, the model struggled to render them accurately through turns because its label maps lacked the necessary information. Objects such as cars also failed to keep a consistent appearance, sometimes changing color over time, which keeps the output short of true photorealism.
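The color-drift issue is a temporal-consistency failure: each frame looks plausible on its own, but nothing forces consecutive frames to agree. One common remedy, sketched below under the assumption that optical flow and an occlusion mask are available (the function and its inputs are illustrative, not NVIDIA's published loss), is to penalize the difference between the current output and the previous output warped forward by the flow:

```python
# Illustrative temporal-consistency penalty (not NVIDIA's exact loss).
import torch

def temporal_consistency_loss(curr, prev_warped, valid_mask):
    """L1 difference between the current frame and the flow-warped
    previous frame, counted only where the warp is reliable."""
    return (valid_mask * (curr - prev_warped).abs()).mean()

curr = torch.rand(1, 3, 64, 64)         # current generated frame
prev_warped = torch.rand(1, 3, 64, 64)  # previous frame after flow warp
mask = torch.ones(1, 1, 64, 64)         # 1 where optical flow is trusted
print(temporal_consistency_loss(curr, prev_warped, mask))
```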
Beyond its technical limitations, technology like this raises concerns about misuse. Deepfakes, for example, have already shown how hard it can be to distinguish genuine video from fabricated video. Catanzaro, however, remains optimistic, pointing out that the majority of virtual experiences are beneficial. "People really enjoy virtual experiences... and we're focusing on positive applications," he stated. Still, he acknowledged technology's darker side, noting that media manipulation long predates AI.
NVIDIA's research marks a significant advance in digital imaging, one with the potential to reshape how virtual worlds are created and experienced across commerce, art, and innovation. It also raises hard questions about the line between reality and fabrication, and calls for careful thought about what these tools can do.