Earlier this month, I witnessed a remarkable technological breakthrough. During a press briefing prior to CES, Nvidia unveiled a demonstration for its Ace microservice, an AI suite capable of creating fully voiced AI characters. I was amazed as a demo presenter engaged with an in-game NPC through a microphone, receiving real-time, lifelike responses from the digital character. It felt like something out of science fiction, but one question lingered: How was this possible?
Nvidia's response was vague, stating there was "no simple answer." This ambiguity sparked intense speculation on social media, with many users expressing concerns that Ace might have been trained on content Nvidia didn’t own. Although Nvidia later clarified that it only utilizes legally obtained data, uncertainty remained. Gamers were still apprehensive, grappling with ethical and artistic issues surrounding this technology.
Observing this dynamic was Purnendu Mukherjee, a software engineer and the mind behind the AI technology central to this controversy. He is the founder of Convai, the generative AI company responsible for powering Nvidia’s Kairos demo at CES 2024. Rather than remaining silent amidst the backlash, Mukherjee took the opportunity to clarify misconceptions directly.
In an extensive interview, Mukherjee addressed various ethical concerns regarding AI tools like his creation. He discussed topics ranging from fears of job displacement to worries that AI might undermine the human touch in art. Contrary to these fears, Mukherjee envisions a future where artists collaboratively harness AI to enhance their creative expressions. However, his insights on data usage raise additional questions.
Mukherjee's fascination with the human mind began in childhood, leading him to explore AI by the time he reached high school. Initially discouraged by rigid, rule-based systems, his interest reignited in 2015 when he dove into deep learning in a lab in India. After pursuing graduate studies and gaining experience at Nvidia, he launched Convai in April 2022, funding the startup independently for the first ten months.
As a lifelong gamer, Mukherjee grew up playing competitive games like Counter-Strike at a local internet café, where he first envisioned how AI could enhance gameplay. What started as a humorous critique of rudimentary game bots has evolved into a groundbreaking innovation. Convai's technology utilizes multiple AI processes to generate fully voiced NPCs that can respond dynamically to player prompts, with the aim of creating more engaging gaming experiences.
"Consider titles like Baldur's Gate 3 or The Witcher," Mukherjee explains. "These games have rich narratives and deep character arcs. However, players often miss out on fully exploring these stories due to limited dialogue options with NPCs. With today's technology, we can give NPCs a life of their own—allowing them to interact with players in character and provide deeper insight into the story."
This sentiment opens up a broader discussion as Mukherjee addresses interconnected concerns about AI. When discussing whether Baldur’s Gate 3 would still resonate without its carefully crafted dialogue, we delve into the complex relationship between machine-generated content and artistic integrity. Mukherjee tackles skepticism head-on, emphasizing that AI is not a replacement for artists but rather a tool that necessitates their input.
"I believe narrative designers will be in greater demand, not less," he states, outlining how AI could create additional roles for writers. "Writers must create backstory and narrative while also developing robust test sets. To ensure a generative AI-based NPC can be confidently integrated into a multi-million-dollar game, hundreds or thousands of interactions are needed—ideally crafted by the original narrative writer. Our platform requires users to provide extensive backstory and documents, which ultimately leads to significantly more writing than what's typically done."
This perspective becomes a recurring theme in our dialogue. Mukherjee repeatedly asserts that generative AI tools will require an even larger pool of artists to train effectively. He suggests that improved AI will enhance games' quality, leading to increased sales and higher wages for voice actors who play essential roles in developing these advanced tools. His outlook is optimistic, especially given the current wave of layoffs sweeping through the video game industry.
Mukherjee acknowledges the reality of these layoffs but frames the rise of generative AI as part of a natural technological evolution. He believes that creators will need to adapt and embrace collaborative synergies with AI rather than view it as a threat.
"You remain the creator, master, and controller of it," he asserts.
As we continue, I inquire about artists who passionately create games as an expression of their craft. Is it truly as straightforward as suggesting they pivot to become AI engineers? Mukherjee counters that it’s more about recognizing the intersection of art and technology.
"AI is akin to tools like Adobe Photoshop or Unreal Engine," he explains. "Yes, games existed before these technologies, and creators still hand-crafted them. But can you produce extraordinary art using Unreal Engine? Absolutely. The meticulous detail in 3D video editing remains, even with AI-generated content. The essence of craftsmanship is still present; it's just enhanced by more powerful tools. You are still the creator, the one shaping your vision."
Mukherjee clearly views AI as an asset for artists rather than a substitute. He reiterates several key points regarding the dependency of AI on human creativity while addressing prevalent concerns. Yet, the issue of data usage remains contentious. While critics argue that AI models trained on their work are stealing intellectual property, some developers insist that substantial data—including copyrighted material—is necessary for training effective models. Mukherjee suggests that creators should be compensated when their contributions shape AI training datasets.
"There must be a system in place to ensure that individuals who contribute significant data are fairly compensated," he states. "Whether it's the New York Times or Reddit, proper licensing is essential. It’s a complex issue, but I believe this is the direction we need to pursue, particularly for commercial applications."
When questioned about Convai’s data practices, Mukherjee emphasizes that the company only employs data it has the rights to use. He explains that it would be impossible to randomly scrape the specific data required, given that the technology is pioneering a new field. However, he quickly addresses a paradox in this argument.
"We utilize base models from sources like OpenAI or licensed open-source models," he clarifies. "These must be ethically sourced and commercially licensed. We are meticulous in these processes. In fact, our system often requires more voice actors, not fewer!"
Mentioning OpenAI raises some concerns, particularly given its current legal challenges stemming from The New York Times' lawsuit over the alleged "unlawful use" of its writing to train bots like ChatGPT. OpenAI acknowledges the difficulty of training advanced AI models without utilizing copyrighted materials. Given that Convai's model is built on OpenAI’s, I press Mukherjee on how he can guarantee that no copyrighted content was involved in their training.
Mukherjee makes a subtle distinction: Convai is not directly using OpenAI’s data, only the models developed from it. This nuance may suggest a legal loophole. Mukherjee believes that since Convai refrains from directly using the data, it remains compliant with copyright law. However, when asked to clarify the distinction between using models and using the potentially copyrighted datasets within those models, his explanation becomes less clear.
"It’s ambiguous which model contains which data," he admits. "We don’t have that clarity. For instance, if OpenAI provides five models, Nvidia four, and Meta three, we simply use the ones that best meet our requirements without knowing their exact data origins."
Mukherjee's reasoning implies that Convai does not bear responsibility for how other models manage their data. His focus is solely on ensuring that Convai's data practices remain ethical while hoping the foundational models are also compliant. Yet his earlier assertion that Convai would work with the most ethical models seems incongruous, particularly given the legal issues surrounding the ones currently employed.
These complexities may explain why Nvidia was initially reluctant to answer questions about data usage. The reality is that all these technologies build upon one another. Ace depends on Convai, which is built on OpenAI's work, a layered structure that makes it challenging to identify the data origin at lower tiers. Nvidia’s claim that there is "no simple answer" regarding data usage is accurate, but a more honest explanation may be that the company lacks comprehensive knowledge of the entire system. While Nvidia is unlikely to face courtroom scrutiny, a significant legal defeat for OpenAI could have far-reaching consequences.
As we unravel these intricate details, I raise the topic of regulation. Should the government intervene to establish guidelines for AI technology? Mukherjee acknowledges the need for some regulation but stresses the importance of a measured approach. He worries that excessive restrictions may stifle innovation, and remains convinced that the benefits of AI outweigh its potential drawbacks.
"What is AI today? Think of it like a car," he compares. "Cars can be dangerous; accidents can happen. Yet we drive them every day because the overall benefits are significant. I view AI in the same light. We will require regulations on its use, just as we regulate how to drive a vehicle. Legal consequences will apply to those who misuse it."
Change is inevitable, and change often brings discomfort.
Despite some grim comparisons, Mukherjee maintains a hopeful outlook on AI. He genuinely believes it will yield substantial benefits for society, provided that companies continue to prioritize human welfare. He envisions a future where tools like Nvidia Ace strengthen artists' talents rather than replacing them. Rather than fearing a future dominated by machines, he recognizes the necessity for adaptation.
"Change is going to happen, and it will impact people," Mukherjee acknowledges. "This is reminiscent of past technological shifts. With each significant change, new job opportunities arise, while older roles may decline. Consider the transition from horse-drawn carriages to automobiles. Those involved in the horse industry had to pivot. Generative AI will open up fresh avenues for creativity and innovation—it’s poised to benefit humanity as a whole, but it will also necessitate shifts in traditional employment."
At the end of our interview, Mukherjee expresses gratitude for the opportunity to clarify misconceptions regarding Convai. He notes that much of the media coverage surrounding Nvidia Ace overlooked his company's contributions. There’s a hint of frustration in his tone as he seeks rightful recognition. I point out the irony of this situation, likening his experience to that of artists watching their work exploited by AI tools.
"That's a compelling observation!" he responds with a laugh, possibly gaining newfound perspective on the matter.