Generative AI has captured the public’s attention with its ability to produce intricate, seemingly realistic text and imagery from simple verbal prompts. There is a notable caveat, however: on closer inspection, the results often reveal imperfections. Observers frequently point out oddities such as unnatural finger placements, distorted floor tiles, and math that doesn't add up.
Now Synthesia, one of the trailblazing AI startups focused on video, is unveiling an update aimed at overcoming some of the prevalent challenges in generative media. Its latest version features custom avatars, modeled after real individuals captured in its studio, which are designed to convey more emotion, sync their lips more accurately, and move in a more human-like way in text-driven videos.
This release follows significant progress for the company. In contrast to other generative AI firms like OpenAI, which has diversified its strategy with consumer tools like ChatGPT and a robust B2B offering through APIs utilized by independent developers and major enterprises, Synthesia is narrowing its focus. Similar to Perplexity’s commitment to perfecting AI-driven search, Synthesia is dedicated to crafting the most lifelike generative video avatars specifically for business applications, including training and marketing.
This focused approach has enabled Synthesia to carve out a distinct identity in a burgeoning AI market, which risks becoming oversaturated as the initial excitement gives way to fundamental concerns such as annual recurring revenue (ARR), unit economics, and the operational costs associated with AI implementations.
Synthesia touts its new Expressive Avatars, launching on Thursday, as a pioneering advancement: “The world’s first avatars fully generated with AI.” The avatars are built on large, pretrained models, and Synthesia claims its innovation lies in how it combines those models to produce multimodal distributions that more accurately reflect human speech patterns. This process, the company asserts, occurs in real time, mirroring the spontaneity of human communication. By contrast, many existing AI video solutions assemble various video clips to simulate facial responses that align, at least somewhat, with the provided scripts. The goal is a more natural and lifelike presentation.
Comparing the previous and new versions makes it clear that there is still progress to be made, as CEO Victor Riparbelli himself acknowledges. “Of course it’s not 100% there yet, but it’ll be very, very soon, by the end of the year. It’ll be truly mind-blowing,” he stated. “The subtlety of AI is fascinating. When it comes to humans, the minutiae in our facial expressions hold so much data. For instance, while humans intuitively understand how a smile can indicate happiness, it's an incredibly intricate behavior to articulate. Yet, deep learning networks can discern patterns and replicate them effectively.” The next significant challenge, he added, is mastering the representation of hands. “Hands are incredibly tricky,” he remarked.
Synthesia’s B2B focus not only sharpens its messaging but also positions its product in a more secure context amidst widespread apprehensions regarding deepfakes and the potential misuse of AI for disinformation and fraud. Nonetheless, Synthesia has encountered its own controversies; its technology has been exploited to fabricate propaganda in Venezuela and spread false news on platforms promoting pro-China narratives.
In response, the company has been proactive, recently updating its policies to restrict certain content types, investing in the detection of bad faith activities, expanding its AI safety team, and exploring content credential technologies like C2PA.
Despite these challenges, Synthesia has experienced steady growth. The company was last valued at $1 billion following a $90 million funding round nearly a year ago, in June 2023. Riparbelli indicated in a recent interview that there are no immediate plans for additional funding, although that says nothing about whether potential investors are courting Synthesia.
What remains indisputable is that developing and maintaining AI technologies demands significant financial resources, and Synthesia has been diligently investing in this sector. Before the launch of Thursday’s update, approximately 200,000 users had generated over 18 million video presentations across 130 languages using 225 legacy avatars. While specific figures on paid users aren’t disclosed, the company boasts a roster of high-profile clients, including Zoom, the BBC, and DuPont, among others. The startup naturally hopes that the rollout of this new version will further amplify these numbers.