Microsoft Ignite 2023 Unveils Innovative AI Tool for Photorealistic Avatars
At Microsoft Ignite 2023, the company launched an unexpected new feature: the Azure AI Speech text-to-speech avatar tool. The product lets users create photorealistic avatars that animate and "speak" lines the original person may never have uttered. Now in public preview, the tool generates videos of an avatar from uploaded images of the desired character and a written script. One specialized model animates the avatar, while a separate text-to-speech system, either prebuilt or custom-trained on the individual’s voice, delivers the script.
According to Microsoft, “With the text-to-speech avatar, users can create training videos, product introductions, and customer testimonials efficiently, all through text input.” The avatars can communicate in various languages and can utilize AI models like OpenAI’s GPT-3.5, allowing them to address off-script questions in chatbot scenarios.
However, the potential for misuse of such a tool is significant. Microsoft is aware of the risks; for instance, similar technology from AI startup Synthesia has previously been misused to produce propaganda in Venezuela and misinformation spread by pro-China social media outlets. At launch, most Azure subscribers will have access only to prebuilt avatars; according to Microsoft, custom avatars are available by registration only and for specific use cases.
These developments introduce critical ethical concerns. For example, during the recent SAG-AFTRA strike, the use of AI for creating digital likenesses was a focal point. Studios eventually agreed to compensate actors for their AI-generated likenesses. But how will Microsoft address potential ethical dilemmas regarding companies using actors’ likenesses without proper consent or compensation?
I reached out to Microsoft seeking clarity on its stance regarding the unauthorized use of actors’ likenesses. The company did not respond before publication, nor did it confirm whether it would require companies to label AI-generated avatars, as platforms like YouTube do.
In a follow-up, a Microsoft spokesperson clarified that customers must obtain "explicit written permission" and consent from avatar talent for custom avatar projects. Customers must also ensure that their agreements with the talent define the duration and intended use, and they must disclose that the avatars are AI-generated.
Personal Voice: A New AI Tool for Voice Synthesis
Alongside the avatar feature, Microsoft is rolling out another generative AI tool called Personal Voice, part of its custom neural voice service. The tool can replicate a user’s voice within seconds from a one-minute audio sample. Microsoft positions this capability as a way to create personalized voice assistants, localize content into different languages, and produce tailored narrations for audiobooks and podcasts.
To prevent legal complications, Microsoft has instituted strict guidelines: prerecorded speech is prohibited, and users must provide “explicit consent” via a recorded statement. Before using Personal Voice for speech synthesis, this statement must be verified against other one-time-use training data. Currently, access to this feature is limited to registered users who agree to restrict its use to applications that do not involve reading user-generated or open-ended content.
The company's guidelines stipulate, “Voice model usage must be confined to specific applications, and output cannot be published or shared outside these applications.” Eligible customers maintain sole control over the creation and usage of voice models, primarily for dubbing purposes in film and entertainment.
Initially, Microsoft did not say how actors might be compensated for their voice contributions or whether it would implement watermarking technology to help identify AI-generated voices. The company later confirmed that watermarks would be added automatically to personal voices, making synthesized speech easier to identify and to trace back to the original source of the voice. One noted limitation is that incorporating watermark detection into applications requires Microsoft’s approval, which may complicate broader usage.
Conclusion
The launch of the Azure AI Speech text-to-speech avatar and Personal Voice tools at Microsoft Ignite 2023 marks a significant advancement in the use of artificial intelligence for creating digital representations. As these tools evolve, they bring critical questions of ethics, consent, and compensation to the forefront, necessitating careful consideration by developers and companies alike.