In recent years, the time required for AI to clone a person’s voice has dramatically decreased. What once took minutes can now be accomplished in mere seconds.
OpenAI, the Microsoft-backed innovator behind the popular generative AI chatbot ChatGPT, has recently unveiled its voice-cloning technology, which needs only 15 seconds of audio to accurately reproduce a person’s voice.
In a recent blog post, OpenAI provided a sneak peek of a model called Voice Engine, which has been under development since late 2022. This innovative technology requires a minimum of 15 seconds of spoken material. Users can then input text to generate what OpenAI describes as “emotive and realistic” speech that closely resembles the original speaker.
OpenAI emphasizes its commitment to a “cautious and informed approach” regarding the wider release of this technology, highlighting the potential risks of synthetic voice misuse. The company aims to initiate conversations about the responsible use of synthetic voices and how society can adapt to these emerging technologies. It stated, “Based on these discussions and the outcomes of our small-scale tests, we will make informed decisions on how to proceed with this technology.”
One significant concern is the misuse of voice-cloning technology in scams. Criminals are already using similar publicly available tools to clone voices and deceive friends or relatives into transferring money under false pretenses. The technology also raises concerns in the political sphere, as shown by a recent incident in which a robocall using a synthesized version of President Joe Biden’s voice urged people not to vote in January’s New Hampshire primary.
Additionally, the rapid advancements in voice cloning provoke worries among voice actors, who fear being pressured to relinquish their voice rights for AI-generated versions, often with compensation that falls short of what they would receive for live performances.
On a more positive note, OpenAI suggests that this technology could greatly assist non-readers and children by providing engaging, natural-sounding voices that represent a diverse range of speakers, enhancing learning experiences. It could also facilitate instant translation of videos and podcasts, a feature already being tested by Spotify. In addition, it could help patients who are gradually losing their voice to illness keep communicating with a synthetic version of their own voice. OpenAI showcases examples of AI-generated audio alongside reference materials on its website, and many will agree that the results are quite remarkable.