You may notice a significant improvement in audio quality for some YouTube Stories, thanks to a new speech enhancement feature from Google. A few years ago, Google introduced its “Looking to Listen” AI technology, designed to isolate individual voices in crowded, noisy environments. That technology is now available to creators who record YouTube Stories on iOS devices.
The Looking to Listen model was developed by training on a large collection of online videos, so that it learns the relationship between speech and visual signals such as mouth movements and facial expressions. To check for bias, Google tested its performance across a range of visual and auditory characteristics, including the speaker's age, skin tone, spoken language, voice pitch, facial visibility, head pose, facial hair, eyewear, and background noise level. The results showed that speech enhancement quality was consistent across languages, and that facial hair had only a small effect, with the best results on clean-shaven faces.
In its announcement, Google described several improvements made over the past few years. Processing now runs entirely on the device, so no data needs to be sent to a remote server. A new technique quickly extracts face thumbnails from the video during recording, which lets speech enhancement begin while the Story is still being captured. Google also shrank the system from 120MB to 6MB, making it much easier to deploy, and cut processing time from 10x real-time on a desktop to about 0.5x real-time on an iPhone CPU. At that rate, a 15-second Story can be processed in roughly 7.5 seconds.
To turn the feature on, creators toggle "Enhance speech" in the volume controls on their iOS device. The accompanying videos demonstrate the difference in audio quality.