In 2023, despite significant advancements in generative AI, voice assistants remain frustratingly unresponsive, resembling their 2011 counterparts. However, Meta AI has introduced a groundbreaking dataset aimed at enhancing automatic speech recognition (ASR) tools by grouping speech at the "utterance level."
Meta has continuously worked to elevate ASR performance, enabling systems to learn without transcripts, recognize over 4,000 languages, and even surpass human lip-reading skills. Traditional datasets often categorize speech by demographics—such as age, gender, and accent—which limits the variety of pronunciations and hampers understanding across diverse user groups.
To address this issue, Meta AI has created a dataset utilizing an innovative utterance clustering method. “Instead of dividing a dataset based on speakers’ demographic information, our algorithm clusters speech at the utterance level,” the Meta AI team explains. This approach allows for the aggregation of similar utterances from a wide range of speakers, facilitating model training on diverse clusters and employing fairness datasets to assess the impact across various demographic groups.
The new dataset comprises over 27,000 command utterances collected from 595 paid volunteers in the U.S., encompassing seven key themes: music, capture, utilities, notification control, messaging, calling, and dictation. Prompts included common tasks like searching for a song or coordinating meetups with friends.
For evaluation, Meta trained a model using publicly available English-language Facebook videos, then assessed its performance with Casual Conversations v1 and another de-identified dataset featuring 48,000 spoken utterances from 867 individuals. Results were encouraging, indicating a 10% increase in overall ASR performance with notable improvements across all demographic groups, particularly for accents. The age group of 66-85, historically underrepresented in voice command applications, demonstrated significant gains as well.
“This algorithm reflects Meta’s commitment to responsible AI and our ongoing efforts to address fairness challenges,” the researchers noted. Moving forward, the team plans to adapt this system for other languages, further broadening its reach and effectiveness.