Meta's Latest Dataset: Enhancing Speech Recognition with Clusters of Diverse Speakers

Updated on November 8 2024

In 2023, despite significant advances in generative AI, voice assistants remain frustratingly limited, little better than their 2011 counterparts. Meta AI has now introduced a new dataset aimed at improving automatic speech recognition (ASR) tools, along with a training approach that groups speech at the "utterance level."

Meta has long worked to improve ASR, teaching systems to learn without transcripts, to recognize more than 4,000 spoken languages, and even to read lips better than humans. A persistent problem, however, is that the datasets used to train ASR models are often split by demographics, such as age, gender, and accent. This limits the variety of pronunciations a model is exposed to and hampers its ability to understand diverse groups of users.

To address this issue, Meta AI has developed a dataset and an utterance clustering method. "Instead of dividing a dataset based on speakers' demographic information, our algorithm clusters speech at the utterance level," the Meta AI team explains. Similar utterances from a wide range of speakers end up in the same cluster, so a model can be trained on diverse clusters, and fairness datasets are then used to measure the impact on different demographic groups.
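The article does not say how Meta's algorithm actually forms its clusters. As a rough illustration only, the sketch below uses plain k-means as a stand-in: it groups hypothetical two-dimensional "utterance embeddings" without looking at who spoke them, so a single cluster can contain utterances from several speakers. The speaker IDs, embedding values, and the `kmeans` helper are all invented for this example and are not Meta's method or data.

```python
import random
from collections import defaultdict


def kmeans(points, k, iters=20, seed=0):
    """Plain k-means (a stand-in, not Meta's algorithm): returns a cluster index per point."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        # Assign each utterance embedding to its nearest centroid (squared Euclidean distance).
        for i, p in enumerate(points):
            labels[i] = min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])),
            )
        # Move each centroid to the mean of the embeddings assigned to it.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return labels


# Hypothetical toy data: (speaker_id, utterance embedding). Real systems would use
# learned acoustic embeddings with far more dimensions.
utterances = [
    ("spk_a", (0.1, 0.2)), ("spk_b", (0.15, 0.1)), ("spk_c", (0.2, 0.25)),
    ("spk_a", (5.0, 5.1)), ("spk_b", (5.2, 4.9)), ("spk_c", (4.8, 5.0)),
]

# Cluster on the embeddings only; speaker identity is never used.
labels = kmeans([emb for _, emb in utterances], k=2)

# Record which speakers ended up in each utterance-level cluster.
clusters = defaultdict(set)
for (speaker, _), label in zip(utterances, labels):
    clusters[label].add(speaker)

for label, speakers in sorted(clusters.items()):
    print(label, sorted(speakers))
```

Because the grouping ignores speaker labels, each toy cluster mixes all three hypothetical speakers, which is the property an utterance-level split relies on when training on diverse clusters.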

The new dataset comprises over 27,000 command utterances collected from 595 paid volunteers in the U.S., encompassing seven key themes: music, capture, utilities, notification control, messaging, calling, and dictation. Prompts included common tasks like searching for a song or coordinating meetups with friends.

For evaluation, Meta trained a model on publicly available English-language Facebook videos, then assessed its performance on Casual Conversations v1 and on a separate de-identified dataset of 48,000 spoken utterances from 867 individuals. The results were encouraging: overall ASR performance improved by 10%, with gains across all demographic groups and the most notable improvements for accents. Speakers aged 66 to 85, a group historically underrepresented in voice command applications, also saw significant gains.

“This algorithm reflects Meta’s commitment to responsible AI and our ongoing efforts to address fairness challenges,” the researchers noted. Moving forward, the team plans to adapt this system for other languages, further broadening its reach and effectiveness.
