Benchmarking in AI Development
Benchmarking is an essential step in developing artificial intelligence: it gives a clear measure of a model's capabilities and lets researchers gauge its performance on specific tasks. Traditional benchmarking has its limitations, however. Once an algorithm masters a static dataset, researchers must invest significant time creating new benchmarks to push the AI further, and as AI technology evolves, the demand for fresh benchmarks keeps growing. For instance, it took the research community approximately 18 years to reach human-level performance on the MNIST dataset and roughly six years to surpass humans on ImageNet. In contrast, only one year was needed to exceed human performance on the GLUE benchmark for language understanding.
Furthermore, existing benchmarks can harbor biases and statistical artifacts that algorithms learn to exploit, leading to inflated evaluations. For example, a model may notice that questions beginning with "how much" or "how many" are most often answered with "2" and simply respond "2" every time, scoring well without genuinely understanding the question.
In response to these challenges, Facebook AI Research (FAIR) has introduced a novel approach to benchmarking that puts humans directly in the loop for evaluating and training its natural language processing (NLP) models. The initiative, named Dynabench (short for "dynamic benchmarking"), has humans interact with NLP algorithms by posing probing, linguistically challenging questions designed to test the models' capabilities and expose their weaknesses. The fewer times the algorithm is fooled, the better its performance.
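To make the idea concrete, here is a minimal sketch of what one round of human-in-the-loop, adversarial benchmarking could look like. Everything in it (the function names, the toy sentiment "model," and the example probes) is an illustrative assumption, not Dynabench's actual API; it only shows the core loop of counting how often human-written examples fool a model and keeping those examples for the next, harder round.

```python
# Hypothetical sketch of one dynamic-benchmarking round, in the spirit of
# Dynabench. Names and data structures are illustrative assumptions only.
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class RoundResult:
    total_attempts: int = 0
    times_fooled: int = 0
    fooling_examples: List[Tuple[str, str]] = field(default_factory=list)

    @property
    def fooled_rate(self) -> float:
        # Lower is better: the fewer times the model is fooled, the stronger it is.
        return self.times_fooled / self.total_attempts if self.total_attempts else 0.0


def run_round(model: Callable[[str], str],
              human_examples: List[Tuple[str, str]]) -> RoundResult:
    """Each (text, expected_label) pair is a human-written probe intended to
    trip the model up. Probes that fool the model are collected so they can
    seed the next, harder benchmark round."""
    result = RoundResult()
    for text, expected_label in human_examples:
        result.total_attempts += 1
        if model(text) != expected_label:
            result.times_fooled += 1
            result.fooling_examples.append((text, expected_label))
    return result


if __name__ == "__main__":
    # Toy sentiment "model" that relies on a shortcut: the word "great".
    naive_model = lambda text: "positive" if "great" in text.lower() else "negative"

    # Linguistically tricky probes a human annotator might write.
    probes = [
        ("The plot was great, said no one ever.", "negative"),
        ("Not a great film by any stretch.", "negative"),
        ("I expected it to be terrible, but it was wonderful.", "positive"),
    ]

    outcome = run_round(naive_model, probes)
    print(f"Fooled on {outcome.times_fooled}/{outcome.total_attempts} probes "
          f"({outcome.fooled_rate:.0%})")
```

Because the probes deliberately target the model's shortcut, the toy model is fooled every time; in a dynamic benchmark, those fooling examples would feed back into training, so each round of human probing becomes harder to game.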
Compared to static benchmarks, this dynamic system minimizes issues like saturation and bias, enabling more accurate measurements that reflect real-world applications. According to FAIR researcher Douwe Kiela, “The process cannot saturate, it will be less prone to bias and artifacts, and it allows us to measure performance in ways that are closer to the real-world applications we care most about.”
One significant advantage of Dynabench is its accessibility: anyone with basic English proficiency can participate by logging into the Dynabench portal and engaging with a range of NLP models. Looking ahead, Kiela and his team aim to expand the system by integrating more models, modalities, and languages.