Meta has continued its commitment to open-source initiatives with the launch of FACET, a new AI benchmark aimed at assessing the “fairness” of AI models that classify and detect things in images and videos, including people.
FACET, an acronym for “FAirness in Computer Vision EvaluaTion,” consists of 32,000 images containing 50,000 people labeled by human annotators. The dataset covers classes related to occupations and activities, such as “basketball player,” “disc jockey,” and “doctor,” along with demographic and physical attributes, enabling what Meta calls “deep” evaluations of biases across these categories.
“Our aim with FACET is to empower researchers and practitioners to conduct similar benchmarking efforts. This will help them better understand the disparities in their models and assess the effectiveness of measures implemented to tackle fairness concerns,” Meta stated in a blog post. “We encourage researchers to leverage FACET to evaluate fairness across other vision and multimodal tasks.”
Benchmarks for detecting bias in computer vision algorithms are not new; Meta itself released one several years ago to identify biases relating to age, gender, and skin tone in both vision and audio machine learning models. Extensive research has uncovered biases within computer vision technology against various demographic groups, and these biases are often pronounced.
However, Meta’s track record on responsible AI raises questions. Last year, it had to retract an AI demo after it generated racist and inaccurate scientific content. Reports have labeled the company’s AI ethics team as lacking effectiveness, and its anti-bias tools have been deemed “completely insufficient.” Critics also point to Meta’s algorithms as potentially exacerbating socioeconomic disparities, notably through biases against Black users in automated moderation systems.
Despite these challenges, Meta asserts that FACET surpasses previous benchmarks in examining bias in computer vision, providing insights into questions such as “Are models more effective at classifying people as skateboarders based on perceived gender presentation?” and “Do certain biases worsen when the individual has coily hair compared to straight hair?”
To create FACET, Meta engaged annotators to label the images for demographic attributes, including perceived gender and age, alongside physical traits like skin tone, lighting conditions, hairstyles, and clothing. These annotations were combined with additional labels sourced from Segment Anything 1 Billion, a Meta-developed dataset designed for training computer vision models.
The images utilized in FACET originated from Segment Anything 1 Billion, which acquired them from a photo provider. However, it remains unclear whether the individuals featured in the photos were informed about their use for this benchmark. Additionally, the blog post does not clarify how annotators were recruited or the compensation they received.
Historically, many annotators responsible for labeling datasets in AI projects come from developing nations and often earn far less than the U.S. minimum wage. A recent report highlighted that Scale AI, a leading annotation firm, has been criticized for low pay rates and delayed or withheld payments.
In a white paper detailing FACET’s development, Meta explains that the annotators were "trained experts" from various regions, including the United States, Colombia, Egypt, Kenya, the Philippines, and Taiwan. Meta employed a proprietary annotation platform from a third-party vendor and compensated annotators according to an hourly wage established for each country.
Despite potential ethical concerns surrounding FACET's origins, Meta believes this benchmark can effectively assess classification, detection, “instance segmentation,” and “visual grounding” models across diverse demographic attributes.
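In practice, a benchmark like this is used to break a model’s performance down by annotated attribute and measure the gap between groups. The sketch below illustrates that kind of per-group disparity calculation; the annotation field names (`"class"`, `"perceived_skin_tone"`) are illustrative assumptions, not FACET’s actual schema or API.

```python
# Minimal sketch of the kind of disparity measurement a benchmark like FACET
# enables. Annotation keys here are hypothetical, not FACET's real schema.
from collections import defaultdict

def per_group_recall(predictions, annotations, target_class, group_key):
    """Recall for `target_class`, broken down by a perceived-attribute group."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for pred, ann in zip(predictions, annotations):
        if ann["class"] != target_class:
            continue
        group = ann[group_key]
        totals[group] += 1
        if pred == target_class:
            hits[group] += 1
    return {g: hits[g] / totals[g] for g in totals}

# Example: does the model recover "doctor" equally well across skin-tone groups?
# recalls = per_group_recall(preds, annotations, "doctor", "perceived_skin_tone")
# disparity = max(recalls.values()) - min(recalls.values())
```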
As a practical example, Meta applied FACET to its DINOv2 computer vision algorithm, which is now commercially available. The benchmark revealed several biases in DINOv2, including a bias against people with certain gender presentations and a tendency to stereotypically identify pictures of women as nurses.
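DINOv2 is distributed through PyTorch Hub, so an evaluation of this kind typically starts by extracting image embeddings and then scoring a classifier built on them separately for each annotated group. The following is a minimal sketch of that workflow; the hub entry point shown is the publicly released ViT-B/14 checkpoint, while the preprocessing and probing setup are assumptions for illustration, not the evaluation protocol Meta used.

```python
# Sketch: extract DINOv2 features so a linear probe's accuracy can be compared
# across groups. The probe setup is illustrative, not Meta's FACET protocol.
import torch
from torchvision import transforms
from PIL import Image

# Load a pretrained DINOv2 backbone (ViT-B/14) from PyTorch Hub.
backbone = torch.hub.load("facebookresearch/dinov2", "dinov2_vitb14")
backbone.eval()

preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),  # 224 is divisible by the 14-pixel patch size
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

@torch.no_grad()
def embed(image_path: str) -> torch.Tensor:
    """Return a single DINOv2 feature vector for one image."""
    img = preprocess(Image.open(image_path).convert("RGB")).unsqueeze(0)
    return backbone(img).squeeze(0)  # global image embedding

# A linear probe trained on these embeddings can then be scored separately for
# each annotated group (e.g., perceived gender presentation) to surface the
# kinds of disparities FACET is designed to reveal.
```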
Meta acknowledged, “The preparation of DINOv2’s pre-training dataset may have unintentionally mirrored biases from the reference datasets we used.” The company indicated a commitment to address these shortcomings in future efforts, emphasizing that image-based curation could help prevent bias stemming from search engines or textual supervision.
While no benchmark is without flaws, Meta openly admits that FACET might not fully reflect real-world concepts and demographic groups. The company also notes that many depictions of professions in the dataset may have changed since FACET was created. For example, most of the doctors and nurses in FACET were photographed during the COVID-19 pandemic and are wearing more personal protective equipment than they would have been before it.
“At this time, we do not plan on updating this dataset,” Meta stated in the white paper. “Users will be able to flag any objectionable content, and we will remove it if identified.”
Alongside the dataset, Meta provides an online dataset explorer tool. Developers may use the tool and the dataset only for evaluation, testing, and benchmarking, not for training computer vision models on FACET.