Not all generative AI models are equal, especially in how they handle contentious topics.
A recent study by researchers from Carnegie Mellon, the University of Amsterdam, and AI startup Hugging Face assessed various open text-analyzing models, including Meta’s Llama 3, to evaluate their responses to questions concerning LGBTQ+ rights, immigration, surrogacy, and more.
The findings revealed inconsistencies in the models' answers, reflecting inherent biases rooted in their training data. “Throughout our experiments, we observed notable discrepancies in how models from different regions approach sensitive subjects,” explained Giada Pistilli, principal ethicist and co-author of the study. “Our research demonstrates substantial variation in the values expressed by model responses based on cultural and linguistic contexts.”
Text-analyzing models are statistical probability machines. Trained on enormous quantities of text, they generate responses by predicting which words are most likely to follow the words that came before. If the training data is biased, the resulting models inherit that bias, and it shows up in their answers.
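The mechanics behind that claim are easy to see in code. The sketch below illustrates next-token prediction with the Hugging Face transformers library; the small "gpt2" model and the prompt are stand-ins chosen for illustration, not the models or questions used in the study.

```python
# Minimal, illustrative sketch of next-token prediction.
# "gpt2" and the prompt are stand-ins, not the study's models or questions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

prompt = "Immigration policy should"
inputs = tokenizer(prompt, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits  # shape: (1, sequence_length, vocab_size)

# Probability distribution over the next token, learned entirely from the
# training data -- this is where dataset bias surfaces in the output.
next_token_probs = torch.softmax(logits[0, -1], dim=-1)
top = torch.topk(next_token_probs, k=5)
for prob, token_id in zip(top.values, top.indices):
    print(f"{tokenizer.decode(token_id.item())!r}: p={prob.item():.3f}")
```

Whatever word distributions the model learned from its training corpus are exactly what it reproduces here, which is why skewed data leads to skewed answers.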
In their study, the researchers evaluated five models—Mistral’s Mistral 7B, Cohere’s Command-R, Alibaba’s Qwen, Google’s Gemma, and Meta’s Llama 3—using a dataset of inquiries and statements spanning issues like LGBTQ+ rights, immigration, and disability rights. To uncover linguistic biases, they presented the models with questions in various languages, including English, French, Turkish, and German.
Questions about LGBTQ+ rights elicited the most "refusals," cases in which the models declined to answer. Questions about immigration, social welfare, and disability rights also produced a large number of refusals.
Some models decline "sensitive" questions more often than others. Qwen, for example, produced more than four times as many refusals as Mistral, reflecting the different philosophies Alibaba and Mistral bring to developing their models.
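For a sense of how refusal counts like these might be tallied once responses are collected, here is a rough sketch. The keyword heuristic, model names, and example answers are illustrative assumptions, not the classification method the researchers actually used.

```python
# Rough sketch of per-model refusal rates over collected responses.
# The keyword heuristic is illustrative only, not the study's method.

REFUSAL_MARKERS = (
    "i cannot", "i can't", "i won't answer", "as an ai",
    "i'm not able to", "i am not able to",
)

def is_refusal(response: str) -> bool:
    """Crude check: does the response contain a typical refusal phrase?"""
    text = response.lower()
    return any(marker in text for marker in REFUSAL_MARKERS)

def refusal_rates(responses: dict[str, list[str]]) -> dict[str, float]:
    """Map each model name to the share of its responses flagged as refusals."""
    rates = {}
    for model_name, outputs in responses.items():
        refused = sum(is_refusal(r) for r in outputs)
        rates[model_name] = refused / len(outputs) if outputs else 0.0
    return rates

# Hypothetical input: model name -> answers collected for a question set.
collected = {
    "qwen": ["I cannot discuss this topic.", "Here is a brief overview..."],
    "mistral-7b": ["There are several perspectives on this issue..."],
}
print(refusal_rates(collected))  # e.g. {'qwen': 0.5, 'mistral-7b': 0.0}
```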
“These refusals are shaped by the implicit values of the models and the explicit choices made by the organizations behind them, such as tuning decisions to avoid addressing sensitive topics,” Pistilli noted. “Our research reveals significant variations in the values expressed by model responses, influenced by culture and language.”
In Alibaba's case, these choices may be informed by political pressures. A BBC report from September found that Ernie, an AI chatbot developed by Chinese search giant Baidu, deflected questions it deemed too controversial, including on Tibetan oppression, President Xi Jinping, and the Tiananmen Square massacre. In China, generative AI services must be approved by the Cyberspace Administration of China, the country's internet regulator, which requires that they "reflect core socialist values."
The models' differing responses to particular questions may also reflect fundamental differences in worldview, including among the people recruited to annotate the models' training data.
Annotations, which attach labels to data (for example, marking anti-LGBTQ+ rhetoric as negative), come from contractors. Like everyone else, those contractors have biases, which can seep into their labels and, in turn, into the behavior of models trained on that data.
The researchers identified contrasting "views" on topics like immigrant asylum in Germany and LGBTQ+ rights in Italy across different models, possibly due to biased annotations. For example, when asked about the assertion that the legal and social privileges of Turkish citizens in Germany should be revoked, Command-R rejected the claim, Gemma refused to answer, and Llama 3 affirmed it.
“If I were a user, I would want to be mindful of the inherent cultural biases within these models,” Pistilli asserted.
While some findings may be surprising, the overarching conclusions are not. It is well-documented that all models carry biases, with some more pronounced than others.
In April 2023, the misinformation watchdog NewsGuard published a report showing that OpenAI's ChatGPT repeated more inaccurate information when prompted in Chinese than in English. Other research has found deeply ingrained political, racial, ethnic, gender, and ableist biases in generative AI models, many of which cut across languages and regions.
Pistilli acknowledged that there is no simple fix for model bias, given the complexity of the problem. Still, she expressed hope that the study would underline the necessity of thoroughly testing models before they are widely deployed.
“We urge researchers to rigorously assess their models for the cultural perspectives they reflect, whether intentionally or unintentionally,” Pistilli stated. “Our research emphasizes the need for comprehensive social impact evaluations that go beyond traditional statistical measures, both quantitatively and qualitatively. Developing innovative methods to understand model behaviors once deployed and their potential societal effects is critical for creating improved AI models.”