In 2019, Amazon upgraded its Alexa assistant with a feature that let it detect when a customer was likely frustrated and respond with more empathy. If a customer asked Alexa to play a song, got the wrong track, and responded with a displeased “No, Alexa,” the assistant could apologize and ask for clarification. Now, the team behind one of the data sets used to train the text-to-image model Stable Diffusion wants to bring similar emotion-detection capabilities to every developer, at no cost.
This week, LAION, the nonprofit that builds image and text data sets used to train generative AI models such as Stable Diffusion, introduced the Open Empathic project. Open Empathic’s goal is to “equip open-source AI systems with empathy and emotional intelligence,” according to the group’s statement.
“The LAION team, with expertise in healthcare, education, and machine learning, identified a significant gap in the open-source community: emotional AI development was largely neglected,” Christoph Schuhmann, a co-founder of LAION, explained via email. “Just as our concerns about non-transparent AI monopolies led to the establishment of LAION, we felt a similar urgency regarding emotional AI.”
Through Open Empathic, LAION is recruiting volunteers to submit audio clips to a database for creating AI, including chatbots and text-to-speech models, that can interpret human emotions.
“With Open Empathic, our aim is to develop AI that comprehends more than just words,” Schuhmann emphasized. “We strive for it to recognize the nuances of expressions and tone variations, enriching human-AI interactions with authenticity and empathy.”
Founded in early 2021 by Schuhmann, a German high school teacher, along with several AI enthusiasts from a Discord community, LAION—short for “Large-scale Artificial Intelligence Open Network”—is funded by donations and public research grants, including contributions from AI startup Hugging Face and Stability AI, the company behind Stable Diffusion. The organization is committed to democratizing AI research resources, starting with training data.
“We have a clear mission: to leverage the power of AI for the betterment of society,” said Kari Noriy, an open-source contributor to LAION and a PhD student at Bournemouth University. “We’re dedicated to transparency and believe that the best way to shape AI is through an open approach.”
Thus, Open Empathic was born.
In its initial phase, LAION has created a website where volunteers can annotate YouTube clips of individuals speaking. Volunteers can either select clips pre-screened by the LAION team or contribute their own. For each clip, they fill out a detailed set of fields, including a transcription, audio and video descriptions, and the speaker’s age, gender, accent (e.g., “British English”), arousal level (alertness, not sexual arousal), and valence level (ranging from “pleasant” to “unpleasant”).
Additional fields cover the audio quality and background noise levels. The primary focus, though, is the speaker’s perceived emotions: volunteers choose from drop-down menus offering categories like “chirpy,” “brisk,” and “engaging.”
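To make the annotation workflow concrete, here is a minimal sketch of what a single record could look like once those fields are filled in. The field names, types, value ranges, and the example URL below are assumptions based on the description above, not LAION’s published schema.

```python
from dataclasses import dataclass, field

# Hypothetical shape of one Open Empathic annotation; LAION's actual
# schema may use different names, types, and value ranges.
@dataclass
class EmpathicAnnotation:
    clip_url: str            # YouTube clip being annotated
    transcription: str       # what the speaker says
    audio_description: str   # free-text description of the audio
    video_description: str   # free-text description of the video
    age: str                 # speaker's age as perceived, e.g., "20-30"
    gender: str              # as perceived by the annotator
    accent: str              # e.g., "British English"
    arousal: float           # alertness, assumed 0.0 (calm) to 1.0 (highly alert)
    valence: float           # assumed 0.0 (unpleasant) to 1.0 (pleasant)
    audio_quality: int       # assumed 1 (poor) to 5 (excellent)
    background_noise: int    # assumed 1 (none) to 5 (loud)
    emotions: list[str] = field(default_factory=list)  # drop-down picks

# Example record for an upbeat clip, using the article's sample categories.
example = EmpathicAnnotation(
    clip_url="https://www.youtube.com/watch?v=EXAMPLE",
    transcription="I can't believe we actually pulled it off!",
    audio_description="Excited voice over light crowd noise",
    video_description="Speaker faces the camera, smiling broadly",
    age="20-30",
    gender="female",
    accent="British English",
    arousal=0.9,
    valence=0.95,
    audio_quality=4,
    background_noise=2,
    emotions=["chirpy", "engaging"],
)
```

Even a schema this simple shows why label quality matters: nearly every field records an annotator’s perception rather than ground truth, which is where the bias concerns discussed later come in.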
“We aspire to train AI models that understand a diverse array of languages and accurately interpret different cultural contexts,” Noriy stated. “Our objective is to develop models that truly comprehend languages, using videos that capture authentic human emotions and expressions.”
Once volunteers submit a clip to LAION’s database, they can repeat the process; there is no limit on the number of clips an individual can contribute. LAION aims to collect roughly 10,000 samples in the coming months and hopes to reach between 100,000 and 1 million samples by next year.
“We have dedicated community members who, driven by a vision of democratizing AI models and data sets, selflessly contribute their time for annotations,” Noriy added. “Their motivation stems from the shared aspiration of creating accessible, empathic, and emotionally intelligent open-source AI.”
Exploring Emotion Detection
Beyond Amazon's Alexa, numerous startups and tech giants are developing AI that can identify emotions for various applications, from improved sales training to preventing driver fatigue.
In 2016, Apple acquired Emotient, a California-based company developing AI algorithms for facial expression analysis. Affectiva, an MIT spin-off and notable player in the space, claimed its technology could detect anger or frustration in speech in as little as 1.2 seconds; it was acquired by Smart Eye last year. And Nuance, the speech recognition company Microsoft bought in April 2021, has demonstrated similar technology for analyzing drivers’ emotions in vehicles.
Other contenders in the emotion detection market include Hume, HireVue, and Realeyes, the last of which develops technology to assess how viewers respond to specific advertisements. Employers have used emotion-detecting systems to evaluate potential hires on empathy and emotional intelligence, and schools have deployed them to gauge student engagement in the classroom and remotely. Governments, meanwhile, have tested emotion recognition AI at border control points in the U.S., Hungary, Latvia, and Greece to flag “dangerous individuals.”
The LAION team envisions beneficial, ethical applications of emotional AI across various domains, including robotics, psychology, training, education, and gaming. Schuhmann envisions robots providing support and companionship, virtual assistants that recognize when individuals feel lonely or anxious, and tools that aid in diagnosing mental health conditions.
However, one major concern is that the scientific basis behind emotion detection is often shaky at best. There are few universally accepted indicators of emotion, casting doubt on the accuracy of emotion-detecting AI. Most existing systems are based on psychologist Paul Ekman's research from the 1970s, yet subsequent studies—including Ekman’s own work—indicate significant variations in emotional expression across cultures.
For instance, an expression that reads as fear in the West can signal a threat or aggression in Malaysia. Research has also shown that American and Japanese viewers react to violent films very differently, with Japanese viewers adopting entirely different expressions depending on who else is in the room.
Vocal characteristics vary just as widely, including among people with disabilities, autistic people, and speakers of other languages and dialects such as African-American Vernacular English (AAVE). A native French speaker answering a survey in English might pause or pronounce words with some uncertainty, hesitations that a listener unfamiliar with their language background could misread as emotional cues.
A critical issue also lies in bias, implicit and explicit, introduced by the annotators whose labels inform emotion detection models. A 2019 study found that labelers were more likely to rate phrases in AAVE as toxic than their general American English equivalents. Factors like annotators’ sexual orientation and gender identity also heavily influence which labels they apply, pointing to the same potential for bias in emotion ratings.
The implications of these biases can be severe. Retorio, an AI hiring platform, was found to react differently to the same candidate depending on whether they wore glasses or a headscarf. A 2020 MIT study showed that face-analyzing algorithms can become biased toward certain expressions, such as smiling, reducing their accuracy. And more recent work suggests that popular emotion analysis tools tend to assign more negative emotions to Black men’s faces than to white men’s.
Addressing Bias in Emotion Detection
So how does the LAION team plan to handle these biases and ensure its data set is representative, so that no group, such as white people, is overrepresented and nonbinary individuals are labeled accurately?
The specifics remain somewhat vague. Schuhmann said that the data submission process for Open Empathic is not an “open door,” and that LAION has systems in place to uphold the integrity of contributions. “We can validate a user’s intent and consistently monitor annotation quality,” he noted.
Nonetheless, LAION’s prior data sets have faced scrutiny. Analyses of LAION-400M, an image training set curated using automated tools, uncovered troubling content, including images depicting violence and hate symbols. Bias was evident in the data set as well: queries for words like “CEO” and “terrorist” returned images almost exclusively of men.
Schuhmann is relying on community involvement to rectify these issues this time around. “We trust in the potential of passionate hobbyists and enthusiasts globally to contribute to our data sets,” he remarked. “While we foster an open and collaborative environment, we prioritize authenticity and quality in our data.”
Regarding the future applications of any AI trained on the Open Empathic data set—whether biased or unbiased—LAION is committed to its open-source philosophy, even with potential risks of misuse.
“Using AI to interpret emotions is a potent endeavor, rife with challenges,” explained Robert Kaczmarczyk, a LAION co-founder and physician at the Technical University of Munich. “As with any technology, it can be wielded for both beneficial and harmful outcomes. Imagine if only a select few had command over such powerful tools while the majority remained uninformed—that disparity could breed misuse or manipulation.”
In AI, unregulated approaches can lead to unforeseen consequences, as the misuse of technologies like Stable Diffusion to create non-consensual deepfakes and other abusive imagery has shown.
Notably, privacy and human rights advocacy groups like European Digital Rights and Access Now have called for a blanket ban on emotion recognition technologies. The EU AI Act, the European Union law establishing a governance framework for AI, prohibits the use of emotion detection in policing, border management, workplaces, and schools. And some companies, like Microsoft, have voluntarily withdrawn their emotion-detecting AI in the face of public backlash.
Nevertheless, LAION appears prepared to navigate these risks and remains confident in the open development process.
“We encourage researchers to explore, propose modifications, and identify issues,” Kaczmarczyk said. “Similar to how Wikipedia thrives on community engagement, Open Empathic is driven by collective contributions, ensuring transparency and accountability.”
Transparency? Yes. Safety? Only time will tell.