Google has issued an apology, though it stops just short of full contrition, for a recent AI mishap: an image-generating model that injected diversity into its output with little regard for historical accuracy. The underlying problem of bias in AI is real, but Google pins the blunder on the model's alleged "oversensitivity." The model didn't create itself.
The AI at the center of this controversy is Gemini, Google's flagship conversational AI platform, which calls on the Imagen 2 model to generate images when asked.
Users recently discovered that asking Gemini for images of certain historical figures or events produced unintentionally absurd results. Requests for images of the Founding Fathers, for instance, a group of white men, many of whom owned slaves, returned a diverse, multi-ethnic cast that doesn't reflect historical reality.
This easily replicable issue drew considerable mockery on social media and became a focal point in the ongoing discussions surrounding diversity, equity, and inclusion (DEI), a topic currently feeling the heat of criticism. Pundits have seized upon this blunder as further proof of the tech industry's perceived leftist agenda. Meanwhile, some from the left also expressed discomfort at such a distorted representation of history.
As anyone familiar with AI technology would recognize, and as Google somewhat sheepishly explains in its recent post, this fiasco stemmed from an attempt to mitigate systemic bias in training data. Imagine wanting to use Gemini to craft a marketing campaign. If you request "10 pictures of a person walking a dog in a park" without specifying attributes like the person's ethnicity, breed of dog, or park type, the AI defaults to what it knows best—often revealing biases embedded in its training datasets.
The training data typically over-represents white people, a legacy of stock images and royalty-free photography, so the model defaults to them when the prompt leaves ethnicity unspecified. That bias is an artifact of the data itself, but as Google acknowledges, “our users are globally diverse, and we want the model to reflect that.” Someone asking for pictures of sports players or pet owners might reasonably expect to see a variety of people rather than only one ethnicity.
Say you request imagery of dog walkers and every result shows the same ethnicity. That would be a poor outcome, and a worse one in a different cultural context, such as a campaign aimed at Morocco, where the people, dogs, and parks look quite different. When no attributes are specified, the model should produce varied representations rather than homogeneous ones, whatever its training data skews toward.
This challenge is common across every form of generative media, and there is no easy fix. For especially sensitive or frequent scenarios, AI companies like Google, OpenAI, and Anthropic quietly include extra instructions for their models.
It's worth underscoring just how commonplace this kind of implicit guidance is. The entire large language model (LLM) ecosystem runs on built-in instructions, commonly referred to as system prompts, that shape behavior, such as keeping responses inoffensive. Ask a model for a joke and you won't get an offensive one, because it has been trained and instructed to steer clear of that; it's basic decorum.
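To make that mechanism concrete, here is a minimal, hypothetical sketch of how a system prompt rides along with every user request in a chat-style setup. The wording of the hidden instructions and the build_messages helper are illustrative assumptions, not Google's actual prompts or code.

```python
# Hypothetical illustration of a system prompt: hidden instructions that are
# silently prepended to every conversation before the user's text is sent.
HIDDEN_SYSTEM_PROMPT = (
    "You are a helpful assistant. Be concise. "
    "Do not produce offensive, hateful, or unsafe content."
)

def build_messages(user_prompt: str) -> list[dict]:
    """Assemble the message list the model actually sees for one turn."""
    return [
        {"role": "system", "content": HIDDEN_SYSTEM_PROMPT},  # invisible to the user
        {"role": "user", "content": user_prompt},              # what the user typed
    ]

if __name__ == "__main__":
    # The user only typed the second message; the first one shapes the answer anyway.
    for message in build_messages("Tell me a joke about my coworkers."):
        print(f"{message['role']:>6}: {message['content']}")
```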
Gemini's recent error stemmed from a lack of those implicit instructions for situations where historical context matters. A prompt like “a person walking a dog in a park” benefits from a silent nudge toward diversity; “the U.S. Founding Fathers signing the Constitution” emphatically does not.
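As a rough sketch of how such a silent rewrite can go wrong, the toy function below appends a diversity instruction to image prompts about people unless it detects a historical subject. The keyword lists and the augment_image_prompt function are invented for illustration; Google has not published how Gemini's actual tuning works.

```python
# Toy illustration of silent prompt augmentation for an image model.
# The keyword heuristics here are invented for illustration only.
PEOPLE_TERMS = ("person", "people", "man", "woman", "player", "owner", "fathers")
HISTORY_TERMS = ("founding fathers", "constitution", "medieval", "ancient")

DIVERSITY_SUFFIX = ", depicting a diverse range of ethnicities and genders"

def augment_image_prompt(prompt: str) -> str:
    """Silently rewrite a user prompt before it reaches the image model."""
    lowered = prompt.lower()
    mentions_people = any(term in lowered for term in PEOPLE_TERMS)
    looks_historical = any(term in lowered for term in HISTORY_TERMS)
    if mentions_people and not looks_historical:
        return prompt + DIVERSITY_SUFFIX
    return prompt  # historical subjects are passed through untouched

if __name__ == "__main__":
    print(augment_image_prompt("a person walking a dog in a park"))
    # If the historical check is missing or too narrow, the same suffix gets
    # bolted onto prompts it should never touch, which is roughly the failure
    # mode described above.
    print(augment_image_prompt("the U.S. Founding Fathers signing the Constitution"))
```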
As Google senior vice president Prabhakar Raghavan explained, the tuning meant to ensure Gemini showed a range of people failed to account for cases that clearly should not show a range. On top of that, the model became more cautious than intended over time, wrongly reading some perfectly benign prompts as sensitive. The combination led it to overcompensate in some cases and be over-conservative in others, producing images that were wrong and embarrassing.
I have some sympathy for Raghavan and his reluctance to apologize outright, but the telling word is that the model "became" overly cautious. A model doesn't develop traits on its own; it is built, tested, and iterated on by thousands of Google engineers. When results like these occur, the silent instructions that improve some responses and ruin others deserve examination.
Google's framing of the model as having "become" something unintended is misleading: they built the model. It's like saying a broken glass "fell" rather than admitting someone dropped it. AI will inevitably make mistakes; models hallucinate, reflect biases, and behave unpredictably. But accountability for those errors rests with the people who built the technology. Today it's Google; tomorrow it could be OpenAI or another tech company.
These companies would prefer you believe their AI systems are making their own mistakes. Don't lose sight of who actually bears responsibility.