Google Announces Fix for Gemini's People-Generating Feature

In February, Google halted its AI chatbot Gemini's ability to generate images of people after users complained about historical inaccuracies. Asked to depict a “Roman legion,” for instance, Gemini produced an anachronistic group of racially diverse soldiers, while “Zulu warriors” were rendered as stereotypically Black. Google CEO Sundar Pichai apologized, and Demis Hassabis, co-founder of Google’s AI research division DeepMind, promised a fix would arrive “in very short order,” ideally within weeks. The fix took considerably longer than that, even as some Googlers clocked 120-hour workweeks. Soon, Gemini will regain the ability to create images of people.

At first, though, the feature will be limited. Only subscribers to one of Google's paid Gemini plans (Gemini Advanced, Business, or Enterprise) will have access to people generation, in a trial that currently supports English only. Google has not said when the feature will reach free users or other languages.

“Gemini Advanced gives our users priority access to our latest features,” explained a Google spokesperson. “This helps us gather valuable feedback while rolling out a highly anticipated feature first to our premium subscribers.”

What improvements has Google made for people generation? According to the company, the latest image-generating model in Gemini, Imagen 3, includes safeguards designed to create images of people that are more “fair.” Imagen 3 was trained on AI-generated captions to “enhance the variety and diversity of concepts associated with the images in [its] training data,” as detailed in a technical paper. Moreover, the training data was rigorously filtered for “safety” and underwent thorough review for fairness issues, according to Google.

When we asked for specifics about Imagen 3’s training data, the spokesperson said only that it was “based on a large dataset comprising images, text, and associated annotations.” “We’ve significantly reduced the potential for undesirable responses through extensive internal and external red-teaming testing, collaborating with independent experts for continued improvements,” they added, stressing that Google tested people generation thoroughly before reactivating it.

More broadly, all Gemini users will get access to Imagen 3 within a week, though people generation will remain exclusive to paid subscribers. Google says Imagen 3 understands text prompts better than its predecessor, Imagen 2, and produces more creative and detailed images. The company also claims the new model generates fewer artifacts and errors and calls it its best Imagen model yet for rendering text.

To address concerns about deepfakes, Imagen 3 will use SynthID, a DeepMind technique that applies invisible, cryptographic watermarks to AI-generated media. This follows through on Google's earlier pledge to use SynthID with Imagen 3. It is notable, however, that Google handles image generation in Gemini differently from some of its other products, such as Pixel Studio.

Google is also introducing Gems for Gemini, available exclusively to Gemini Advanced, Business, and Enterprise users. Much like OpenAI’s GPTs, Gems are customized versions of Gemini that can act as “experts” on specific topics, such as vegetarian cooking.

Google describes Gems in a blog post: “With Gems, you can create a team of experts to help tackle challenging projects, brainstorm ideas for events, or craft the perfect social media caption. Your Gem can also retain a detailed set of instructions, saving you time on tedious, repetitive tasks.”

Users create a Gem by writing instructions, giving it a name, and starting a conversation. Gems are available on desktop and mobile in 150 countries and “most languages,” though not yet in Gemini Live. Several premade Gems, including a “learning coach,” “career guide,” “brainstormer,” and “coding partner,” will be available at launch.

We asked whether Google plans to let users publish and share their Gems, as OpenAI does with its GPT Store. For now, the answer is no. “Currently, we’re focused on understanding how people will use Gems for creativity and productivity,” the spokesperson said. “There’s nothing further to share at this time.”
