Ask Gemini, Google's flagship GenAI model, to write misleading content about the upcoming U.S. presidential election, and, given the right prompt, it will. Ask about a future Super Bowl game, and it will invent a play-by-play. Ask about the Titan submersible implosion, and it will produce disinformation, complete with convincingly fabricated citations.
This situation paints a troubling picture for Google and has caught the attention of policymakers, who are increasingly concerned about how GenAI tools can be misused for spreading disinformation and causing general confusion.
In response to these concerns, Google, which recently downsized its workforce, is directing investments toward AI safety. This morning, Google DeepMind—the AI research and development division behind Gemini and many of Google's recent GenAI initiatives—announced the establishment of a new organization: AI Safety and Alignment. This organization combines existing teams focused on AI safety with newly formed groups of specialized GenAI researchers and engineers.
While Google has not disclosed how many new hires the initiative will bring, it did say that AI Safety and Alignment will include a team dedicated to ensuring the safety of artificial general intelligence (AGI): hypothetical systems capable of performing any task a human can.
The new AGI safety team, reminiscent of the Superalignment division OpenAI formed last July, will work alongside DeepMind's existing London-based AI safety research team, Scalable Alignment, which is tackling the technical challenge of controlling yet-to-be-realized superintelligent AI.
Why two teams working on the same problem? It's a valid question, and one that Google's reluctance to share details leaves open to speculation. But it is worth noting that the new team sits in the U.S., near Google's headquarters, at a time when the company is pushing to keep pace with AI rivals while projecting a responsible, measured approach to AI development.
The various teams within AI Safety and Alignment will work to implement concrete safety measures in both current and upcoming Gemini models. Their immediate priorities include preventing the dissemination of harmful medical advice, ensuring child safety, and mitigating the amplification of bias and injustice. Anca Dragan, a former Waymo research scientist and a UC Berkeley professor of computer science, will lead the organization.
“Our goal at the AI Safety and Alignment organization is to enhance models' ability to understand human preferences and values robustly,” Dragan shared via email. “We want them to recognize their limitations, engage with users to comprehend their needs, promote informed oversight, and become resilient against adversarial attacks while acknowledging the diversity and fluid nature of human values.”
Dragan’s ties to Waymo—especially considering the challenges faced by Google's autonomous driving project—might raise eyebrows. Additionally, her decision to balance responsibilities between DeepMind and UC Berkeley, where she directs a lab focused on human-AI and human-robot interaction algorithms, could also surprise some. One might assume that the urgent issues surrounding AGI safety and other serious risks the AI Safety and Alignment organization intends to explore—such as preventing AI from “facilitating terrorism” or “destabilizing society”—demand full-time leadership.
Despite these concerns, Dragan argues that the research at her UC Berkeley lab and at DeepMind is interrelated and mutually beneficial. "My lab and I are focused on value alignment in anticipation of advanced AI capabilities," she explained. "My own Ph.D. research revolved around robots inferring human goals and being transparent about their own intentions, which sparked my interest in this field. I'm excited to bring this experience to DeepMind, emphasizing that addressing present-day challenges and potential risks are not mutually exclusive endeavors."
Dragan certainly has her work cut out for her. Public skepticism towards GenAI tools is at an all-time high, particularly regarding issues like deepfakes and misinformation. A recent YouGov poll revealed that 85% of Americans expressed concern about the spread of misleading audio and video deepfakes. Additionally, a survey from The Associated Press-NORC Center for Public Affairs Research indicated that nearly 60% of adults believe AI tools will exacerbate the spread of false and misleading information during the 2024 U.S. electoral cycle.
Businesses, which Google and its competitors are eager to attract with GenAI innovations, are also wary of potential pitfalls. A survey by Intel subsidiary Cnvrg.io showed that about 25% of companies piloting or deploying GenAI applications have concerns regarding compliance, privacy, reliability, implementation costs, and the skills required to effectively utilize these tools.
Furthermore, a poll conducted by Riskonnect, a risk management software provider, found that over half of executives expressed worries about employees making decisions based on inaccurate GenAI-generated information.
These concerns are not unfounded. A recent report from The Wall Street Journal revealed that Microsoft's Copilot suite, built on GenAI models similar to Gemini, frequently makes errors in meeting summaries and spreadsheet calculations. The underlying issue is termed "hallucination," which refers to GenAI's tendency to fabricate information. Many experts believe this issue may never be fully resolved.
Acknowledging the complexities of AI safety, Dragan makes no promises of a flawless model. Instead, she emphasizes DeepMind's commitment to investing more resources into this domain and establishing a framework for evaluating GenAI model safety risks "soon."
"The crucial steps involve addressing lingering human cognitive biases in the training data, providing accurate uncertainty estimates, implementing real-time monitoring to detect failures, and ensuring dialogue on important decisions while tracking where models may engage in potentially harmful behaviors," Dragan explained. "Yet, this still leaves us with the unresolved problem of confidently predicting when a model might misbehave—a challenge that may surface during actual deployment."
It's uncertain whether customers, the public, and regulators will be understanding in light of these challenges. Ultimately, how egregious the misbehaviors are and who is affected will likely play a significant role in shaping perceptions.
"As our users engage with our models, we hope they will find them increasingly helpful and safe over time," Dragan stated, expressing optimism for the future.