Google DeepMind: Enhancing AI Performance through Greater Human Connection

Just as people thrive on positive reinforcement, AI can also benefit from advice that mimics human interaction. Researchers at Google DeepMind have introduced a transformative approach that significantly enhances the mathematical capabilities of language models through prompts that emulate everyday human communication. This innovative method, detailed in their paper "Large Language Models as Optimizers," is known as Optimization by PROmpting (OPRO).

OPRO leverages natural language to guide large language models, such as OpenAI's ChatGPT, in tackling complex problems. Traditional machine learning relies on formal mathematical processes to boost performance. In contrast, OPRO initiates improvement through relatable, conversational language. By interpreting the description of a problem along with previous responses, the language model generates potential solutions.
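The loop described above can be sketched in a few lines. This is an illustrative simplification, not DeepMind's implementation: `call_optimizer_llm` and `score_on_task` are hypothetical stand-ins for a real LLM API call and a benchmark evaluation, and the meta-prompt wording is invented for the example.

```python
# Minimal sketch of an OPRO-style optimization loop (illustrative only).
# `call_optimizer_llm` and `score_on_task` are hypothetical stand-ins
# for a real LLM API call and a benchmark scorer.

def build_meta_prompt(scored_instructions, task_examples):
    """Assemble the natural-language meta-prompt: past instructions with
    their scores (ascending, so the best appears last), a few task
    examples, and a request for a new, hopefully better instruction."""
    lines = ["Here are previous instructions and their accuracy scores:"]
    for text, score in sorted(scored_instructions, key=lambda p: p[1]):
        lines.append(f"Instruction: {text!r}  Score: {score:.1f}")
    lines.append("Example problems: " + "; ".join(task_examples))
    lines.append("Write a new instruction that achieves a higher score.")
    return "\n".join(lines)


def opro_loop(call_optimizer_llm, score_on_task, seed_instruction,
              task_examples, steps=5):
    """Repeatedly ask the optimizer LLM for a better instruction,
    keeping the full scored trajectory as conversational context."""
    trajectory = [(seed_instruction, score_on_task(seed_instruction))]
    for _ in range(steps):
        meta_prompt = build_meta_prompt(trajectory, task_examples)
        candidate = call_optimizer_llm(meta_prompt)
        trajectory.append((candidate, score_on_task(candidate)))
    return max(trajectory, key=lambda p: p[1])  # best instruction found
```

The key design point is that the optimizer's entire input is plain natural language: the model sees its own earlier attempts and their scores as text, and proposes improvements conversationally rather than through gradient updates.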

Tinglong Dai, a professor of Operations Management and Business Analytics at Johns Hopkins University, explains, “LLMs are trained on human-generated content, and the way it works, roughly speaking, is to finish your sentences the way a good couple would. So it's not surprising that human-like prompts lead to good results.” This highlights how the phrasing of prompts can significantly impact AI outcomes.

The DeepMind study revealed that certain phrases notably influenced the models' performance. For instance, prompts such as "let's think step by step" improved accuracy on mathematical benchmarks. The phrase "Take a deep breath and work on this problem step by step" yielded the best results with Google's PaLM 2, achieving 80.2% accuracy on GSM8K, a dataset of grade-school math word problems. For comparison, PaLM 2 without any specific prompting achieved only 34%, while the classic prompt "Let's think step by step" reached 71.8%.
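Accuracy figures like these come from checking each model answer against the benchmark's gold answer. A minimal sketch of that scoring, assuming each item pairs a question with a gold final answer and a hypothetical `answer_fn` that runs the model with the candidate prompt and extracts its final numeric answer:

```python
# Illustrative sketch of scoring a prompt on GSM8K-style items.
# `answer_fn` is a hypothetical stand-in for running the scorer LLM
# and parsing the final numeric answer out of its response.

def prompt_accuracy(prompt, items, answer_fn):
    """Percentage of items whose predicted final answer exactly
    matches the gold answer."""
    correct = sum(
        1 for question, gold in items
        if answer_fn(prompt, question) == gold
    )
    return 100.0 * correct / len(items)
```

Under this kind of exact-match metric, a prompt change alone moves PaLM 2 from 34% to 80.2% in the paper's GSM8K experiments.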

Michael Kearns, a professor of Computer and Information Science at the University of Pennsylvania, notes that LLMs excel at modifying their responses based on human-like prompts due to their training on conversational data, including sources like Reddit posts and movie scripts. He emphasizes the importance of encouraging LLMs to dissect math or logic problems into manageable steps, backed by training on data that includes mathematical proofs and formal reasoning.

Chengrun Yang, a co-author of the DeepMind paper, explains that most LLMs have been trained with vast datasets, equipping them with robust capabilities in natural language processing, including paraphrasing and sentence enrichment. Continuous efforts in model alignment also improve LLMs’ ability to understand and respond to human-like prompts effectively.

According to Olga Beregovaya, vice president of AI and Machine Translation at Smartling, human-like prompts often take the form of requests that guide the AI into a more dialog-oriented interaction. “LLMs perform best when given more context,” she adds. Verbose prompts with additional details enable the model to align its responses more closely with the specific context presented.

Interestingly, simple words of encouragement can enhance AI performance as well. Dai points out that LLMs may yield better results when users motivate them, such as saying, “Come on, you can do better than that!” Notably, examples such as asking LLMs to role-play as a Nobel Prize-winning economist can lead to more insightful discussions about complex topics like inflation. Similarly, in medical diagnosis scenarios, prompting LLMs to adopt the persona of a leading medical expert may produce more accurate and focused results. However, he notes that while these human-style encouragements can be effective, they do not guarantee universal improvements across all tasks.

Importantly, there's also potential for LLMs to respond well to non-human prompts tailored to specific tasks. Dai mentions that structured, coded prompts can yield effective results, providing a contrast to traditional conversational approaches.

The OPRO method could simplify the process of engineering AI prompts, allowing users to optimize their queries based on various metrics such as problem-solving accuracy in math, tool trigger rates, and the creativity of text generation. Yang expresses hope that this method will inspire novel applications for employing LLMs to enhance a broader range of tasks, paving the way for more interactive and efficient AI solutions.
