Emerging large language models (LLMs) like OpenAI’s ChatGPT (particularly GPT-4), Claude AI, and Gemini have shown limited decision-making capabilities. This article explores recent research on LLM decision-making and its implications for their future.
Effective decision-making, in humans or LLMs, involves recognizing underlying patterns or rules and applying them flexibly to new scenarios. A study by the Santa Fe Institute found that LLMs, including ChatGPT, struggle to "reason about basic core concepts." Making sound decisions also requires a deep understanding of the prompt's context and the potential consequences of the output.
Poor decision-making by LLMs can lead to harmful outcomes. For instance, in 2023 the National Eating Disorders Association suspended its AI chatbot "Tessa" after it began providing harmful advice, such as suggesting weekly weigh-ins and a daily calorie deficit of 500 to 1,000 calories. The ensuing backlash prompted the chatbot's swift deactivation.
LLMs also tend to generate generic recommendations. Research from INSEAD revealed that when prompted with business strategy questions, ChatGPT often fell back on conventional wisdom, such as promoting collaborative work and a culture of innovation. Business strategy, however, is a complex process that requires tailored insights rather than generic advice.
A potential counterargument is that training LLMs specifically on business strategy or healthcare advice could resolve these issues. However, contextual understanding cannot be improved simply by broadening training datasets. Adding more data may introduce biases and increase computational demands without enhancing decision-making quality.
Enabling Context-Appropriate Decision-Making
Training LLMs for context-appropriate decision-making requires a nuanced approach. Two strategies from current machine learning research propose ways to make LLM decision-making more closely resemble human cognitive processes. The first, AutoGPT, employs a self-reflective mechanism to plan and validate outputs. The second, Tree of Thoughts (ToT), encourages effective decision-making by breaking away from traditional linear reasoning.
AutoGPT is designed to autonomously generate, assess, and refine tasks in pursuit of a specified objective. Enhancements to AutoGPT now incorporate an "additional opinions" strategy, integrating expert models into the decision-making process. This integration allows LLMs to draw on relevant information from multiple expert analyses, improving decision outcomes through a systematic "thought-reasoning-plan-criticism" loop.
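To make that loop concrete, here is a minimal sketch of how a thought-reasoning-plan-criticism cycle with an "additional opinions" step might be wired together. The names used (call_llm, EXPERT_MODELS, agent_step) are illustrative placeholders, not AutoGPT's actual API, and the prompts are simplified assumptions.

```python
# Sketch of a "thought-reasoning-plan-criticism" loop with additional
# expert opinions. All names here are illustrative, not AutoGPT's real API.

def call_llm(prompt: str) -> str:
    """Placeholder for a call to any chat-completion endpoint."""
    raise NotImplementedError

# Hypothetical domain experts consulted for "additional opinions".
EXPERT_MODELS = {
    "finance": lambda task: call_llm(f"As a financial analyst, advise on: {task}"),
    "risk": lambda task: call_llm(f"As a risk officer, advise on: {task}"),
}

def agent_step(objective: str, max_iterations: int = 5) -> str:
    context = ""
    plan = ""
    for _ in range(max_iterations):
        # 1. Thought / reasoning: propose the next action toward the objective.
        thought = call_llm(
            f"Objective: {objective}\nContext so far: {context}\n"
            "Think step by step and propose the next action."
        )
        # 2. Additional opinions: gather advice from the expert models.
        opinions = {name: expert(thought) for name, expert in EXPERT_MODELS.items()}
        # 3. Plan: fold the expert opinions into a concrete plan.
        plan = call_llm(
            f"Proposed action: {thought}\nExpert opinions: {opinions}\n"
            "Write a concrete plan for this step."
        )
        # 4. Criticism: self-review the plan and decide whether to stop.
        criticism = call_llm(
            f"Critique this plan against the objective '{objective}': {plan}\n"
            "Reply DONE if the objective is met, otherwise explain what is missing."
        )
        context += f"\nPlan: {plan}\nCritique: {criticism}"
        if "DONE" in criticism:
            return plan
    return plan  # best effort once the iteration budget is exhausted
```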
If effectively implemented, LLMs augmented with expert models could process more information than humans, suggesting they may make more informed decisions. However, AutoGPT is limited by its constrained context window, which can cause it to fall into endless interaction loops. Providing all relevant information upfront often yields better outcomes than gradually injecting data throughout a conversation.
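The difference between front-loading context and drip-feeding it can be illustrated with a short sketch. The document contents and the call_llm helper below are assumptions made purely for illustration; any chat-completion API would work the same way.

```python
# Illustrative comparison of front-loading context versus injecting it
# incrementally; call_llm stands in for any chat-completion call.

def call_llm(messages: list[dict]) -> str:
    """Placeholder for a call to a chat-completion endpoint."""
    raise NotImplementedError

documents = ["Q3 revenue report ...", "Churn analysis ...", "Pricing survey ..."]
question = "Should we raise prices next quarter?"

# Strategy A: provide every relevant document in a single prompt.
upfront_answer = call_llm([
    {"role": "user",
     "content": "\n\n".join(documents) + f"\n\nQuestion: {question}"}
])

# Strategy B: drip-feed documents turn by turn. Earlier turns may fall
# outside the model's context window by the time the question arrives,
# which is one way agents end up looping on stale or missing information.
history = []
for doc in documents:
    history.append({"role": "user", "content": doc})
    history.append({"role": "assistant", "content": call_llm(history)})
history.append({"role": "user", "content": question})
incremental_answer = call_llm(history)
```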
Simulating Human Cognition with Tree of Thoughts
The Tree of Thoughts (ToT) framework offers another promising method for enhancing LLM accuracy by mimicking human cognitive processes. Human decision-making often involves generating and evaluating multiple scenarios. Like AutoGPT, ToT targets the shortcomings of strictly linear reasoning in LLMs. In experiments, ToT is evaluated on tasks that require planning or search while following natural-language instructions, such as mathematical puzzles and creative writing.
Traditional linear reasoning in LLMs is represented by "Chain of Thought," which follows a single sequential decision-making path. ToT, by contrast, aims to strengthen LLMs' self-critical abilities and to explore several reasoning pathways in parallel. For example, in the Game of 24, where four numbers must be combined with basic arithmetic operations to reach 24, Chain of Thought rarely found a valid sequence of operations and achieved a low accuracy rate, whereas ToT's ability to generate and evaluate multiple candidate steps led to a 74% success rate on the same task.
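A minimal breadth-first sketch of the idea is shown below. In the published ToT setup an LLM both proposes and scores partial solutions; here propose_thoughts and score_thought are placeholders standing in for those LLM calls, and the beam-search parameters are illustrative assumptions rather than the paper's exact settings.

```python
# Breadth-first Tree of Thoughts sketch for the Game of 24.

def propose_thoughts(state: str, k: int = 5) -> list[str]:
    """Ask the LLM for k candidate next steps, e.g. '4 * 6 = 24 (left: 24 1 1)'."""
    raise NotImplementedError

def score_thought(state: str) -> float:
    """Ask the LLM to rate how likely this partial solution can still reach 24."""
    raise NotImplementedError

def tree_of_thoughts(puzzle: str, depth: int = 3, beam_width: int = 5) -> list[str]:
    """Keep the most promising partial solutions at each level of the tree."""
    frontier = [puzzle]                      # e.g. "4 9 10 13"
    for _ in range(depth):
        candidates = []
        for state in frontier:
            candidates.extend(propose_thoughts(state))
        # Evaluate every candidate and keep only the best few, instead of
        # committing to a single chain as Chain of Thought does.
        candidates.sort(key=score_thought, reverse=True)
        frontier = candidates[:beam_width]
    return frontier
```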
If LLMs can consistently improve their judgment, future collaborations between humans and AI on strategic decision-making could become a reality. ToT applications extend to coding, data analysis, and robotics, while AutoGPT aspires toward general intelligence.
As academic research evolves, innovative strategies for enhancing cognitive decision-making in LLMs continue to emerge. Given LLMs' inherent ability to analyze vast amounts of data efficiently, such advances could enable them to match or even exceed human decision-making capabilities within the next few years.
Vincent Polfliet is a Senior Machine Learning Engineer at Evolution AI. Miranda Hartley leads Copywriting at Evolution AI.