Meta Researchers Enhance LLMs with System 2 Thinking, Boosting Performance in Complex Reasoning Tasks

Large language models (LLMs) excel at answering simple questions but struggle with complex tasks that require reasoning and planning. To enhance their reasoning abilities, researchers have developed specialized prompting techniques, often referred to as “System 2” techniques, which encourage LLMs to produce intermediate steps as they tackle a problem.

While effective, these System 2 techniques can slow down applications and increase computational costs. A recent paper from researchers at Meta FAIR introduces “System 2 distillation,” a novel approach that enables LLMs to perform complex tasks without relying on intermediate reasoning steps.

In cognitive science, "System 1" and "System 2" describe two types of thinking. System 1 is fast, intuitive, and automatic, used for quick judgments such as recognizing traffic signs or familiar faces. In contrast, System 2 is slow, deliberate, and analytical, required for tasks like solving equations or planning a trip.

LLMs are typically aligned with System 1 thinking, generating text rapidly but without deliberate reasoning. Researchers have shown that prompting an LLM to outline its reasoning before giving an answer lets it simulate System 2 thinking. Techniques like "Chain of Thought" prompt the model to explain its reasoning step by step, leading to more accurate outcomes on logical tasks.
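The contrast between the two prompting styles is easy to see in a minimal sketch; the "Let's think step by step" instruction is the well-known zero-shot Chain-of-Thought phrase, though exact wording varies across papers:

```python
question = "If a train travels 60 km in 45 minutes, what is its speed in km/h?"

# System 1 style: ask for the answer directly.
direct_prompt = question

# System 2 style: ask the model to produce intermediate reasoning first.
cot_prompt = f"{question}\nLet's think step by step."
```

The Chain-of-Thought prompt produces longer, slower completions, which is exactly the overhead System 2 distillation aims to remove.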

“While these methods yield more accurate results thanks to explicit reasoning, they often come with increased inference costs and longer response times,” note the researchers from Meta AI. “As a result, many organizations stick with faster, more efficient System 1 approaches.”

System 2 Distillation

An intriguing aspect of human System 2 reasoning is that with repeated practice, these deliberate tasks often transition to System 1 thinking and become automatic. For example, as someone learns to drive, they initially focus intently on each step, but over time, driving becomes instinctive.

Inspired by this phenomenon, the researchers developed “System 2 distillation.” This method follows a common machine learning technique where a larger model, the "teacher," helps train a smaller "student" model. However, in this case, there is no separate teacher. Instead, knowledge from the model's System 2 reasoning is transferred into its faster, more efficient System 1 generation.

The process begins by using System 2 prompting techniques to solve a problem. The model's responses are then refined through an unsupervised verification method, like “self-consistency,” where it generates answers multiple times. The most frequently occurring answer is selected for the distillation dataset, while inconsistent answers are discarded.
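The self-consistency filter described above amounts to a majority vote over several sampled answers. A minimal sketch, where `sample_fn` is a hypothetical stand-in for sampling the LLM and the agreement threshold is an assumed parameter:

```python
from collections import Counter

def self_consistency_filter(question, sample_fn, n_samples=8, min_agreement=0.5):
    """Sample the model several times and keep the example only if a
    majority of samples agree on the same final answer."""
    answers = [sample_fn(question) for _ in range(n_samples)]
    answer, count = Counter(answers).most_common(1)[0]
    if count / n_samples >= min_agreement:
        return answer  # consistent: (question, answer) joins the dataset
    return None        # inconsistent: the example is discarded

# Usage with a deterministic stand-in for the model:
samples = iter(["7", "7", "8", "7"])
print(self_consistency_filter("5 + 2 = ?", lambda q: next(samples), n_samples=4))
# prints: 7
```

Examples where no answer wins a majority are dropped entirely, so the distillation dataset contains only questions the model already answers consistently.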

Next, the intermediate reasoning steps are omitted, retaining only the final answers. The model is then fine-tuned on these questions and answers to enable direct responses without intermediary processing.
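Turning a verbose System 2 completion into a fine-tuning pair can be sketched as follows; the `answer_marker` convention and the record format are illustrative assumptions, not the paper's exact schema:

```python
def build_distillation_record(question, system2_output, answer_marker="The answer is"):
    """Keep only the final answer from a verbose System 2 completion,
    dropping all intermediate reasoning steps."""
    idx = system2_output.rfind(answer_marker)
    if idx == -1:
        return None  # no parseable final answer: discard the example
    final_answer = system2_output[idx + len(answer_marker):].strip(" .\n")
    # Fine-tuning pair: original question in, direct answer out.
    return {"prompt": question, "completion": final_answer}

record = build_distillation_record(
    "Roger has 5 balls and buys 2 more. How many now?",
    "Roger starts with 5. He buys 2 more. 5 + 2 = 7. The answer is 7.",
)
# record == {"prompt": "Roger has 5 balls and buys 2 more. How many now?",
#            "completion": "7"}
```

Fine-tuning on such pairs teaches the model to emit the answer directly, internalizing the reasoning it previously had to spell out.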

Evaluating System 2 Distillation

The researchers tested this method on various reasoning tasks, employing four different System 2 prompting techniques with the Llama-2-70B model, which is sufficiently capable of internalizing new knowledge.

Techniques explored included Chain of Thought, System 2 Attention, Rephrase and Respond, and Branch-Solve-Merge. Some methods require multiple prompts, leading to slower and more resource-intensive responses. For instance, Rephrase and Respond involves first rephrasing the question and then submitting the rephrased prompt, while Branch-Solve-Merge entails complex interactions with the model.
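The multi-prompt overhead is easy to see in a sketch of Rephrase and Respond versus a distilled model; `call_llm` is a hypothetical stand-in for a model call, and the prompt wording is assumed:

```python
def rephrase_and_respond(question, call_llm):
    """Two-stage System 2 pipeline: first ask the model to restate the
    question more clearly, then answer the rephrased version."""
    rephrased = call_llm(f"Rephrase and expand this question: {question}")
    return call_llm(f"{rephrased}\nAnswer the question above.")

def distilled(question, call_llm):
    """After System 2 distillation, the same task takes a single call."""
    return call_llm(question)
```

Each question costs two model calls before distillation and one after, which is where the speed and cost savings come from; Branch-Solve-Merge involves even more calls per question.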

Results indicate that System 2 distillation significantly enhances LLM performance on challenging reasoning tasks, often matching or surpassing the accuracy of traditional System 2 methods. Notably, distilled models can deliver responses more swiftly and with less computing power by bypassing intermediary reasoning steps.

For example, the distillation method proved beneficial for tasks using System 2 Attention to address biases or irrelevant information. It also yielded impressive outcomes in reasoning tasks where Rephrase and Respond clarified responses, as well as in the nuanced evaluations needed for tasks processed through Branch-Solve-Merge.

The researchers concluded: “Our findings suggest that, in many instances, it is possible to distill System 2 reasoning into LLM outputs without intermediate generations, while either maintaining or enhancing performance.”

However, just as with humans, LLMs cannot distill all reasoning skills into their rapid inference mechanisms. For instance, distillation of complex math reasoning tasks requiring Chain of Thought prompting remained ineffective, indicating that certain tasks might always necessitate deliberate reasoning.

Further research is essential to understand the limitations of System 2 distillation, such as its effectiveness with smaller models and potential impacts on broader performance across tasks not included in the training dataset. Additionally, LLM benchmarks often face contamination issues where models may exhibit familiarity with test examples, inflating results.

Nevertheless, System 2 distillation presents a promising optimization strategy for advanced LLM applications focused on specific tasks. “In the future, models capable of distilling useful tasks can allocate more time to reasoning about challenges they haven't mastered yet, much like humans do,” the researchers conclude.
