Enhancing Generative AI Reasoning: Google DeepMind Unveils GenRM Technology

Google DeepMind Introduces Generative Evaluator GenRM to Enhance AI Reasoning Abilities

On August 27, 2024, the Google DeepMind team published a paper on arXiv introducing GenRM, a generative evaluator (the paper calls it a generative verifier). This new reward model is designed to improve the reasoning performance of generative AI.

A widely used method for improving the reasoning performance of large language models (LLMs) is the "Best-of-N" approach. The model generates N candidate solutions, an evaluator scores each one, and the highest-scoring candidate is selected as the answer. However, LLM-based evaluators are typically trained as discriminative classifiers that only output a score, so they do not make use of the text generation capabilities of the pre-trained LLMs they are built on.
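The selection step of Best-of-N can be illustrated with a few lines of Python. This is only a minimal sketch: the names `best_of_n` and `toy_score` are hypothetical, and the toy scoring function stands in for a trained evaluator, which the article does not specify.

```python
from typing import Callable, Sequence


def best_of_n(candidates: Sequence[str], score: Callable[[str], float]) -> str:
    """Return the candidate solution with the highest evaluator score."""
    if not candidates:
        raise ValueError("need at least one candidate solution")
    return max(candidates, key=score)


def toy_score(solution: str) -> float:
    """Hypothetical stand-in for an evaluator: rewards the correct sum only."""
    return 1.0 if solution.strip().endswith("= 4") else 0.0


# N = 3 candidate solutions for the prompt "What is 2 + 2?"
candidates = ["2 + 2 = 5", "2 + 2 = 4", "2 + 2 = 22"]
best = best_of_n(candidates, toy_score)
```

In a real system the candidates would be sampled from the LLM and `score` would call the evaluator, so the quality of the final answer depends directly on how well the evaluator ranks solutions.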

To overcome this limitation, the DeepMind team trained the evaluator with the standard next-token prediction objective, jointly on verification and solution generation. GenRM offers several distinct advantages over conventional evaluators:

- Seamless integration with instruction tuning

- Support for chain-of-thought reasoning

- Use of additional inference-time compute through majority voting
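The idea of scoring by next-token prediction, combined with majority voting, can be sketched as follows. The evaluator is asked whether a solution is correct, and the probability it assigns to the token "Yes" serves as the score; with chain-of-thought, several verification rationales are sampled and their "Yes" probabilities are averaged. The function names and the example logits below are hypothetical, and renormalizing over only the "Yes" and "No" tokens is a simplifying assumption rather than a detail taken from the article.

```python
import math
from typing import Sequence, Tuple


def p_yes(logit_yes: float, logit_no: float) -> float:
    """Probability of "Yes" from a softmax over the two answer-token logits."""
    m = max(logit_yes, logit_no)  # subtract the max for numerical stability
    e_yes = math.exp(logit_yes - m)
    e_no = math.exp(logit_no - m)
    return e_yes / (e_yes + e_no)


def voted_score(answer_logits: Sequence[Tuple[float, float]]) -> float:
    """Average P("Yes") over K sampled verification rationales.

    Each entry holds the (Yes, No) logits read off after one sampled
    chain-of-thought rationale for the same candidate solution.
    """
    if not answer_logits:
        raise ValueError("need at least one sampled rationale")
    return sum(p_yes(y, n) for y, n in answer_logits) / len(answer_logits)


# Hypothetical logits from K = 3 rationales about one candidate solution.
samples = [(2.0, 0.0), (1.0, 0.5), (0.0, 0.0)]
score = voted_score(samples)
```

Sampling more rationales (a larger K) spends more compute at inference time in exchange for a less noisy score, which is the trade-off the third bullet refers to.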

On algorithmic and grade-school math reasoning tasks, Gemma-based GenRM evaluators outperformed both discriminative evaluators and LLM-as-a-Judge, with a 16% to 64% improvement in the percentage of problems solved with Best-of-N.

Google DeepMind asserts that GenRM represents a significant evolution in AI reward systems, particularly in its capacity to prevent potential fraudulent behaviors in new model training. This advancement underscores the necessity of refining reward models to ensure that AI outputs meet societal responsibility standards.
