AI Takes on Challenging Math Olympiad Problems: A New Era in Problem Solving

After its success defeating human champions at Go and other strategy board games, Google DeepMind's artificial intelligence (AI) system reached a significant milestone at the 2024 International Mathematical Olympiad (IMO) held in Bath, UK: it missed gold by a single point, earning a silver-medal score, the first time an AI system has reached medal standard at the IMO. Reporting on July 27, Nature noted that DeepMind is racing against other companies to tackle complex mathematical challenges. The IMO has increasingly been treated as a benchmark for AI, testing advanced mathematical reasoning, and this year's performance signals the technology's potential to rival the strongest students at solving Olympiad problems.

DeepMind built a specialized AI system to tackle the IMO questions, solving four of the six problems for a score of 28 out of a possible 42, matching the performance of the silver medalists. The system comprises two upgraded models: AlphaProof for general mathematical reasoning and AlphaGeometry 2 for geometry. AlphaGeometry 2 solved the geometry problem, while AlphaProof handled two algebra problems and one number theory problem. In January, the original AlphaGeometry had already demonstrated medal-level performance on Euclidean geometry problems. Before the IMO competition, AlphaGeometry 2 could solve 83% of the geometry problems set over the past 25 years, a significant increase from the 53% solved by its predecessor.

DeepMind’s Vice President of AI Science, Pushmeet Kohli, highlighted that this achievement represents the first time an AI system reached the performance standards necessary for an IMO medal. IMO President Gregor Dolinar remarked on the rapid advancements in AI, suggesting that it will eventually outperform humans in solving most mathematical problems.

In a related development, scientists at the software company Numina used language models to win the inaugural Progress Prize at the AI Mathematical Olympiad (AIMO). However, the Numina team conceded that language models alone may not suffice for harder mathematical questions. AlphaProof, by contrast, is a self-learning system that couples a pre-trained language model with the AlphaZero reinforcement learning algorithm, allowing it to find solutions through repeated attempts.
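The repeated-attempts idea can be sketched as a generate-and-verify loop. The following is a minimal illustrative sketch, not DeepMind's implementation: `attempt_proof` is a hypothetical stand-in for a language model proposing a candidate proof, and verification is reduced to a random draw weighted by an assumed per-problem difficulty.

```python
import random

def attempt_proof(problem, temperature, rng):
    # Hypothetical stand-in for a model proposing a proof candidate.
    # A "proof" succeeds when a random draw clears the problem's
    # difficulty threshold; a real system would run a formal verifier.
    return rng.random() < (1.0 - problem["difficulty"]) * temperature

def self_training_loop(problems, attempts_per_problem=100, seed=0):
    """Try many candidate proofs per problem and collect the solved ones.
    In the real system, each verified proof would become new training
    data, so later attempts improve on harder problems."""
    rng = random.Random(seed)
    solved = []
    for problem in problems:
        for _ in range(attempts_per_problem):
            if attempt_proof(problem, temperature=0.8, rng=rng):
                solved.append(problem["name"])
                break  # move on once a verified proof is found
    return solved

problems = [
    {"name": "easy_algebra", "difficulty": 0.3},
    {"name": "hard_combinatorics", "difficulty": 1.0},
]
print(self_training_loop(problems))
```

The key design point is the feedback loop: only proofs that pass verification are kept, so the system can learn from unambiguous successes rather than noisy human labels.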

Because most IMO problems are posed in natural language, the DeepMind team, led by Thomas Hubert, used Google's large language model, Gemini, to translate them into Lean, a formal proof language whose statements and proofs can be checked mechanically. AlphaProof employs a fine-tuned version of the Gemini model to automatically formalize mathematical questions in Lean, building an extensive problem database spanning a range of difficulty levels. During the reinforcement learning phase, each verified proof is fed back to improve AlphaProof's language model, steadily strengthening its ability to solve harder problems. Hubert noted that a similar strategy was used in training for Go, where the AI improved through self-play.
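To make the formalization step concrete, here is a toy example of the kind of statement such a pipeline produces in Lean. This is a deliberately simple illustration, far easier than any IMO problem: the informal claim "the sum of two even numbers is even," written so that a proof can be checked mechanically.

```lean
-- Informal claim: if a and b are even, then a + b is even.
-- Evenness is expressed here as "there exists k with n = 2 * k".
theorem even_add_even (a b : Nat)
    (ha : ∃ k, a = 2 * k) (hb : ∃ k, b = 2 * k) :
    ∃ k, a + b = 2 * k := by
  cases ha with
  | intro m hm =>
    cases hb with
    | intro n hn =>
      -- Witness: a + b = 2 * m + 2 * n = 2 * (m + n).
      exact ⟨m + n, by rw [hm, hn, Nat.mul_add]⟩
```

Once a statement is in this form, any candidate proof either compiles or it does not, which is what makes verified proofs usable as reliable training signal.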

Despite AlphaProof's impressive capabilities, it was slow, taking up to three days to solve its three problems, whereas human competitors work within two 4.5-hour sessions. It also failed on both combinatorics problems. British mathematician Joseph Myers, who reviewed the AI's answers at the IMO, questioned how robust the techniques applied by AlphaProof would prove on other problems. Yang-Hui He of the London Institute for Mathematical Sciences noted that while systems like AlphaProof can help mathematicians complete proofs, they do not help researchers decide which problems are worth pursuing.

The DeepMind team said it is continuing to explore a variety of AI methodologies for advancing mathematical reasoning. Looking ahead, the team anticipates mathematicians collaborating with AI to test hypotheses and explore new approaches to long-standing mathematical challenges, and it hopes AlphaProof's verification-based approach can help refine Google's large language models by reducing erroneous responses.
