
AlphaProof: AI Earns Silver Medal at IMO Using Formal Mathematical Reasoning

In July 2024, Google DeepMind announced that its AI system had achieved silver-medal level at the International Mathematical Olympiad (IMO), the first time an artificial intelligence reached medalist level in this prestigious competition. Now, fulfilling its promise to disclose the technical details, the team has published the full research paper in Nature on November 12, revealing the inner workings of AlphaProof.

The IMO is widely regarded as one of the most challenging mathematics competitions in the world, attracting the brightest young mathematical minds globally. Each year, contestants tackle six extremely difficult problems spanning algebra, combinatorics, number theory, and geometry. In the 2024 competition, fewer than 1% of participants earned full marks on the 42-point exam. Many renowned mathematicians, including Fields Medalists, were once IMO contestants, underscoring the event's significance in the mathematical community. In recent years, the IMO has also become a benchmark for assessing AI systems' advanced mathematical reasoning.

In the 2024 contest, AlphaProof, working alongside AlphaGeometry 2, which specializes in geometry problems, solved four of the six problems for 28 points, placing it at the level of a silver medalist. Most notably, AlphaProof solved Problem 6, the hardest problem of the year, which only five human contestants solved completely. This marks a major breakthrough in machine mathematical reasoning.

At its core, AlphaProof combines formal mathematics with reinforcement learning to overcome a critical flaw of conventional large language models: the tendency to produce plausible-sounding but incorrect reasoning, known as "hallucination." Mathematical proofs demand absolute rigor and leave no room for such errors.
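To make the contrast with free-form language-model reasoning concrete, here is a toy machine-checked proof in the Lean proof assistant (the system discussed next). The theorem and tactic script are illustrative, written in Lean 4 / mathlib style, and are not drawn from AlphaProof's corpus:

```lean
-- Every step below is checked by Lean's kernel; a wrong step is rejected.
theorem even_add_even {a b : ℤ} (ha : Even a) (hb : Even b) :
    Even (a + b) := by
  obtain ⟨x, hx⟩ := ha          -- ha yields x with a = x + x
  obtain ⟨y, hy⟩ := hb          -- hb yields y with b = y + y
  exact ⟨x + y, by rw [hx, hy]; ring⟩  -- a + b = (x + y) + (x + y)
```

If any step were wrong, say the final `ring` call were omitted, Lean would report an unclosed goal rather than silently accepting the proof; there is no way for a plausible-but-invalid argument to slip through.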
To ensure correctness, the system operates inside Lean, a powerful interactive theorem prover that automatically verifies every logical step. In Lean, every proof must conform to strict formal rules, and any mistake is immediately flagged, eliminating the risk of undetected errors.

A major challenge with formal systems like Lean, however, is the scarcity of training data. The standard mathematics library, mathlib, contains only about 200,000 theorems, far too few for large-scale machine learning. To close this gap, the DeepMind team first fine-tuned the Gemini language model to translate natural-language math problems into Lean-compatible formal statements, a process known as "autoformalization." Using this method, they generated 80 million formalized mathematical statements from one million natural-language problems, creating a vast synthetic dataset.

AlphaProof's architecture draws inspiration from AlphaZero. A 3-billion-parameter encoder-decoder transformer reads the current state of a Lean proof and generates two key outputs: a policy that suggests the next logical step, and a value estimate of how likely a given proof path is to succeed. These outputs guide a custom tree search algorithm through the vast space of possible proof strategies.

A key innovation is the use of "product nodes" in the search tree. In Lean, proving a theorem often involves breaking it into independent subgoals, such as the base case and the inductive step of a proof by induction. A product node counts as proved only when all of its child branches are proved, which lets the system track partial progress and focus computational resources on the most challenging subgoals.

The most groundbreaking technique, however, is "Test-Time Reinforcement Learning" (Test-Time RL). For particularly difficult problems, AlphaProof dynamically allocates substantial computational resources during inference.
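The product-node idea can be sketched as a toy AND/OR search over subgoals. Everything here, the `Goal` class and the hand-written tactic table, is a hypothetical simplification for illustration, not DeepMind's implementation:

```python
# Toy AND/OR proof search with "product nodes" (hypothetical simplification).
# An ordinary choice is an OR: the goal is proved if ANY tactic works.
# A product node is an AND: it is proved only when ALL subgoals are proved,
# mirroring how a Lean tactic like `induction` splits one goal into
# a base case and an inductive step.

from dataclasses import dataclass, field

@dataclass
class Goal:
    name: str
    # Each candidate tactic maps to the subgoals it produces;
    # an empty list means the tactic closes the goal outright.
    tactics: dict = field(default_factory=dict)

def prove(goal: Goal) -> bool:
    for tactic, subgoals in goal.tactics.items():    # OR over tactics
        if all(prove(sub) for sub in subgoals):      # AND over subgoals
            print(f"{goal.name}: closed by `{tactic}`")
            return True
    print(f"{goal.name}: no tactic succeeded")
    return False

# Example: `induction n` creates a product node with two children,
# and the parent counts as proved only once both are closed.
base = Goal("base_case", {"simp": []})
step = Goal("inductive_step", {"ring": []})
main = Goal("main_goal", {"induction n": [base, step]})

assert prove(main)
```

In the real system the branching at each node comes from the model's policy, and the value estimate decides which unproved subgoal to expand next rather than exhaustive recursion.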
For each especially hard problem, AlphaProof generates numerous variations of the target theorem, ranging from simpler to harder, effectively creating a self-guided learning curriculum. By solving easier variants first, the system gradually builds intuition and insight, ultimately tackling the original complex problem. This strategy proved decisive in solving the notoriously difficult IMO 2024 Problem 6.

During training, AlphaProof learns efficiently: it avoids wasting effort on statements it has already shown to be false, and once it finds a proof it actively seeks shorter, more elegant ones. Early training focuses on small-scale searches to limit waste, scaling up gradually as the system improves.

In practice, AlphaProof behaves quite differently from human contestants. Humans get two 4.5-hour sessions; AlphaProof solves some problems in minutes but needed up to three days for others. The research team emphasizes that speed is not the goal: the point is to demonstrate reasoning capability at the Olympiad level, and future performance improvements should narrow this gap.

Timothy Gowers, a Fields Medalist and former IMO gold medalist, praised the results: "As a mathematician, I find this deeply impressive—a significant leap forward." Katie Collins, a mathematician and AI expert at the University of Cambridge, noted the broader impact: "If mathematical results can be expressed in such a system, we gain greater confidence in their correctness and can foster more collaboration."

Notably, DeepMind also introduced Gemini Deep Think this year, a system that reasons end to end in natural language and achieved a gold-medal score of 35 points within the 4.5-hour time limit, without relying on formalization. This shows a clear evolution from formalized, verifiable proofs toward direct natural-language reasoning.
AlphaProof's path through formalization and Gemini Deep Think's natural-language approach may eventually converge, leading to a new generation of more powerful, general, and trustworthy mathematical AI tools.

Julian Schrittwieser, one of the paper's authors, highlighted the scalability of Test-Time RL: "The only limit is how many TPU resources we can throw at it." Spending heavy computation at inference time in this way could open new frontiers in AI optimization. More importantly, AlphaProof demonstrates that combining reinforcement learning with search is effective for large language models, offering a blueprint for building more reliable, verifiable AI systems.

Challenges remain, however. All IMO problems still required manual translation into formal language; the team has experimented with natural-language models, but fully automating that translation remains a key hurdle. Additionally, AlphaProof excels in algebra and number theory yet struggles with combinatorics, and research into why is ongoing.

The work represents a transformative step in AI's ability to reason mathematically, pushing the boundaries of what machines can achieve in the realm of human-level problem solving.
