
Solving Inequality Proofs with Large Language Models

Jiayi Sheng, Luna Lyu, Jikai Jin, Tony Xia, Alex Gu, James Zou, Pan Lu
Publication date: 6/11/2025
Abstract

Inequality proving, crucial across diverse scientific and mathematical fields, tests advanced reasoning skills such as discovering tight bounds and strategic theorem application. This makes it a distinct, demanding frontier for large language models (LLMs), offering insights beyond general mathematical problem-solving. Progress in this area is hampered by existing datasets that are often scarce, synthetic, or rigidly formal. We address this by proposing an informal yet verifiable task formulation, recasting inequality proving into two automatically checkable subtasks: bound estimation and relation prediction. Building on this, we release IneqMath, an expert-curated dataset of Olympiad-level inequalities, including a test set and training corpus enriched with step-wise solutions and theorem annotations. We also develop a novel LLM-as-judge evaluation framework, combining a final-answer judge with four step-wise judges designed to detect common reasoning flaws. A systematic evaluation of 29 leading LLMs on IneqMath reveals a surprising reality: even top models like o1 achieve less than 10% overall accuracy under step-wise scrutiny; this is a drop of up to 65.5% from their accuracy considering only final answer equivalence. This discrepancy exposes fragile deductive chains and a critical gap for current LLMs between merely finding an answer and constructing a rigorous proof. Scaling model size and increasing test-time computation yield limited gains in overall proof correctness. Instead, our findings highlight promising research directions such as theorem-guided reasoning and self-refinement. Code and data are available at https://ineqmath.github.io/.
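To make the two-subtask formulation concrete, the sketch below illustrates what "automatically checkable" could mean in practice. This is a hypothetical illustration, not the paper's actual evaluation code: the helper names `check_relation` and `sample_check_bound`, the example function, and the tolerance are all assumptions. Relation prediction reduces to exact comparison against an annotated symbol, while a claimed bound constant can be sanity-checked numerically (sampling is only evidence, not a proof).

```python
import random

def check_relation(predicted: str, ground_truth: str) -> bool:
    """Final-answer check for the relation-prediction subtask:
    the predicted relation symbol (e.g. '<=', '>=', '=') must
    match the annotated ground truth exactly."""
    return predicted.strip() == ground_truth.strip()

def sample_check_bound(f, candidate_c, domain, trials=10_000):
    """Numerical sanity check for the bound-estimation subtask:
    test that f(x) <= candidate_c holds on random samples from
    the domain. Passing is evidence of validity, not a proof."""
    random.seed(0)  # deterministic sampling for reproducibility
    lo, hi = domain
    for _ in range(trials):
        x = random.uniform(lo, hi)
        if f(x) > candidate_c + 1e-9:  # small numerical tolerance
            return False
    return True

# Example: f(x) = x*(1 - x) on [0, 1] has tight upper bound 1/4,
# so 0.25 passes the check while the too-small constant 0.2 fails.
print(sample_check_bound(lambda x: x * (1 - x), 0.25, (0.0, 1.0)))
print(sample_check_bound(lambda x: x * (1 - x), 0.20, (0.0, 1.0)))
```

Checks of this kind verify only the final answer; the paper's point is that step-wise judges are additionally needed, since a correct bound can be reached through a flawed deductive chain.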