AI-CoMathematician: Accelerating Mathematicians with Agentic AI
Abstract
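We introduce the AI co-mathematician, a workbench through which mathematicians can interactively leverage AI agents to tackle open research problems. The AI co-mathematician is designed to provide holistic support optimized for the exploratory and iterative nature of mathematical workflows, including ideation, literature search, computational exploration, theorem proving, and theory building. The system mirrors human collaborative workflows by providing an asynchronous, stateful workspace that:

- manages uncertainty,
- refines user intent,
- tracks failed hypotheses, and
- outputs native mathematical artifacts.

In early tests, the AI co-mathematician helped researchers solve open problems, identify new research directions, and uncover overlooked literature references. Beyond demonstrating a highly interactive paradigm for AI-assisted mathematical discovery, the AI co-mathematician also achieves state-of-the-art results on hard problem-solving benchmarks. Notably, it scores 48% on FrontierMath Tier 4, the highest score among all AI systems evaluated.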
One-sentence Summary
The authors propose the AI Co-Mathematician, a stateful agentic workbench. Unlike prior tools, it supports mathematical research holistically by managing uncertainty and tracking hypotheses across ideation and theorem proving. It accelerates open-problem solving, uncovers overlooked literature references, and achieves a state-of-the-art score of 48% on the FrontierMath Tier 4 benchmark.
Key Contributions
- The paper introduces the AI co-mathematician, a workbench designed to help mathematicians interactively leverage AI agents for open-ended research. This system provides holistic support for workflows such as ideation, literature search, and theorem proving within an asynchronous environment.
- The system utilizes a stateful workspace that manages uncertainty and tracks failed hypotheses to mirror human collaborative workflows. It grounds outputs in native mathematical artifacts and maintains a living working paper to capture the full research journey.
- Early user tests show that the system helped researchers solve open problems and identify new research directions. The system also achieves state-of-the-art results on hard problem-solving benchmarks, scoring 48% on FrontierMath Tier 4.
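The stateful workspace described in these contributions combines uncertainty management with a record of failed hypotheses. The minimal Python sketch below illustrates that idea. All class and method names (`Workspace`, `Hypothesis`, `propose`, `refute`) are hypothetical, because the paper does not publish its data model. The single `confidence` number is only a stand-in for whatever uncertainty representation the system actually uses.

```python
from dataclasses import dataclass, field
from enum import Enum


class Status(Enum):
    OPEN = "open"            # still under investigation
    CONFIRMED = "confirmed"  # supported by a proof or computation
    REFUTED = "refuted"      # failed, but kept so it is not retried blindly


@dataclass
class Hypothesis:
    statement: str
    status: Status = Status.OPEN
    confidence: float = 0.5  # hypothetical stand-in for "managed uncertainty"
    notes: list[str] = field(default_factory=list)


@dataclass
class Workspace:
    """Hypothetical persistent store of hypotheses shared across agents."""
    hypotheses: dict[str, Hypothesis] = field(default_factory=dict)

    def propose(self, key: str, statement: str, confidence: float = 0.5) -> Hypothesis:
        # Refuse to silently re-open a hypothesis that has already failed.
        existing = self.hypotheses.get(key)
        if existing is not None and existing.status is Status.REFUTED:
            raise ValueError(f"hypothesis {key!r} was already refuted")
        h = Hypothesis(statement, confidence=confidence)
        self.hypotheses[key] = h
        return h

    def refute(self, key: str, reason: str) -> None:
        # Keep the hypothesis and record why it failed instead of deleting it.
        h = self.hypotheses[key]
        h.status = Status.REFUTED
        h.confidence = 0.0
        h.notes.append(reason)

    def failed(self) -> list[str]:
        return [k for k, h in self.hypotheses.items() if h.status is Status.REFUTED]


ws = Workspace()
ws.propose("h1", "Every such graph has a perfect matching", confidence=0.7)
ws.propose("h2", "The bound is tight for n >= 5")
ws.refute("h1", "counterexample found by computer search at n = 6")
print(ws.failed())  # ['h1']
```

Keeping refuted entries, rather than discarding them, is what lets later agents see which avenues have already been ruled out.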
Introduction
Mathematical research involves complex, iterative workflows that current AI tools often fail to support holistically. While existing systems excel at isolated problem solving or formal verification, they lack the stateful orchestration needed for long-term exploration and hypothesis management. The authors introduce the AI co-mathematician, a stateful workbench that enables interactive collaboration between humans and agentic AI. This system manages uncertainty and tracks research artifacts while leveraging powerful underlying models to solve open problems and achieve leading results on hard benchmarks.
Method
The AI co-mathematician operates as a hierarchical multi-agent framework designed to mirror professional mathematical workflows. The system avoids the limitations of a standard conversational chatbot by organizing agents into a structured team that supports asynchronous interaction and progressive disclosure. The overall organization of these agents is depicted in the framework diagram, which illustrates the communication pathways between the user, the Project Coordinator, Workstream Coordinators, and Specialized Sub-agents.
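The hierarchy described above can be sketched as a small tree of Python dataclasses. This is an illustrative model only: the class names and the `communication_edges` helper are hypothetical. The sketch assumes, as the framework description suggests, that messages flow only between adjacent layers:

- the user to the Project Coordinator,
- the Project Coordinator to the Workstream Coordinators, and
- each Workstream Coordinator to its own sub-agents.

```python
from dataclasses import dataclass, field


@dataclass
class SubAgent:
    name: str  # hypothetical role label, e.g. "literature_search"


@dataclass
class WorkstreamCoordinator:
    goal: str
    sub_agents: list[SubAgent] = field(default_factory=list)


@dataclass
class ProjectCoordinator:
    research_question: str
    workstreams: list[WorkstreamCoordinator] = field(default_factory=list)


def communication_edges(pc: ProjectCoordinator) -> set[tuple[str, str]]:
    """Collect the parent-child links along which messages may travel."""
    # The user talks only to the top-level Project Coordinator.
    edges = {("user", "project_coordinator")}
    for i, ws in enumerate(pc.workstreams):
        ws_id = f"workstream_{i}"
        # The Project Coordinator delegates to each Workstream Coordinator,
        edges.add(("project_coordinator", ws_id))
        for sa in ws.sub_agents:
            # which in turn delegates to its own specialized sub-agents.
            edges.add((ws_id, f"{ws_id}/{sa.name}"))
    return edges


pc = ProjectCoordinator(
    "Does conjecture C hold for all n?",  # hypothetical research question
    [
        WorkstreamCoordinator("Survey prior results", [SubAgent("literature_search")]),
        WorkstreamCoordinator("Check small cases", [SubAgent("code_execution")]),
    ],
)
edges = communication_edges(pc)
print(("user", "workstream_0") in edges)  # False: the user never bypasses the coordinator
```

In this sketch, routing every user message through one edge is one simple way to realize progressive disclosure. Lower layers can be busy without exposing their raw traffic to the user.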
The user interacts primarily with a top-level Project Coordinator agent, which serves as the central interface for managing the project's high-level strategy. As shown in the figure below, the interaction begins with an onboarding phase where the user and the Project Coordinator iteratively refine a raw input into a formal Research Question and a set of specific Goals. This process ensures that downstream computational resources are directed toward the mathematician's actual, refined intent rather than a potentially ambiguous initial prompt.
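The onboarding phase is essentially a refine-until-approved loop, sketched below. Here `refine` stands in for the Project Coordinator and `review` stands in for the user; both are hypothetical callables. The `max_rounds` cap is an assumption added so the loop always terminates.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Proposal:
    research_question: str
    goals: list[str]


# Hypothetical stand-in for the Project Coordinator: (raw input, latest feedback) -> proposal.
Refiner = Callable[[str, Optional[str]], Proposal]
# Hypothetical stand-in for the user: returns None to approve, or feedback text.
UserReview = Callable[[Proposal], Optional[str]]


def onboard(raw_input: str, refine: Refiner, review: UserReview, max_rounds: int = 5) -> Proposal:
    feedback: Optional[str] = None
    for _ in range(max_rounds):
        proposal = refine(raw_input, feedback)
        feedback = review(proposal)
        if feedback is None:  # approved: downstream work may now start
            return proposal
    raise RuntimeError("goals were not approved within max_rounds")


def toy_refine(raw: str, feedback: Optional[str]) -> Proposal:
    # Toy refiner: turns user feedback into an additional goal.
    goals = ["survey known results"]
    if feedback:
        goals.append(feedback)
    return Proposal(f"Formalize: {raw}", goals)


def toy_review(p: Proposal) -> Optional[str]:
    # Toy user: approves once there is more than one goal.
    return None if len(p.goals) > 1 else "compute small cases"


result = onboard("are all such groups abelian?", toy_refine, toy_review)
print(result.goals)  # ['survey known results', 'compute small cases']
```

Nothing downstream receives the raw prompt. Only an approved `Proposal` leaves `onboard`, which matches the intent that compute is spent on the refined question.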
Once the goals are approved, the Project Coordinator delegates work to parallel Workstream Coordinators. This branching capability allows the system to explore multiple avenues of inquiry simultaneously without blocking the user. The progression of this branching is visualized in the next figure, where a single Research Question splits into distinct Goals, each associated with independent Workstreams that evolve over time. This structure enables the system to handle diverse tasks, such as literature reviews and computational framework design, in parallel.
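The non-blocking branching can be illustrated with Python's `asyncio`, where each approved goal becomes a concurrently running task. The workstream bodies are placeholders (`asyncio.sleep`), and the function names are hypothetical. The paper does not specify its concurrency mechanism, so this sketch shows only the general pattern.

```python
import asyncio


async def run_workstream(goal: str, seconds: float, log: list[str]) -> str:
    # Placeholder for a Workstream Coordinator doing long-running work.
    await asyncio.sleep(seconds)
    log.append(f"done: {goal}")
    return f"report for {goal}"


async def run_project(goals: dict[str, float], log: list[str]) -> list[str]:
    # Spawn one concurrent task per approved goal.
    tasks = [asyncio.create_task(run_workstream(g, s, log)) for g, s in goals.items()]
    # The coordinator is not blocked: it can keep serving the user meanwhile.
    log.append("coordinator: answered a user message")
    # gather returns results in goal order, whatever order the tasks finished in.
    return await asyncio.gather(*tasks)


log: list[str] = []
reports = asyncio.run(
    run_project({"literature review": 0.05, "computational framework": 0.01}, log)
)
print(log)
```

The log shows the coordinator's reply first, followed by the shorter workstream finishing before the longer one.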
Within each Workstream, a Workstream Coordinator agent orchestrates a linear sequence of actions to achieve its specific goal. These actions may involve delegating tasks to specialized sub-agents, such as those for literature search or code execution. A detailed trajectory of a single workstream is shown in the figure below, highlighting the iterative cycle of performing tasks, updating the project report, and responding to external requests. The workstream concludes by sending the final report for review, where it is scrutinized by AI reviewer agents to ensure rigor before being finalized.
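A single workstream's cycle can be sketched as follows: delegate a task, update the report, handle external requests, then submit for review. The sub-agents and reviewers are plain Python callables standing in for model calls. The rule that every reviewer must approve is an assumption made for illustration. The names `Workstream`, `step`, and `finalize` are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable

SubAgentFn = Callable[[str], str]        # hypothetical: task description -> result text
Reviewer = Callable[[list[str]], bool]   # hypothetical: report sections -> approve?


@dataclass
class Workstream:
    goal: str
    sub_agents: dict[str, SubAgentFn]
    report: list[str] = field(default_factory=list)
    inbox: list[str] = field(default_factory=list)  # external requests, e.g. from the user

    def step(self, agent: str, task: str) -> None:
        # 1. Delegate one task to a specialized sub-agent.
        result = self.sub_agents[agent](task)
        # 2. Fold the result into the running project report.
        self.report.append(f"[{agent}] {result}")
        # 3. Log pending external requests into the report (a real system would answer them).
        while self.inbox:
            self.report.append(f"[request] {self.inbox.pop(0)}")

    def finalize(self, reviewers: list[Reviewer]) -> str:
        # The report is finalized only if every reviewer agent approves it.
        return "final" if all(r(self.report) for r in reviewers) else "needs revision"


ws = Workstream(
    "Bound the constant",
    {
        "literature": lambda task: f"found 2 relevant papers on {task}",
        "code": lambda task: f"verified {task} for n <= 10",
    },
)
ws.step("literature", "the constant")
ws.inbox.append("user asks whether n = 11 was checked")
ws.step("code", "the bound")
print(ws.finalize([lambda report: len(report) >= 3]))  # final
```

The actions run strictly in sequence within one workstream. Any parallelism comes from running several such workstreams side by side.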
Experiment
The evaluation combined early-access trials with professional mathematicians and controlled benchmark testing to assess the interactive AI co-mathematician. Case studies validated the system's utility as a collaborative partner that resolves open problems and accelerates exploration when users actively guide the workflow with domain expertise. Benchmark results further showed that the agentic architecture significantly outperforms base models on complex research tasks by leveraging parallel reasoning and external tools. Challenges remain, however, including the stability of autonomous review and the system's potential impact on standards in the mathematical literature.