HyperAIHyperAI

Command Palette

Search for a command to run...

지미를 활용한 반자율 수학 탐구: 에르되시 문제에 대한 사례 연구

초록

우리는 블룸의 에르되시 문제 데이터베이스에서 '열린 문제'(Open)로 표기된 700개의 추측을 체계적으로 평가하는 방식으로, 반자율 수학 발견의 사례 연구를 제시한다. 본 연구는 하이브리드적 접근법을 활용한다. 먼저 인공지능(AI) 기반의 자연어 검증을 통해 탐색 공간을 좁히고, 이후 인간 전문가의 평가를 통해 추측의 정확성과 새로운 가치를 판단한다. 본 연구에서는 데이터베이스에서 '열린 문제'로 분류된 13개의 문제를 다루었으며, 그 중 5개는 새로운 자율적 해결 방식을 통해 해결하였고, 나머지 8개는 기존 문헌에서 이미 해결된 사례를 식별함으로써 해결하였다. 연구 결과에 따르면, 이러한 문제들이 '열린 문제'로 남아 있었던 이유는 난이도보다는 정보의 부족과 미지의 상태에 기인한 것으로 나타났다. 또한, 대규모 수학 추측에 AI를 적용함에 있어 발생하는 문제들을 식별하고 논의하며, 문헌 식별의 어려움과 AI가 무의식적으로 기존 연구를 재사용하는 '무의식적 표절( subconscious plagiarism)'의 위험성을 강조한다. 마지막으로, 에르되시 문제에 대한 AI 지원 연구를 통해 얻은 교훈들을 성찰한다.

One-sentence Summary

Researchers from Google and collaborators propose using Gemini to evaluate 700 open math conjectures, resolving 13 via novel AI solutions or rediscovered literature, revealing that “open” status often reflects obscurity, not difficulty, while cautioning against AI’s risk of subconscious plagiarism in large-scale mathematical discovery.

Key Contributions

  • We applied Gemini to evaluate 700 'Open' conjectures from Bloom’s Erdős Problems database using a hybrid AI-human workflow, narrowing the search via natural language verification before expert validation, and resolved 13 problems—5 with novel autonomous solutions and 8 by rediscovering prior literature.
  • Our results indicate that the 'Open' status of these problems often stems from obscurity rather than intrinsic difficulty, as AI efficiently surfaced overlooked or trivially resolved conjectures, challenging assumptions about problem hardness in the database.
  • We document key challenges in scaling AI for mathematical discovery, including the difficulty of comprehensive literature retrieval and the risk of AI inadvertently reproducing known results without attribution, which we term “subconscious plagiarism.”

Introduction

The authors leverage Gemini to evaluate 700 “Open” conjectures from Bloom’s Erdős Problems database, using AI-driven natural language verification to filter candidates before human expert review. This approach matters because it scales AI-assisted discovery in math, where expert evaluation is bottlenecked by time and scarcity. Prior work faced challenges in reliably verifying correctness at scale and identifying whether solutions already existed in the literature — often leading to redundant or misleading claims. The authors’ main contribution is resolving 13 problems: 5 with seemingly novel autonomous solutions and 8 by uncovering prior literature, revealing that many “open” problems were simply obscure, not hard. They also highlight systemic issues like AI’s risk of subconscious plagiarism and the difficulty of literature retrieval — problems formal verification cannot solve.

Top Figure
Top Figure

Method

The authors leverage a geometric incidence framework to establish a lower bound on the growth rate of αk\alpha_kαk, defined as the minimal number of distinct distances determined by any point in a set of nnn points in the plane. The core of the method involves constructing a family of circles derived from a carefully selected subset of kkk points and then applying a known incidence bound to derive a contradiction unless αk\alpha_kαk grows at least as fast as k1/4k^{1/4}k1/4.

The construction begins by selecting a set Pn={x1,,xn}R2P_n = \{x_1, \ldots, x_n\} \subset \mathbb{R}^2Pn={x1,,xn}R2 ordered such that R(xi)R(x_i)R(xi), the number of distinct distances from xix_ixi to other points in PnP_nPn, is non-decreasing. The first kkk points, S={x1,,xk}S = \{x_1, \ldots, x_k\}S={x1,,xk}, are used to define a family of circles C=i=1k{Γ(xi,r):rDi}\mathcal{C} = \bigcup_{i=1}^{k} \{ \Gamma(x_i, r) : r \in D_i \}C=i=1k{Γ(xi,r):rDi}, where DiD_iDi is the set of distinct distances from xix_ixi to the rest of PnP_nPn. Since each Di<αkn1/2|D_i| < \alpha_k n^{1/2}Di<αkn1/2, the total number of circles satisfies C<kαkn1/2|\mathcal{C}| < k \alpha_k n^{1/2}C<kαkn1/2.

The key insight is that every point pPnSp \in P_n \setminus SpPnS lies on at least kkk circles in C\mathcal{C}C, one for each center in SSS. This yields a lower bound on the number of incidences between PnP_nPn and C\mathcal{C}C: (nk)k(n - k)k(nk)k. Applying the Pach–Sharir incidence bound (Theorem 1) with k=3k=3k=3 and s=2s=2s=2 for circles, the upper bound on incidences becomes c(3,2)(n3/5C4/5+n+C)c(3,2) \left( n^{3/5} |\mathcal{C}|^{4/5} + n + |\mathcal{C}| \right)c(3,2)(n3/5C4/5+n+C). Substituting the bound on C|\mathcal{C}|C and taking the limit as nn \to \inftyn yields kC(αk4/5k4/5+1)k \leq C(\alpha_k^{4/5} k^{4/5} + 1)kC(αk4/5k4/5+1), which implies αk=Ω(k1/4)\alpha_k = \Omega(k^{1/4})αk=Ω(k1/4).

This method hinges on the interplay between geometric construction and combinatorial incidence bounds, transforming a problem about distance distributions into one about curve-point incidences, and then leveraging asymptotic analysis to extract the desired growth rate.

Experiment

  • Aletheia, a math research agent built on Gemini Deep Think, evaluated 700 open Erdős problems, yielding 63 technically correct solutions, but only 13 meaningfully addressed Erdős’s intended problem statements; 50 correct solutions were mathematically trivial due to misinterpretation, and 12 were ambiguous.
  • Human evaluation revealed that 68.5% of 200 graded responses were fundamentally flawed, highlighting the high cost of verifying AI output, including debugging errors and checking for literature overlap or unintentional plagiarism.
  • Among 13 meaningfully correct solutions, 5 were autonomously novel (Erdős-652, 654, 935, 1040, 1051), though none rose to the level of a standalone research paper; Erdős-1051 was later expanded into a collaborative paper.
  • Aletheia also identified 8 problems already solved in the literature (e.g., Erdős-333, 591, 705), and rediscovered 3 solutions independently (e.g., Erdős-397, 659, 1089), raising concerns about subconscious plagiarism from training data.
  • Final validation revealed that some “solved” problems had flawed or ambiguous formulations, such as Erdős-75, where the listed problem was a misstatement of Erdős’s original intent, underscoring the need for precise problem framing in AI-driven research.
  • The effort emphasizes that while AI can assist in mathematical discovery, its outputs require rigorous human vetting, and claims of “accelerating science” must account for the hidden labor of verification and contextual accuracy.

The authors used a specialized AI agent to generate solutions for 700 open Erdős problems, then evaluated 200 candidate responses with human mathematicians. Results show that while 31.5% of responses were technically correct, only 6.5% meaningfully addressed the intended mathematical problem, highlighting the gap between syntactic correctness and semantic relevance in AI-generated mathematics. The evaluation underscores the need for rigorous human oversight and contextual understanding when assessing AI contributions to mathematical research.


AI로 AI 구축

아이디어에서 출시까지 — 무료 AI 코코딩, 즉시 사용 가능한 환경, 최적의 GPU 가격으로 AI 개발을 가속화하세요.

AI 협업 코딩
바로 사용 가능한 GPU
최적의 가격

HyperAI Newsletters

최신 정보 구독하기
한국 시간 매주 월요일 오전 9시 에 이번 주의 최신 업데이트를 메일로 발송합니다
이메일 서비스 제공: MailChimp