

CORAL: Towards Autonomous Multi-Agent Evolution for Open-Ended Discovery

Abstract

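Large language model (LLM)-based evolution is a promising approach to open-ended discovery, which requires sustained exploration and knowledge accumulation. However, existing methods still rely heavily on fixed heuristics and hard-coded exploration rules, which limits the autonomy of LLM agents. In this paper, we present CORAL, the first autonomous multi-agent evolution framework for open-ended problems. CORAL replaces rigid control with long-running agents that explore, reflect, and collaborate through shared persistent memory, asynchronous multi-agent execution, and heartbeat-based interventions. It also provides practical safeguards, including isolated workspaces, evaluator separation, resource management, and agent session and state management. In evaluations on diverse mathematical, algorithmic, and systems optimization tasks, CORAL achieves new state-of-the-art results on 10 tasks, with 3–10× higher improvement rates than fixed evolutionary search baselines while using far fewer evaluations. On Anthropic's kernel engineering task, four co-evolving agents improved the previous best score from 1363 cycles to 1103 cycles. Mechanistic analyses show that these gains stem from knowledge reuse and from multi-agent exploration and communication. These results suggest that agent autonomy and multi-agent evolution can substantially improve open-ended discovery. Code is available at https://github.com/Human-Agent-Society/CORAL.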

One-sentence Summary

Researchers from MIT, NUS, and other institutions introduce CORAL, an autonomous multi-agent framework that replaces rigid heuristics with persistent memory and asynchronous collaboration. This approach achieves state-of-the-art results on diverse optimization tasks, including a roughly 19% reduction in cycle count (from 1,363 to 1,103 cycles) on Anthropic's kernel engineering task, by enabling sustained knowledge accumulation and agent autonomy.
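For reference, the headline kernel-engineering result from the abstract (best known 1,363 cycles improved to 1,103 cycles) corresponds to the following relative gain. Because fewer cycles is better, the reduction and the speedup are two views of the same change:

```latex
\begin{aligned}
\text{relative reduction} &= \frac{1363 - 1103}{1363} = \frac{260}{1363} \approx 0.191 \quad (\approx 19\%) \\
\text{speedup} &= \frac{1363}{1103} \approx 1.24\times
\end{aligned}
```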

Key Contributions

  • The paper introduces CORAL, a framework for autonomous multi-agent evolution that replaces rigid control with long-running agents utilizing shared persistent memory, asynchronous execution, and heartbeat-based interventions to explore and collaborate on open-ended problems.
  • This work establishes a new paradigm by formulating autonomous evolution as a distinct approach that delegates search decisions to agents, enabling them to iteratively refine solutions through knowledge retrieval, contribution, and distillation without fixed heuristics.
  • Experiments demonstrate that CORAL sets new state-of-the-art results on 10 diverse tasks with 3–10× higher improvement rates than fixed baselines, while mechanistic analyses confirm that these gains stem from effective knowledge reuse and multi-agent communication.

Introduction

Open-ended discovery in fields like mathematical optimization and systems engineering requires sustained iterative search rather than one-shot generation, yet current LLM-based approaches rely on fixed evolutionary heuristics that limit agent autonomy. These rigid pipelines force agents to follow hard-coded rules for parent selection and exploration, preventing them from adapting their search strategy or effectively reusing knowledge across long horizons. The authors introduce CORAL, a framework that replaces these static controls with autonomous multi-agent evolution where long-running agents collaborate through shared persistent memory and asynchronous execution. By allowing agents to decide what to explore, reflect on progress via heartbeat mechanisms, and accumulate reusable skills, CORAL achieves state-of-the-art results with significantly fewer evaluations than traditional baselines.

Dataset

  • Dataset Composition and Sources: The authors utilize a curated collection of 13 evaluation tasks spanning mathematical optimization, systems optimization, and stress-test problems. These tasks are sourced from established benchmarks like ADRS and include specific challenges such as the Erdős Minimum Overlap Problem, Transaction Scheduling, and Kernel Engineering. The data is structured via YAML configuration files that define task metadata, grading logic, and agent parameters.

  • Key Details for Each Subset

    • Mathematical Optimization: Includes 6 tasks such as circle packing and signal processing, where agents must solve complex inequalities or minimize overlap integrals.
    • Systems Optimization: Comprises 5 tasks, such as EPLB and LLM-SQL, that focus on algorithmic improvements; agents solve these tasks without web search.
    • Stress-Test Problems: Features high-complexity scenarios like the VLIW SIMD Kernel Builder, which requires optimizing code to reduce cycle counts from a baseline of ~147,734 to a best-known ~1,363.
    • Grading Logic: Each task employs a Python-based grader that executes agent solutions in a subprocess, validates constraints, and returns a numerical score or error status.
  • Usage in Model Training and Evaluation: The authors use this dataset exclusively for evaluation rather than training, running agents asynchronously to generate performance data. The system supports both single-agent and multi-agent configurations, with the latter allowing up to 4 agents to collaborate on a single task. Evaluation metrics are derived from the ratio of the agent's score against the baseline and best-known solutions, with specific timeouts (e.g., 600s or 1100s) enforced per task.

  • Processing and Metadata Construction

    • Shared Persistent Memory: The authors implement a centralized filesystem within the .coral/public/ directory to store artifacts. This includes attempt records (JSON files keyed by commit hash), hierarchical notes (Markdown with YAML frontmatter), and reusable skills.
    • Artifact Management: Agents access shared memory through symbolic links placed inside their isolated worktrees. This keeps shared artifacts from being accidentally committed while still letting agents use native file tools.
    • Concurrency Handling: The system avoids explicit locking by assigning unique filenames for notes and skills and using commit hashes for attempt records, ensuring no file-level conflicts occur during asynchronous execution.
    • Metadata Enrichment: Each attempt record captures the agent ID, score, status, parent hash, and detailed feedback, while notes are organized by topic to consolidate knowledge and track open questions.
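Below is a minimal sketch of the shared-memory conventions listed above, assuming a plain local filesystem. Attempt records are JSON files named by commit hash, notes are Markdown files with YAML frontmatter stored under unique names, and each agent's worktree gets a symlink to the shared store. The `.coral/public/` layout with `attempts`, `notes`, and `skills` follows the description. The function names, the extra `commit` field, the link name `shared`, and the uuid-based note naming are hypothetical choices made for illustration.

```python
import json
import os
import tempfile
import uuid
from pathlib import Path
from typing import Optional


def init_memory(root: Path) -> Path:
    """Create <root>/.coral/public/{attempts,notes,skills} and return the public dir."""
    public = root / ".coral" / "public"
    for sub in ("attempts", "notes", "skills"):
        (public / sub).mkdir(parents=True, exist_ok=True)
    return public


def record_attempt(public: Path, commit_hash: str, agent_id: str, score: float,
                   status: str, parent_hash: Optional[str], feedback: str) -> Path:
    """Write one attempt record named after its commit hash.

    Each commit hash is unique, so concurrent agents never target the
    same file and no lock is taken.
    """
    path = public / "attempts" / f"{commit_hash}.json"
    record = {"commit": commit_hash, "agent_id": agent_id, "score": score,
              "status": status, "parent_hash": parent_hash, "feedback": feedback}
    path.write_text(json.dumps(record, indent=2))
    return path


def write_note(public: Path, agent_id: str, topic: str, body: str) -> Path:
    """Write a Markdown note with YAML frontmatter under an effectively unique name."""
    path = public / "notes" / f"{agent_id}-{uuid.uuid4().hex[:12]}.md"
    path.write_text(f"---\nagent: {agent_id}\ntopic: {topic}\n---\n\n{body}\n")
    return path


def link_into_worktree(public: Path, worktree: Path) -> Path:
    """Expose the shared store inside an agent's worktree through a symlink."""
    worktree.mkdir(parents=True, exist_ok=True)
    link = worktree / "shared"  # hypothetical link name
    if not link.exists():
        link.symlink_to(public, target_is_directory=True)
    return link


def best_attempt(public: Path) -> Optional[dict]:
    """Return the highest-scoring attempt whose status is "ok", if any."""
    records = [json.loads(p.read_text()) for p in (public / "attempts").glob("*.json")]
    return max((r for r in records if r["status"] == "ok"),
               key=lambda r: r["score"], default=None)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        public = init_memory(Path(tmp))
        record_attempt(public, "a1b2c3", "agent-1", 0.42, "ok", None, "baseline")
        record_attempt(public, "d4e5f6", "agent-2", 0.57, "ok", "a1b2c3", "unrolled loop")
        write_note(public, "agent-2", "loop-unrolling", "Unrolling by 4 cut cycles.")
        link = link_into_worktree(public, Path(tmp) / "worktrees" / "agent-1")
        print(best_attempt(public)["commit"], sorted(os.listdir(link)))
```

In this sketch, uniqueness comes from the filename itself. Writers therefore never contend for the same path, and readers simply list the directory.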

Method

The authors propose CORAL, a framework designed to facilitate autonomous multi-agent evolution for open-ended discovery tasks. In this paradigm, the goal is to iteratively discover increasingly strong candidate solutions under evaluator feedback without a known optimal target. The process is abstracted into four stages: Retrieve, Propose, Evaluate, and Update.
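The four-stage abstraction can be illustrated with a toy loop. This is a sketch under stated assumptions, not CORAL's implementation. A "solution" is a single float, the evaluator is a black-box function, and `retrieve`, `propose`, `evaluate`, and `update` are hypothetical callables. Plugging fixed rules into `retrieve` and `update` (for example, the greedy parent choice in the demo) mimics fixed evolutionary search. In CORAL's autonomous setting, the agent itself decides how and when each stage happens.

```python
import random
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Attempt:
    solution: float
    score: float


@dataclass
class Memory:
    attempts: List[Attempt] = field(default_factory=list)


def evolve(memory: Memory,
           retrieve: Callable[[Memory], Attempt],
           propose: Callable[[Attempt], float],
           evaluate: Callable[[float], float],
           update: Callable[[Memory, Attempt], None],
           steps: int) -> Attempt:
    """Run the Retrieve -> Propose -> Evaluate -> Update loop for `steps` rounds."""
    for _ in range(steps):
        parent = retrieve(memory)                  # Retrieve: pick context / a parent
        candidate = propose(parent)                # Propose: produce a new candidate
        score = evaluate(candidate)                # Evaluate: external grader feedback
        update(memory, Attempt(candidate, score))  # Update: write back to memory
    return max(memory.attempts, key=lambda a: a.score)


if __name__ == "__main__":
    rng = random.Random(0)

    def evaluate(x: float) -> float:
        # Toy black-box objective; the search loop only sees its scores.
        return -(x - 3.7) ** 2

    memory = Memory([Attempt(0.0, evaluate(0.0))])
    best = evolve(
        memory,
        retrieve=lambda m: max(m.attempts, key=lambda a: a.score),  # fixed greedy rule
        propose=lambda p: p.solution + rng.gauss(0.0, 1.0),
        evaluate=evaluate,
        update=lambda m, a: m.attempts.append(a),                    # fixed append rule
        steps=200,
    )
    print(f"best solution {best.solution:.3f}, score {best.score:.3f}")
```

Swapping only the `retrieve` and `update` callables changes the search policy without touching the loop. Those are exactly the parts that fixed pipelines hard-code.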

Refer to the comparison of search paradigms below:

Traditional methods often rely on fixed evolutionary search where external rules govern the Retrieve and Update stages, limiting the agent's role primarily to Propose. In contrast, CORAL implements autonomous single-agent evolution where the agent controls the timing and realization of all four stages. This is further extended to autonomous multi-agent evolution, where multiple agents run asynchronously and coordinate through shared persistent memory rather than direct communication. This design increases exploration diversity and allows agents to inspire one another indirectly.
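The asynchronous multi-agent variant can be sketched with Python's asyncio. A few coroutine "agents" share nothing but an append-only attempt log, so any cross-agent influence happens indirectly through that log rather than through messages. The objective, the mutation scheme, and the log schema are toy assumptions. CORAL's agents are long-running LLM sessions in separate workspaces, which this sketch does not model.

```python
import asyncio
import random
from typing import Dict, List


def evaluate(x: float) -> float:
    """Toy black-box objective (an assumption for illustration); higher is better."""
    return -(x - 3.7) ** 2


async def agent(agent_id: str, shared: List[Dict], steps: int, sigma: float,
                seed: int) -> None:
    """A long-running agent that never messages its peers directly: it only
    reads from and appends to the shared attempt log."""
    rng = random.Random(seed)
    for _ in range(steps):
        # Read: the best attempt so far, whichever agent produced it.
        parent = max(shared, key=lambda r: r["score"])
        # Propose: mutate it with this agent's own step size (search diversity).
        candidate = parent["x"] + rng.gauss(0.0, sigma)
        # Stand-in for awaiting an external grader; lets other agents interleave.
        await asyncio.sleep(rng.random() * 0.001)
        # Write back so the other agents can build on this attempt.
        shared.append({"agent": agent_id, "x": candidate,
                       "score": evaluate(candidate), "parent": parent["agent"]})


async def run(n_agents: int = 4, steps: int = 25) -> List[Dict]:
    shared = [{"agent": "seed", "x": 0.0, "score": evaluate(0.0), "parent": None}]
    await asyncio.gather(*(agent(f"agent-{i}", shared, steps,
                                 sigma=0.25 * (i + 1), seed=i)
                           for i in range(n_agents)))
    return shared


if __name__ == "__main__":
    log = asyncio.run(run())
    best = max(log, key=lambda r: r["score"])
    cross = sum(1 for r in log if r["parent"] not in (None, "seed", r["agent"]))
    print(f"{len(log)} attempts; best x={best['x']:.3f} from {best['agent']}; "
          f"{cross} attempts built on another agent's work")
```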

The overall workflow of the CORAL framework is illustrated in the diagram below:

The system operates through a central Manager Infra that handles agent lifecycle and heartbeat coordination. Each agent runs in an isolated workspace and executes an autonomous loop consisting of reading attempts, notes, and skills from the shared memory, planning and editing code, running the evaluation grader, and writing new notes and skills back to the shared store. The Shared Persistent Memory is structured as a file system with three root folders: attempts for historical evaluations, notes for observations and reflections, and skills for reusable procedures. To prevent agents from stagnating in local minima, a Heartbeat Monitor triggers periodic interventions such as Reflection (recording notes), Consolidation (organizing notes into skills), and Redirection (pivoting strategy when no improvement is observed).
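The heartbeat logic can be sketched as a small state machine. It tracks recent evaluation results and decides which intervention to inject at each tick. The intervention names follow the description (Reflection, Consolidation, Redirection). The class, its tick-based schedule, the priority order, and all thresholds are assumptions for illustration, not CORAL's actual HeartbeatRunner.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class HeartbeatMonitor:
    """Decides which intervention (if any) to inject at each heartbeat tick.

    All thresholds are illustrative assumptions, not CORAL's defaults.
    """
    reflect_every: int = 3        # ticks between Reflection prompts
    consolidate_every: int = 10   # ticks between Consolidation prompts
    patience: int = 5             # non-improving evaluations before Redirection
    best: Optional[float] = None
    stale: int = 0
    tick: int = 0

    def on_eval(self, score: float) -> None:
        """Track whether the latest evaluation improved the agent's best score."""
        if self.best is None or score > self.best:
            self.best, self.stale = score, 0
        else:
            self.stale += 1

    def beat(self) -> Optional[str]:
        """Advance one tick and return the intervention to send, if any."""
        self.tick += 1
        if self.stale >= self.patience:
            self.stale = 0
            return "redirect"       # pivot strategy after a plateau
        if self.tick % self.consolidate_every == 0:
            return "consolidate"    # organize accumulated notes into skills
        if self.tick % self.reflect_every == 0:
            return "reflect"        # record observations as notes
        return None
```

In this sketch, Redirection takes priority over the periodic prompts. The assumption is that a stalled agent benefits more from a change of strategy than from another round of note-taking.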

The underlying software architecture is modular and organized into six key components as shown below:

The Configuration module parses YAML task definitions to initialize the system. The Agent System manages the lifecycle of agents through an AgentManager and HeartbeatRunner, ensuring persistence and handling interruptions. The Grader Hierarchy provides a pluggable evaluation interface where a BaseGrader defines the protocol for scoring candidates, implemented by specific graders like TaskGrader or FunctionGrader. Workspace Setup creates isolated per-agent worktrees with symbolic links to the Hub, which stores the shared persistent memory. Finally, Core Types define the data models used throughout the system, such as Task, Score, and ScoreBundle, ensuring consistent data flow between components.
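The grader hierarchy can be pictured as an abstract base class with interchangeable implementations. The sketch reuses the class names from the description (BaseGrader, FunctionGrader, TaskGrader, Score, ScoreBundle). Every method signature and field, the `"main"` score key, and the "last stdout line is the score" convention are hypothetical stand-ins, not CORAL's actual interfaces. The subprocess execution and per-task timeout mirror the grading logic described in the dataset section.

```python
import subprocess
import sys
from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional


@dataclass
class Score:
    value: Optional[float]          # None when the candidate failed
    status: str = "ok"              # "ok", "error", or "timeout"
    feedback: str = ""


@dataclass
class ScoreBundle:
    scores: Dict[str, Score] = field(default_factory=dict)

    def primary(self) -> Optional[float]:
        # Hypothetical convention: the "main" entry is the score that drives search.
        main = self.scores.get("main")
        return main.value if main else None


class BaseGrader(ABC):
    """Protocol every grader implements: candidate in, ScoreBundle out."""

    @abstractmethod
    def grade(self, candidate: str) -> ScoreBundle: ...


class FunctionGrader(BaseGrader):
    """Scores a candidate with an in-process Python function."""

    def __init__(self, fn: Callable[[str], float]):
        self.fn = fn

    def grade(self, candidate: str) -> ScoreBundle:
        try:
            return ScoreBundle({"main": Score(float(self.fn(candidate)))})
        except Exception as exc:  # report failures as a status, not a crash
            return ScoreBundle({"main": Score(None, "error", repr(exc))})


class TaskGrader(BaseGrader):
    """Runs the candidate program in a subprocess with a timeout and treats the
    last line of stdout as its numeric score (an assumed convention)."""

    def __init__(self, timeout_s: float = 600.0):
        self.timeout_s = timeout_s

    def grade(self, candidate: str) -> ScoreBundle:
        try:
            proc = subprocess.run([sys.executable, "-c", candidate],
                                  capture_output=True, text=True,
                                  timeout=self.timeout_s)
        except subprocess.TimeoutExpired:
            return ScoreBundle({"main": Score(None, "timeout", "time limit exceeded")})
        if proc.returncode != 0:
            return ScoreBundle({"main": Score(None, "error", proc.stderr[-500:])})
        try:
            value = float(proc.stdout.strip().splitlines()[-1])
        except (IndexError, ValueError):
            return ScoreBundle({"main": Score(None, "error", "no numeric score printed")})
        return ScoreBundle({"main": Score(value)})
```

Keeping the grader behind one abstract `grade` method lets the agent loop stay unchanged whether a task is scored in-process or by running a separate program.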

Experiment

  • CORAL is evaluated on mathematical optimization, systems optimization, and challenging stress-test problems, demonstrating that autonomous multi-agent evolution significantly outperforms fixed evolutionary search baselines by achieving new state-of-the-art results on the majority of tasks.
  • The autonomous design allows agents to dynamically decide exploration strategies and pivot approaches based on feedback, resulting in much higher improvement rates and faster convergence compared to methods relying on predefined heuristics.
  • Multi-agent co-evolution extends the search frontier beyond single-agent capabilities, particularly on complex tasks where individual runs plateau early, by enabling diverse exploration trajectories and the organic diffusion of techniques through shared persistent memory.
  • Qualitative analysis reveals that local verification of code before external evaluation and the accumulation of reusable knowledge artifacts are critical drivers of performance, especially for advanced tasks requiring deep architectural insights.
  • Ablation studies confirm that the performance gains stem from the co-evolution mechanism and knowledge accumulation rather than simply increased compute resources, with benefits generalizing effectively to open-source model stacks.
