- The paper establishes a new framework that empowers agents to autonomously orchestrate open-ended discovery using persistent shared memory and asynchronous execution.
- The paper reports 3–10× higher improvement rates than fixed evolutionary search baselines, with convergence typically within 5–20 evaluations versus 60–100 for those baselines.
- The paper highlights how local verification and cross-agent information transfer lead to robust knowledge accumulation and enhanced solution quality.
CORAL: Autonomous Multi-Agent Evolution for Open-Ended Discovery
Motivation and Context
The CORAL framework addresses open-ended discovery tasks, in which the optimal solution is unknown and progress requires iterative search and sustained knowledge accumulation. Previous LLM-driven evolutionary search methods, such as FunSearch, AlphaEvolve, and EvoX, have shown that LLMs work well as program mutators inside evolutionary search loops. In these methods, however, critical search decisions remain externally fixed: parent selection, mutation, and knowledge reuse are dictated by rigid heuristics outside the agent's control. Multi-agent systems have similarly relied on human-engineered decompositions and fixed communication structures, which constrains exploration diversity. CORAL removes these restrictions by delegating search orchestration directly to autonomous agents, supported by shared persistent memory and asynchronous execution.
Figure 1: Comparison of three paradigms for LLM-based open-ended discovery.
Framework Architecture and Core Mechanisms
CORAL operationalizes autonomous multi-agent evolution via three principal mechanisms: persistent shared memory, asynchronous agent execution, and heartbeat-driven interventions. The shared memory is implemented as an extensible file system with structured directories (attempts, notes, and skills), so agents can both inspect past work and contribute new knowledge. Agents operate in isolated workspaces and reach the memory only through symbolic links and a domain-specific CLI. This design supports safety by isolating the evaluator from agents and preserving workspace integrity.
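The hub-and-symlink layout described above can be sketched in Python as follows. This is a minimal sketch that assumes a `hub/` root with `attempts`, `notes`, and `skills` subdirectories and one `workspaces/<agent_id>` directory per agent. The helper names `create_hub` and `create_agent_workspace` are hypothetical and are not CORAL's actual code or CLI.

```python
import os
import tempfile
from pathlib import Path

# Directory names follow the text; the overall layout is an assumption.
HUB_SUBDIRS = ("attempts", "notes", "skills")


def create_hub(root: Path) -> Path:
    """Create the shared-memory hub with one directory per artifact type."""
    hub = root / "hub"
    for name in HUB_SUBDIRS:
        (hub / name).mkdir(parents=True, exist_ok=True)
    return hub


def create_agent_workspace(root: Path, hub: Path, agent_id: str) -> Path:
    """Give an agent a private workspace that sees the hub only via symlinks (hypothetical helper)."""
    workspace = root / "workspaces" / agent_id
    workspace.mkdir(parents=True, exist_ok=True)
    for name in HUB_SUBDIRS:
        link = workspace / name
        if not link.is_symlink():
            # The link points into the shared hub; everything else in the
            # workspace stays private to this agent.
            os.symlink(hub / name, link, target_is_directory=True)
    return workspace


if __name__ == "__main__":
    root = Path(tempfile.mkdtemp())
    hub = create_hub(root)
    ws_a = create_agent_workspace(root, hub, "agent-0")
    ws_b = create_agent_workspace(root, hub, "agent-1")
    # A note written through one agent's link is visible through the other's.
    (ws_a / "notes" / "example.md").write_text("example note\n")
    print((ws_b / "notes" / "example.md").read_text())
```

Because both workspaces link to the same hub directories, knowledge written by one agent is immediately visible to all others. Each agent's scratch files remain local to its own workspace.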
Asynchronous agent organization eschews explicit peer-to-peer messaging: coordination is emergent, arising through concurrent agent interaction with the persistent memory. Heartbeat mechanisms, triggered by interval or plateau conditions, induce systematic self-reflection, memory consolidation, and strategy redirection at both local and global scope, mitigating stagnation and facilitating trajectory diversification.
Figure 2: Overview of the CORAL framework; agents propose, evaluate, and consolidate candidate solutions with periodic reflection via heartbeat triggers.
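A heartbeat that fires on either an interval or a plateau condition could look like the following. The class names and the `interval` and `patience` parameters are assumptions made for illustration, as is the rule that the plateau check takes precedence over the interval check. The paper's actual trigger logic may differ.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class HeartbeatConfig:
    # Hypothetical knobs: fire every `interval` evaluations, or after
    # `patience` consecutive evaluations without a new best score.
    interval: int = 10
    patience: int = 5


@dataclass
class HeartbeatMonitor:
    config: HeartbeatConfig
    scores: List[float] = field(default_factory=list)
    best: Optional[float] = None
    since_improvement: int = 0

    def record(self, score: float) -> Optional[str]:
        """Record one evaluation (higher is better); return the trigger kind, if any."""
        self.scores.append(score)
        if self.best is None or score > self.best:
            self.best = score
            self.since_improvement = 0
        else:
            self.since_improvement += 1

        if self.since_improvement >= self.config.patience:
            # Reset so the plateau trigger does not fire on every later step.
            self.since_improvement = 0
            return "plateau"
        if len(self.scores) % self.config.interval == 0:
            return "interval"
        return None
```

Each agent could keep its own local monitor. A global heartbeat could then reuse the same class, fed with the pooled scores of all agents.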
The framework's modular architecture partitions functionality into configuration parsing, agent lifecycle management, hierarchical grading, workspace setup, and hub-based memory organization. All evaluation artifacts are recorded as structured JSON or markdown, allowing post-hoc trajectory analysis at any granularity.
Figure 3: Architecture of CORAL, highlighting modular design and primary data flows between agent system, grader, workspace, hub, and core types.
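The sketch below illustrates the kind of post-hoc analysis that structured JSON records enable. It round-trips an attempt record and rebuilds a best-so-far trajectory. The `AttemptRecord` fields are invented for this example and do not reflect CORAL's real schema.

```python
import json
from dataclasses import asdict, dataclass
from typing import List, Optional


@dataclass
class AttemptRecord:
    # Hypothetical schema for illustration; CORAL's real field names may differ.
    attempt_id: str
    agent_id: str
    score: float          # higher is better in this sketch
    timestamp: float      # seconds since the run started
    parent_id: Optional[str] = None  # None when the attempt starts from scratch

    def to_json(self) -> str:
        return json.dumps(asdict(self), sort_keys=True)

    @staticmethod
    def from_json(text: str) -> "AttemptRecord":
        return AttemptRecord(**json.loads(text))


def best_so_far(records: List[AttemptRecord]) -> List[float]:
    """Rebuild the running-best score trajectory in timestamp order."""
    trajectory: List[float] = []
    best = float("-inf")
    for rec in sorted(records, key=lambda r: r.timestamp):
        best = max(best, rec.score)
        trajectory.append(best)
    return trajectory
```

Records like these can be grouped by `agent_id` or followed through `parent_id` to analyze the run at whatever granularity is needed.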
Experimental Evaluation and Results
Evaluation spans two benchmark suites (mathematical and systems optimization) and two advanced stress-test tasks (Anthropic's Kernel Engineering and Polyominoes Packing). CORAL establishes new state-of-the-art (SOTA) results on 10 tasks. Its improvement rates (the fraction of evaluations that yield a score improvement) are 3–10× higher than those of evolutionary search baselines. It also typically converges within 5–20 evaluations, versus 60–100 for fixed-search methods. On Kernel Engineering, four agents reduce the best-known result from 1363 to 1103 cycles, a cycle-count reduction of roughly 19%. On Polyominoes Packing, CORAL sets a new SOTA of 89.4% coverage.
Figure 4: Polyominoes packing—comparison between single-attempt baseline and CORAL; CORAL achieves 89.4% coverage, surpassing prior SOTA.
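A small sketch makes the improvement-rate metric and the Kernel Engineering arithmetic concrete. It assumes each evaluation is compared against the running best, starting from a given baseline score. How CORAL handles the first evaluation or ties is not specified here, so treat the hypothetical `improvement_rate` helper as an approximation of the paper's metric. For a cycle count, lower is better, and the relative reduction is (1363 - 1103) / 1363 ≈ 0.19.

```python
from typing import List


def improvement_rate(scores: List[float], baseline: float, higher_is_better: bool = True) -> float:
    """Fraction of evaluations that beat the running best (seeded with `baseline`)."""
    if not scores:
        return 0.0
    best = baseline
    improvements = 0
    for s in scores:
        better = s > best if higher_is_better else s < best
        if better:
            improvements += 1
            best = s
    return improvements / len(scores)


def relative_reduction(before: float, after: float) -> float:
    """Relative decrease, e.g. for a cycle count where lower is better."""
    return (before - after) / before


# Kernel Engineering figures quoted above: 1363 -> 1103 cycles.
print(f"{relative_reduction(1363, 1103):.1%}")  # prints 19.1%
```

For a lower-is-better task such as kernel cycles, pass `higher_is_better=False` so that a drop in cycle count counts as an improvement.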
Multi-agent co-evolution pushes the search frontier beyond what a single strong autonomous agent reaches. More diverse exploration and emergent knowledge exchange yield higher final scores on advanced tasks, even when compute budget is controlled. These gains persist with both a proprietary model (Claude Opus 4.6) and an open-source setup (MiniMax M2.5 with OpenCode), which suggests they stem from the framework rather than from a particular model.
Mechanistic Analyses and Ablations
Trajectory analyses attribute CORAL's improvement to two key behaviors: local verification and persistent knowledge accumulation. The frequency of local test execution correlates positively with improvement probability, especially in tasks that admit cheap local validation. Creating knowledge artifacts (notes, skills) is causally linked to score increases. Advanced tasks benefit most from frequent, high-quality artifacts that document failed approaches and reusable insights.
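Local verification amounts to running a cheap check before spending an official evaluation. The sketch below assumes the agent has such a check available, for example unit tests or a fast validity test. The names `evaluate_with_local_check`, `local_check`, and `official_eval` are hypothetical and are not part of CORAL.

```python
from typing import Callable, List, Tuple


def evaluate_with_local_check(
    candidates: List[str],
    local_check: Callable[[str], bool],
    official_eval: Callable[[str], float],
) -> Tuple[List[float], int]:
    """Submit only candidates that pass a cheap local check.

    Returns the official scores obtained and the number of official
    evaluations skipped because the local check failed.
    """
    scores: List[float] = []
    skipped = 0
    for cand in candidates:
        if not local_check(cand):
            skipped += 1  # caught locally; no official evaluation spent
            continue
        scores.append(official_eval(cand))
    return scores, skipped
```

The payoff depends on the check being much cheaper than the official evaluator. This is consistent with the observation that the effect is strongest in tasks that admit cheap local validation.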
Multi-agent runs exhibit substantial cross-agent information transfer. Attempts that build on another agent's commit have nearly double the improvement rate of independent attempts, and new best scores are often reached collaboratively. Exploration diversity is quantified by the overlap of agents' strategy vocabularies. The resulting Jaccard similarity scores indicate that agents collectively probe a broader solution space than any single agent.
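Strategy-vocabulary overlap can be measured with pairwise Jaccard similarity, J(A, B) = |A ∩ B| / |A ∪ B|. In the sketch below, each agent's vocabulary is assumed to be a set of strategy labels extracted from its attempts (the extraction step is not shown), and the labels themselves are placeholders. `coverage_ratio` is a hypothetical companion metric that compares the collective vocabulary with the largest single-agent vocabulary.

```python
from itertools import combinations
from typing import Dict, Set


def jaccard(a: Set[str], b: Set[str]) -> float:
    """|A ∩ B| / |A ∪ B|; defined as 1.0 when both sets are empty."""
    union = a | b
    if not union:
        return 1.0
    return len(a & b) / len(union)


def mean_pairwise_jaccard(vocab: Dict[str, Set[str]]) -> float:
    """Average similarity over all agent pairs; lower means more diverse exploration."""
    pairs = list(combinations(vocab.values(), 2))
    if not pairs:
        return 1.0
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)


def coverage_ratio(vocab: Dict[str, Set[str]]) -> float:
    """Size of the collective vocabulary relative to the largest single agent's."""
    largest = max((len(v) for v in vocab.values()), default=0)
    if largest == 0:
        return 0.0
    return len(set().union(*vocab.values())) / largest
```

Consider three agents with vocabularies {greedy, sa}, {sa, ilp}, and {ilp, beam}. Their mean pairwise similarity is 2/9, and their combined vocabulary is twice the size of any single agent's.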
Ablation studies confirm the causal impact of both knowledge accumulation and co-evolution: disabling notes/skills substantially degrades performance, and gains from multi-agent coordination are not reducible to parallel independent trials.
System Practicalities and Safeguards
CORAL incorporates robust infrastructure safeguards: evaluator logic is isolated from agents to prevent reward hacking; workspace organization avoids file-level concurrency conflicts via unique naming and symlinks; session persistence and dead agent recovery ensure long-horizon robustness. Heartbeat prompts, delivered via SIGINT-based session interruption, modulate agent behavior without discarding context, and agents may customize their heartbeat trigger configuration at runtime.
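Conflict-free writes to a shared attempts directory can be achieved by combining unique file names with an atomic rename. This is a minimal sketch that assumes each attempt is stored as its own JSON file. `publish_attempt` and the `<agent_id>-<uuid>.json` naming pattern are hypothetical, not CORAL's documented convention.

```python
import json
import os
import tempfile
import uuid
from pathlib import Path


def publish_attempt(attempts_dir: Path, agent_id: str, record: dict) -> Path:
    """Write one attempt file under a collision-free name, atomically."""
    attempts_dir.mkdir(parents=True, exist_ok=True)
    # Agent id plus a random UUID: two agents never target the same file name.
    final_path = attempts_dir / f"{agent_id}-{uuid.uuid4().hex}.json"
    # Write to a temp file in the same directory, then rename it into place,
    # so readers never observe a half-written attempt.
    fd, tmp_name = tempfile.mkstemp(dir=attempts_dir, suffix=".tmp")
    with os.fdopen(fd, "w") as fh:
        json.dump(record, fh)
    os.replace(tmp_name, final_path)
    return final_path
```

Readers that only glob `*.json` never see the `.tmp` staging files. On POSIX, `os.replace` is atomic when source and destination are on the same file system, which is why the temp file is created in the target directory.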
The user interface provides real-time visibility into search trajectories, agent-level activity logs, shared memory browsing, and score evolution, supporting longitudinal experiment tracking and interactive knowledge inspection.

Figure 5: CORAL user interface; optimization trajectory, agent-level summaries, and recent activity traced in real time.
Implications and Future Directions
CORAL's results support the thesis that greater autonomy and horizontal parallelism among LLM agents can accelerate open-ended discovery, motivating a shift away from externally specified evolutionary protocols. The practical implications are immediate: agent-driven search via persistent memory and structured reflection accelerates discovery in mathematical, algorithmic, and systems optimization domains, suggesting future utility for scientific research and engineering.
Theoretically, the paradigm raises new questions about self-organizing agent populations, knowledge diffusion trajectories, and adaptive role specialization. Remaining limitations include reliance on frontier models with high API cost, homogeneous agent initialization, and need for richer evaluators. Future work may target smaller, customized models, bootstrapped agent heterogeneity, and co-evolving evaluators.
Conclusion
CORAL establishes a rigorous framework for autonomous multi-agent evolution, substantially outperforming fixed evolutionary search baselines in open-ended discovery tasks. Knowledge accumulation and emergence of collaboration via shared memory are essential to search efficiency and solution quality. The implications extend to both practical deployments and theoretical investigations in distributed evolutionary optimization; CORAL serves as a baseline infrastructure and systematic study for future autonomous discovery systems (2604.01658).