Evolutionary Ensemble of Agents

Published 9 May 2026 in cs.NE, cs.AI, and cs.LG | (2605.09018v1)

Abstract: We introduce Evolutionary Ensemble (EvE), a decentralized framework that organizes existing, highly capable coding agents into a live, co-evolving system for algorithmic discovery. Rather than reinventing the wheel within the "LLMs as optimizers" paradigm, EvE fixes the base agent substrate and focuses entirely on evolving the cumulative guidance and skills that dictate agent behaviors. By maintaining two co-evolving populations, namely functional code solvers and agent guidance states, the system evaluates agents through a synchronous race, updating their empirical Elo ratings based on the marginal gains they contribute to the current solver state. When applied to a research bottleneck in In-Context Operator Networks (ICON), EvE autonomously discovered a robust rescale-then-interpolate mechanism that enables reliable example-count generalization. Crucially, controlled ablations reveal the absolute necessity of stage-dependent agent adaptation to navigate the shifting search landscapes of complex codebases. Compared to variants driven by a fixed initial agent or even a frozen "best-evolved" agent, EvE uniquely avoids phase mismatch, demonstrating that organizing agents into a self-revising ensemble is the fundamental driver for breaking through static performance ceilings.


Authors (2)

Summary

The paper demonstrates that evolving agent guidance—not agent code—effectively overcomes the positional encoding bottleneck in ICON.
It employs a dual-population evolutionary framework with solvers and agents, validated by outperforming static approaches on out-of-distribution benchmarks.
Ablations indicate that stage-dependent agent adaptation is essential for continued improvement in algorithmic discovery.

Evolutionary Ensemble of Agents: A Decentralized Framework for Adaptive Algorithmic Discovery

Overview

"Evolutionary Ensemble of Agents" (2605.09018) introduces Evolutionary Ensemble (EvE), a decentralized meta-optimization framework that organizes highly capable coding agents into a co-evolving system for algorithmic discovery. Its distinguishing feature is a decoupled, dual-population architecture: rather than altering the agents themselves, the system fixes the base agent substrate and directs evolutionary search at the guidance and skills that shape agent behavior. The empirical study targets the positional encoding (PE) bottleneck in In-Context Operator Networks (ICON), where a model must generalize out of distribution to more in-context examples at test time than it saw during training, a demanding test case for code-evolution and automated machine learning systems.

Methodology

Dual-Population Evolutionary Framework

EvE operates with two co-evolving, scored populations:

Solvers (functional code artifacts within a shared base repository), each with cumulative evaluation logs.
Agents (coding agents augmented with cumulative working logs and guidance/skill trees).

Iterations proceed as a synchronous "race" among sampled agents, each given identical read-only context (reference solvers, agents, and base repository snapshot). Each agent generates a new solver variant and, potentially, a self-modified copy of itself (with updated guidance or skills). Subsequently, a pairwise Elo-style competition mechanism updates agent scores in accordance with the relative improvement of the solver variants, directly tying agent fitness to marginal utility in a controlled stochastic environment.
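The rating step of one race can be sketched as follows. This is a minimal illustration and not the paper's implementation: it assumes each agent's outcome is summarized by a single marginal gain over the shared reference solver, that every pair of racing agents is compared by that gain, and that the conventional Elo constants (a scale of 400 and a K-factor of 32) apply. The names `expected_score`, `race_update`, `ratings`, and `gains` are hypothetical.

```python
import itertools


def expected_score(r_a: float, r_b: float) -> float:
    """Standard Elo expected score of A when playing B."""
    return 1.0 / (1.0 + 10.0 ** ((r_b - r_a) / 400.0))


def race_update(ratings: dict, gains: dict, k_factor: float = 32.0) -> dict:
    """Return new ratings after one synchronous race.

    ratings: agent id -> current Elo rating
    gains:   agent id -> marginal gain of that agent's solver variant over
             the shared reference solver (higher is better; assumed metric)
    """
    deltas = {agent: 0.0 for agent in gains}
    # Every pair of racing agents plays one virtual "match".
    for a, b in itertools.combinations(gains, 2):
        if gains[a] > gains[b]:
            s_a = 1.0
        elif gains[a] < gains[b]:
            s_a = 0.0
        else:
            s_a = 0.5  # tie
        e_a = expected_score(ratings[a], ratings[b])
        # Accumulate deltas first so all matches use the pre-race ratings.
        deltas[a] += k_factor * (s_a - e_a)
        deltas[b] += k_factor * ((1.0 - s_a) - (1.0 - e_a))
    new_ratings = dict(ratings)
    for agent, delta in deltas.items():
        new_ratings[agent] = ratings[agent] + delta
    return new_ratings


if __name__ == "__main__":
    ratings = {"agent_a": 1500.0, "agent_b": 1500.0, "agent_c": 1500.0}
    gains = {"agent_a": 0.03, "agent_b": 0.01, "agent_c": -0.02}
    print(race_update(ratings, gains))
```

Because all agents in a race see the same read-only context, differences in gain can be attributed to the agents' guidance rather than to differences in starting point, which is what makes a pairwise comparison meaningful in this sketch.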

This competition-driven credit assignment isolates how effective each agent's strategy is as the search landscape shifts. Agent “evolution” occurs by mutating and recombining guidance/state, not agent code, which stands in contrast to frameworks such as Darwin Gödel Machine and SICA. The benefits are twofold:

Orchestration and knowledge sharing become inherently scalable, as agents can freely read and adapt from peer-generated logs and artifacts.
Recursive nesting: ensembles can themselves be treated as atomic individuals within higher-order ensembles, supporting hierarchical composition.

Agent Workspaces and Evolution Dynamics

Each working agent operates within an isolated workspace, may modify only permitted files (e.g., key model/configuration files), and has its output validated through smoke tests and scoring routines. After validation and evaluation, the session artifacts are added to the solver and agent populations.
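A gate of this kind might look like the sketch below. It is an assumption-laden illustration, not the paper's tooling: the function name `accept_session`, the file names, and the idea of passing the smoke test as a callable are all hypothetical.

```python
from typing import Callable, Iterable, Set, Tuple


def accept_session(
    changed_files: Iterable[str],
    permitted_files: Set[str],
    smoke_test: Callable[[], bool],
) -> Tuple[bool, str]:
    """Decide whether a session's artifacts may enter the populations.

    changed_files:   paths the agent modified in its isolated workspace
    permitted_files: allowlist of editable paths (hypothetical names below)
    smoke_test:      cheap check that the solver variant still runs
    """
    illegal = sorted(set(changed_files) - permitted_files)
    if illegal:
        return False, "touched non-permitted files: " + ", ".join(illegal)
    if not smoke_test():
        return False, "smoke test failed"
    return True, "ok"


if __name__ == "__main__":
    permitted = {"model.py", "config.yaml"}  # hypothetical allowlist
    print(accept_session(["model.py"], permitted, lambda: True))
    print(accept_session(["model.py", "train.py"], permitted, lambda: True))
```

Checking the allowlist before running the smoke test keeps rejected sessions cheap; only sessions that pass both checks would go on to full scoring.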

A core hypothesis, validated empirically, is that static agents—whether from initialization or by snapshotting “best-so-far” guidance—incur phase mismatches as the optimization landscape transitions between early exploration and late-stage refinement. Only continual, stage-dependent agent adaptation enables robust traversal between performance plateaus, circumventing local minima and search pathology.

Empirical Evaluation: ICON Positional Encoding Discovery

Problem Definition

The central challenge is example-count generalization in ICON: models must generalize in-context reasoning from sequences with $k=5$ examples at train time to $k \gg 5$ at test time. The vanilla ICON design, relying on fixed learned embedding tables for position encoding, exhibits catastrophic OOD collapse when test sequence lengths exceed training bounds, due to failure to represent unseen positions.

Experimental Design

EvE is compared against two ablations: “Static-Initial” (a fixed seed agent) and “Static-Final” (the best agent from a previously evolved EvE run, frozen). Together the three settings span no agent evolution, snapshot-then-freeze, and continuous evolutionary adaptation. Performance is evaluated on a 1D conservation law benchmark by averaging error across $k=1$ to $k=10$ in-context examples, which measures robustness in both the in-distribution ($k \leq 5$) and OOD ($k > 5$) regimes.
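The aggregate metric can be illustrated with a short sketch. It assumes a per-$k$ error is already available for each example count and that the headline number is an unweighted mean; the function name `summarize_errors` and the error values in the demo are placeholders, not results from the paper.

```python
def summarize_errors(errors_by_k: dict, k_train: int = 5) -> dict:
    """Average per-k errors overall and per regime.

    errors_by_k: example count k -> error at that k
    k_train:     largest example count seen in training (5 in this setup)
    """
    def mean(values):
        values = list(values)
        return sum(values) / len(values)

    in_dist = [e for k, e in errors_by_k.items() if k <= k_train]
    ood = [e for k, e in errors_by_k.items() if k > k_train]
    return {
        "mean": mean(errors_by_k.values()),
        "in_distribution": mean(in_dist),
        "ood": mean(ood),
    }


if __name__ == "__main__":
    # Placeholder errors for k = 1..10, chosen only to show the split.
    demo = {k: (0.1 if k <= 5 else 0.3) for k in range(1, 11)}
    print(summarize_errors(demo))
```

Reporting the two regimes separately alongside the overall mean makes it visible when a low average hides a collapse at $k > 5$.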

Results

EvE consistently delivers the lowest mean error across all $k$ (e.g., $e=0.114$ at 2k training steps; $e=0.041$ at 10k).
The Static-Initial and Static-Final settings both deteriorate in the OOD regime, plateauing at higher errors or exhibiting transfer degradation during retraining.
Crucially, ablation demonstrates that freezing agent evolution undermines the ability to adapt search strategies to changing solver-phase requirements, confirming agent adaptation as indispensable for overcoming performance ceilings.

All successful PE methods discovered by EvE exploit structural decompositions, e.g., factored slot/role indices and learned/parametric compression of overflow positions—directly addressing training-data limitations in vanilla architectures. Notably, EvE autonomously converges on robust rescale-then-interpolate PE mechanisms, with late-phase agent guidance dynamically retracting non-performing strategies and extending promising ones, as visible in session logs and code artifacts.
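To make the rescale-then-interpolate idea concrete, here is a generic sketch of position interpolation over a learned embedding table. It is not the code EvE discovered: the rule of rescaling only when the sequence is longer than the table, the use of linear interpolation, and the name `interpolated_position` are assumptions made for illustration.

```python
def interpolated_position(table, index, length):
    """Embedding for position `index` in a sequence of `length` positions.

    table: list of embedding vectors learned for positions 0..n-1
    When length exceeds n, indices are first rescaled from [0, length-1]
    onto [0, n-1], then the two nearest learned rows are blended linearly.
    """
    n = len(table)
    if not 0 <= index < length:
        raise ValueError("index must lie in [0, length)")
    if length <= n:
        pos = float(index)  # inside the trained range: plain lookup
    else:
        pos = index * (n - 1) / (length - 1)  # rescale step
    lo = int(pos)            # floor, since pos is non-negative
    hi = min(lo + 1, n - 1)  # clamp at the last learned row
    w = pos - lo             # interpolation weight toward `hi`
    return [(1.0 - w) * a + w * b for a, b in zip(table[lo], table[hi])]


if __name__ == "__main__":
    # Toy one-dimensional "embeddings" for five trained positions.
    table = [[0.0], [1.0], [2.0], [3.0], [4.0]]
    print(interpolated_position(table, 3, 5))  # in-range lookup
    print(interpolated_position(table, 8, 9))  # last of nine positions
    print(interpolated_position(table, 1, 9))  # falls between rows 0 and 1
```

A plain table lookup has no row for any position beyond the trained range, whereas the rescaled position always lands between two learned rows, so no unseen embedding is ever requested.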

Theoretical and Practical Implications

EvE fundamentally reframes code-evolution by structuring the evolutionary substrate around agent guidance/states rather than agent code or prompt templates. This suggests several implications:

Meta-adaptive agent design: Instead of statically optimizing prompts or architectures, EvE provides a runtime infrastructure where search dynamics adapt to solver-phase shifts, making it more robust to problem nonstationarity—a foundational requirement for open-ended discovery and scientific automation.
Universal compatibility and recursive nesting: The decentralized, role-free design allows encapsulation of arbitrary agents or ensembles, facilitating hierarchical systems or multi-population coevolution strategies.
Credit assignment precision: Synchronous benchmarking with controlled context enables precise attribution of marginal innovations, driving ensemble diversity without sacrificing convergence.
Future directions: The paper identifies optimization of inter-agent connection topology as the next frontier, analogous to emergent order in physical systems. The long-term objective is to balance diversity (avoiding collapse to uniformity) and coherence (avoiding unstructured stochasticity), potentially yielding large-scale, self-organizing scientific reasoning systems.

Conclusion

"Evolutionary Ensemble of Agents" substantiates that decentralized, continuously adapting ensembles of coding agents, evolving via their cumulative guidance and skill state—not agent code itself—can autonomously break through the bottlenecks facing static or single-agent search paradigms in complex algorithmic domains. Empirical results on ICON positional encoding generalization strongly support stage-dependent agent adaptation as a necessary property for robust algorithmic discovery. The architectural principles and empirical methodologies detailed in this work provide a reference for the design of future scalable, adaptive, and recursive agentic systems in scientific and AI research.