
Co-Evolving Agents Framework

Updated 4 December 2025
  • Co-Evolving Agents Framework is a methodology where decentralized agents jointly update models and policies through shared parameter exchange, balancing global cooperation with local specialization.
  • It employs dual-adapter architectures and dynamic communication protocols to reduce training overhead while ensuring scalability across heterogeneous environments.
  • Empirical benchmarks reveal improved accuracy and efficient convergence, as demonstrated by substantial gains in multi-task and cross-domain applications.

A co-evolving agents framework encompasses a broad class of methodologies in which multiple adaptive agents or subsystems jointly update their internal state, policy, model parameters, or interaction structure, influencing one another through explicit protocols or implicit environmental coupling. This paradigm is foundational in contemporary multi-agent systems (MAS) research, where decentralized or distributed learning, personalized adaptation, communication efficiency, and scalability are essential. The co-evolutionary process can occur at various levels: parameter sharing vs. local adaptation, model–environment duals, experience or curriculum exchange, and reward shaping via intrinsic or interaction-driven signals. The frameworks reviewed below include recent advances in parameter-efficient MAS, co-evolving LLM world models, experience-driven LLM agent pairs, proactive self-evolving assistants, population-level evolutionary MAS, failure-driven co-training, and more.

1. Core Algorithmic Concepts and Architectures

Co-evolving agent frameworks are typically formalized as decentralized (or weakly federated) multi-agent optimization procedures, wherein each agent maintains private model parameters and updates, but a subset of parameters or outputs is periodically exchanged with other agents. A prototypical setup is exemplified by the PE-MA framework (Deng et al., 13 Jun 2025):

  • Let $\mathcal{A} = \{a_1, \dots, a_N\}$ be a network of $N$ agents, each with its own, possibly heterogeneous, data distribution $D_i \sim P_i$.
  • Each agent maintains a frozen backbone network $\theta_{\mathrm{freeze}}$ (e.g., a pre-trained Transformer or ResNet).
  • Each agent carries two adapters: a shared adapter $W_i$ for global knowledge sharing (periodically averaged with neighbors), and a personalized adapter $V_i$ for agent-level adaptation, updated only locally.
  • The co-evolution step alternates between $K$ rounds of local training on both adapters, minimizing the global objective $L(\{W_i\}, \{V_i\}) = \frac{1}{N} \sum_{i=1}^{N} L_i(W_i, V_i)$, and a communication step aggregating the shared adapters via a doubly stochastic protocol over the graph $G = (\mathcal{A}, E)$:

$$W_i^{(t+1)} = \sum_{j \in \mathcal{N}(i) \cup \{i\}} P_{ij}\, W_j^{(t+1/2)}$$

The personalized adapter $V_i$ is never exchanged.

This class of dual-adapter architectures achieves tight coupling between global pattern sharing and local specialization, supporting fine control via a mixing weight $p \in [0,1]$.
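
To make the alternating update concrete, here is a minimal NumPy sketch of the scheme described above, assuming linear additive adapters, quadratic local losses, and a fixed doubly stochastic ring-graph mixing matrix; all names and the additive adapter combination are illustrative assumptions, not PE-MA's actual implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
N, d, k = 4, 16, 8        # agents, feature dim, adapter width
lr, K = 0.05, 10          # learning rate, local steps per round

# Doubly stochastic mixing matrix P over a ring graph (self + two neighbors).
P = np.zeros((N, N))
for i in range(N):
    P[i, i] = 0.5
    P[i, (i - 1) % N] = 0.25
    P[i, (i + 1) % N] = 0.25

# Shared adapters W_i (gossiped) and personalized adapters V_i (kept local).
W = 0.1 * rng.normal(size=(N, d, k))
V = 0.1 * rng.normal(size=(N, d, k))

# Heterogeneous per-agent least-squares data as a stand-in for local tasks.
X = [rng.normal(size=(32, d)) for _ in range(N)]
Y = [x @ rng.normal(size=(d, k)) for x in X]

for rnd in range(20):
    # K rounds of local training on both adapters.
    for _ in range(K):
        for i in range(N):
            pred = X[i] @ (W[i] + V[i])           # adapters act additively here
            grad = X[i].T @ (pred - Y[i]) / len(X[i])
            W[i] -= lr * grad                     # both adapters receive the
            V[i] -= lr * grad                     # same gradient in this sketch
    # Communication step: gossip-average only the shared adapters,
    # W_i <- sum_j P_ij W_j; the personalized V_i never leave the agent.
    W = np.einsum('ij,jdk->idk', P, W)
```

Because only the small $W_i$ matrices traverse the network, the per-round communication cost scales with the adapter size rather than the backbone size.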

Extending these principles, co-evolving frameworks also encompass interaction-driven reward schemes, co-trained world models, experience exchange, and population-level evolutionary mechanisms, as developed in the sections below.

2. Co-Evolutionary Optimization and Theoretical Analysis

Co-evolving agent systems typically require rigorous analysis of optimization dynamics, convergence, and stability in heterogeneous decentralized settings:

  • PE-MA (Deng et al., 13 Jun 2025) provides an explicit convergence theorem for its decentralized, dual-adapter model. Under smoothness ($L$-Lipschitz gradients), unbiased gradients, bounded variance, and a spectral-gap condition on the communication matrix, the consensus error and per-agent gradient norms satisfy

$$\frac{1}{K} \sum_{t=0}^{K-1} \mathbb{E}[M(t)] = O\!\left(\frac{1}{\sqrt{NK}}\right)$$

where $N$ is the number of agents and $K$ the number of local update steps.

  • Such rates match known minimax bounds for decentralized stochastic optimization and demonstrate that parameter-efficient, partially shared architectures need not sacrifice convergence speed for communication reduction.
  • Interaction-based RL systems such as CoMAS (Xue et al., 9 Oct 2025) use decentralized, heterogeneous policy-gradient updates with exclusively intrinsic, interaction-formulated rewards. Each agent $i$ maintains a local policy $\pi_i(a \mid s; \theta_i)$, collects private replay buffers, and maximizes a clipped, KL-regularized REINFORCE++ objective (a generic sketch of such an update follows this list). The framework demonstrates empirical monotonic gains as agent diversity and count increase, and ablations verify the necessity of coordinated, interaction-derived signals.
  • In frameworks connecting language evolution to agent architecture, such as the LTE for referential games (Dagan et al., 2020), evolutionary pressure is exerted both via population-level culling based on learning speed (“fitness”) and through random mutation of RNN cell architectures; the resulting language protocols thus co-adapt to both data and architecture biases.
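
The exact REINFORCE++ objective used by CoMAS is not reproduced here, but the following NumPy sketch shows the generic shape of a clipped, KL-regularized policy-gradient step on a softmax policy; the intrinsic, interaction-derived rewards are stubbed in as plain numbers, which is an assumption of this sketch.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def pg_step(theta, theta_old, actions, rewards, lr=0.1, eps=0.2, beta=0.01):
    """One clipped, KL-regularized policy-gradient step (generic sketch)."""
    pi, pi_old = softmax(theta), softmax(theta_old)
    grad = np.zeros_like(theta)
    for a, r in zip(actions, rewards):
        ratio = pi[a] / pi_old[a]
        # PPO-style gate: skip samples whose ratio already left the trust
        # region in the direction the reward would push it further.
        if (r >= 0 and ratio > 1 + eps) or (r < 0 and ratio < 1 - eps):
            continue
        g = -pi.copy()
        g[a] += 1.0                      # gradient of log pi[a] w.r.t. logits
        grad += r * g
    grad /= max(len(actions), 1)
    # Analytic gradient of KL(pi || pi_old) with respect to the logits.
    log_ratio = np.log(pi / pi_old)
    grad -= beta * pi * (log_ratio - pi @ log_ratio)
    return theta + lr * grad             # gradient ascent on reward - beta*KL

# Toy usage: rewards stand in for intrinsic scores derived from agent
# interactions (e.g., peer evaluations of a response) -- an assumption here.
theta = np.zeros(4)
theta = pg_step(theta, theta.copy(), actions=[2, 2, 0], rewards=[1.0, 1.0, -0.5])
```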

3. Empirical Benchmarks and Performance Analysis

State-of-the-art co-evolving agent frameworks are typically validated on cross-domain, multi-task, and real-world benchmarks, with systematic ablations:

  • PE-MA (Deng et al., 13 Jun 2025):
    • Tested on Office-Home, Office-Caltech10, DomainNet (image classification, heterogeneous agents).
    • Achieves up to 2–5% absolute accuracy gain over DSGD/FedSim baselines, with 70–87% reduction in communication and training parameter counts.
    • Robust to sparse connectivity (ring/ER graphs), supports adaptive per-agent mixing weight, and tolerates substantial agent dropout with minor accuracy degradation.
  • WebEvolver (Fang et al., 23 Apr 2025):
    • Benchmarked on Mind2Web-Live, WebVoyager, GAIA-web.
    • Co-evolving LLM world models yield up to 10% absolute success rate improvement over self-improving agent-only baselines, with world model hallucinations driving exploratory breadth.
  • Experiential Co-Learning (Qian et al., 2023):
    • On software engineering tasks (NLDD), co-evolving instructor-assistant pairs leveraging mined experience “shortcuts” achieve higher autonomy, executability, and few-shot transfer acceleration (80% fewer turns, 3× first-pass compilability).
  • CoMAS (Xue et al., 9 Oct 2025):
    • Across mathematics (GSM8K, MATH-500), coding (HumanEval, MBPP), and general knowledge (MMLU) domains, achieves up to 70.4% consensus accuracy (Debate setting), outperforming MAPoRL/TTRL.
    • Empirically, reward-driven, multi-agent co-evolution enhances exploration, discouraging reward hacking and providing stable improvement as agent count and heterogeneity increase.
  • EvoAgentX (Wang et al., 4 Jul 2025):
    • Reports gains attributed to iterative, multi-modal optimization of agent prompts, tool configurations, and workflow graphs; a schematic optimization loop follows this list.
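
The EvoAgentX-style gains above come from searching over agent configurations rather than model weights. The following is a deliberately schematic hill-climbing loop over prompt/tool/workflow fields; `mutate` and `evaluate` are hypothetical stand-ins (EvoAgentX's actual optimizers and APIs are not reproduced), and the scoring function must be replaced with a real benchmark metric.

```python
import random

def mutate(cfg, rng):
    """Perturb one configuration field; purely illustrative."""
    new = dict(cfg)
    field = rng.choice(["prompt", "tools", "workflow"])
    new[field] = f"{cfg[field]}+v{rng.randint(0, 99)}"
    return new

def evaluate(cfg):
    """Placeholder score; substitute a benchmark success rate in practice."""
    return hash(tuple(sorted(cfg.items()))) % 1000

def optimize(cfg, steps=50, seed=0):
    rng = random.Random(seed)
    best, best_score = cfg, evaluate(cfg)
    for _ in range(steps):
        cand = mutate(best, rng)
        score = evaluate(cand)
        if score > best_score:           # greedy accept: keep improvements only
            best, best_score = cand, score
    return best

config = {"prompt": "base-prompt", "tools": "search", "workflow": "plan->act"}
best = optimize(config)
```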

4. Granular Personalization, Communication Efficiency, and Decentralization

Personalization and communication overhead are critical design axes:

  • Parameter-Efficient Personalization: PE-MA’s decoupled dual-adapter architecture (shared $W_i$ vs. private $V_i$) directly enables agent-level specialization under strongly heterogeneous or data-scarce regimes. Empirically, adaptive per-agent selection of the mixing weight $p$ yields 30–50% faster convergence for low-data agents (Deng et al., 13 Jun 2025); one plausible combination rule is sketched after this list.
  • Epochal Communication: By restricting parameter exchange to small adapters and aggregating only over the communication graph, frameworks drastically reduce bandwidth cost versus classical DSGD schemes—often by an order of magnitude.
  • Local Adaptation vs. Global Coordination: Several systems (e.g., Lim et al. (Lim et al., 2022)) show—both theoretically and empirically—that “blind mimicry” or overly global synchronization is suboptimal. Enabling structured, small-group cooperation with local memory or experience banks balances exploration and exploitation, especially under rugged or malleable “fitness landscapes.”
  • True Decentralization: Intrinsic-reward learning, as in CoMAS, reinforces decentralized policy updates without the need for an external reward provider, ensuring scalability and robustness to environmental and agent-level perturbations (Xue et al., 9 Oct 2025).
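
As a concrete reading of the mixing weight discussed in the first item above, the sketch below combines frozen backbone features with shared and personalized adapter outputs as a convex combination; this particular rule is an assumption for illustration, not necessarily PE-MA's exact formulation.

```python
import numpy as np

def adapter_forward(h, W_shared, V_personal, p):
    """Mix shared and personalized adapter outputs with weight p in [0, 1].

    h: features from the frozen backbone. p -> 1 emphasizes globally shared
    knowledge; p -> 0 emphasizes local specialization. The convex combination
    is an illustrative assumption.
    """
    return h + p * (h @ W_shared) + (1.0 - p) * (h @ V_personal)

# A data-scarce agent may favor the shared adapter (large p), while a
# data-rich agent can afford more personalization (small p).
rng = np.random.default_rng(1)
h = rng.normal(size=(2, 16))
W = 0.1 * rng.normal(size=(16, 16))
V = 0.1 * rng.normal(size=(16, 16))
out = adapter_forward(h, W, V, p=0.7)
```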

5. Extensions: Co-Evolutionary Curriculum, Memory, and Emergent Capabilities

Modern co-evolving agent systems increasingly embed long-term memory, curriculum evolution, and meta-cognitive reflection:

  • Curriculum Co-Evolution: In Agent0, two agents (a curriculum generator $\pi_\theta$ and an executor $\pi_\phi$) enter a symbiotic escalation: the curriculum agent crafts tasks at the current capability frontier (maximizing tool use and executor uncertainty), while the executor receives dynamic curricula filtered for ambiguity (Xia et al., 20 Nov 2025); a schematic loop is sketched after this list. This mechanism yields substantial boosts in both mathematical and general reasoning (+18% and +24% on Qwen3-8B-Base), with iterative, ambiguity-aware policy optimization (ADPO) further enhancing stability.
  • Meta-Level Self-Evolution and Cognition: The Galaxy framework introduces “Cognition Forests,” data structures unifying semantic, functional, and implementation metadata, which are modified by a meta-agent (Kernel) in response to failures, user-driven design, and privacy concerns (Bao et al., 6 Aug 2025). This supports both proactive capability generation and privacy-aware behavior.
  • Experience and Memory Pools: Agent pairs maintaining explicit memory banks of procedural “shortcuts” (as in Experiential Co-Learning) or explicit per-group “elite banks” (as in Lim et al.) enhance both sample efficiency and transfer/generalization, with an emergent reduction in graph complexity and policy entropy (Qian et al., 2023, Lim et al., 2022).
  • Language–Bias Co-Evolution: LTE (Dagan et al., 2020) exposes that not only does the communicative protocol adapt, but the learning biases (e.g., neural architecture/genotype) of agents co-adapt, leading to more compositional, consistent, and learnable emergent languages.
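
The curriculum co-evolution item above can be summarized in a toy loop: the generator proposes tasks, the executor answers each several times, and only tasks whose answer disagreement falls in a target band are kept as frontier curriculum. Here `propose_task`, `attempt`, and the uncertainty band are hypothetical stand-ins, not Agent0's actual interfaces.

```python
import random

def executor_uncertainty(answers):
    """Disagreement across sampled answers, used as an uncertainty proxy."""
    return 1.0 - max(answers.count(a) for a in set(answers)) / len(answers)

def co_evolve(propose_task, attempt, rounds=100, n_samples=8,
              lo=0.2, hi=0.8, seed=0):
    rng = random.Random(seed)
    curriculum = []
    for _ in range(rounds):
        task = propose_task(rng)
        answers = [attempt(task, rng) for _ in range(n_samples)]
        u = executor_uncertainty(answers)
        # Keep tasks at the capability frontier: neither trivial (u near 0)
        # nor hopelessly ambiguous (u near 1); these reward the generator.
        if lo <= u <= hi:
            curriculum.append((task, answers))
        # An executor policy update on `curriculum` would go here.
    return curriculum

# Toy stand-ins: integer-sum tasks attempted by a noisy executor.
tasks = co_evolve(
    propose_task=lambda rng: (rng.randint(1, 9), rng.randint(1, 9)),
    attempt=lambda t, rng: t[0] + t[1] + rng.choice([0, 0, 0, 1]),
)
```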

6. Broader Impact, Limitations, and Future Directions

Co-evolving agent frameworks underpin significant progress in robust, autonomous, and scalable MAS. Their benefits include:

  • Parameter- and communication-efficient large-scale collaboration (Deng et al., 13 Jun 2025).
  • Breaking self-improvement plateaus by coupling policy with environment/world-model evolution (Fang et al., 23 Apr 2025).
  • Generalization via hard-negative mining through co-evolving specialized failure agents (Jung et al., 27 Nov 2025).
  • Emergent coalition formation, norm evolution, and dynamic objective arbitration in open, heterogeneous ecosystems (Li et al., 5 Feb 2025).

Limitations and open challenges include:

  • Stability under highly non-stationary, adversarial, or open-ended environments (addressed partially in population-level and diversity-rich frameworks such as CoMAS (Xue et al., 9 Oct 2025), but still not fully characterized theoretically).
  • The risk of premature convergence or stagnation if communication/aggregation strategies (e.g., mixing weight, memory sharing) are not adaptively tuned (see Lim et al. and PE-MA ablations).
  • The complexity of extending tightly coupled co-evolutionary protocols to real-time, high-dimensional decision domains (robotics, continuous control, multi-modal environments).

Active research directions target: hierarchical/memory-augmented co-evolutionary workflows, richer curriculum and experience design, explicit safety and alignment enforcement via dynamic protocols, and extension to multi-agent, multi-environment federated settings with human-in-the-loop oversight.

