Experience Inheritance in Multi-Agent Systems
- Experience inheritance is the explicit transfer and reuse of experiential knowledge, including decision traces, skills, and workflow artifacts, among autonomous agents.
- Mechanisms involve diverse representations such as trajectory tuples, replay buffers, and graph memories, with selection protocols based on similarity and reward scoring.
- Empirical validations show enhancements in sample efficiency, convergence speed, and overall performance across various multi-agent and cross-task frameworks.
Experience inheritance across agents refers to the explicit transfer, sharing, and reuse of experiential knowledge—such as decision traces, skills, learned policies, or workflow artifacts—between autonomous agents operating in multi-agent or cross-task environments. Unlike classic independent learning paradigms, experience inheritance mechanisms systematically accumulate, distill, rank, and utilize past agent experiences for accelerating future learning, broadening task generalization, and enabling collective intelligence. The precise forms of inheritance vary by algorithmic paradigm, encompassing learned policy fragments, prioritized replay buffers, compositional skills, high-reward decision trajectories, and graph-encoded meta-cognitive strategies.
1. Mechanisms and Representations of Inherited Experience
Experience can be represented as trajectory tuples, skill abstractions, agent-specific pools, shared libraries, workflow artifacts, or graph-structured memories.
- Replay Buffers/Experience Pools: Agents locally or cooperatively accumulate tuples, with pools indexed by semantic similarity and historical reward (Li et al., 29 May 2025, Souza et al., 2019, Gerstgrasser et al., 2023).
- Graph Memory: Trajectories and strategic decisions are distilled into multi-layered graphs, including raw queries, finite-state-machine (FSM) transition paths, and interpretable meta-cognitions, with edge weights adaptively tuned via RL gradients (Xia et al., 11 Nov 2025).
- Skill Libraries: In bottom-up paradigms, experiences encode sequences of human-like actions augmented with natural-language descriptors and implicit reward signals, collectively forming a dynamic skill library (Du et al., 23 May 2025).
- Dual-Faceted Banks: Experience banks may separate comprehension-level artifacts from low-level modification strategies, as in software repair, enabling agents to retrieve targeted guidance at multiple reasoning granularities (Chen et al., 31 Jul 2025).
- Distributed Context Pools: Hierarchical MAS frameworks, e.g., 360°REA, utilize both local and global experience pools to distribute individualized lessons and systemic best practices across teams of agents (Gao et al., 8 Apr 2024).
- Genetic Fragments ("learngenes"): Evolution-inspired RL frameworks represent inheritable knowledge as fragments of neural-network weights, which encode task-agnostic "instincts" and are transmitted across generations (Feng et al., 2023).
These heterogeneous representations allow decoupled, plug-and-play transfer of experiences across agents of differing architectures, modalities, or frameworks (Tang et al., 8 Jul 2025).
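A minimal sketch of the simplest of these representations, a replay-style pool of trajectory tuples, is shown below; all field and class names are illustrative, and the cited systems use considerably richer schemas:

```python
from dataclasses import dataclass, field

@dataclass
class Experience:
    # One inheritable trace: a task embedding for similarity-based
    # retrieval, the serialized trajectory, and the reward it earned.
    task_embedding: list
    trajectory: list        # e.g. (state, action, observation) steps
    reward: float

@dataclass
class ExperiencePool:
    # Per-agent pool; cooperating agents may expose pools to neighbors.
    experiences: list = field(default_factory=list)

    def add(self, exp: Experience) -> None:
        self.experiences.append(exp)
```

Richer representations (graph memories, skill libraries, learngenes) replace the flat list with structured indices, but the unit of inheritance remains a retrievable, reward-annotated artifact.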
2. Algorithms for Experience Sharing and Selection
Selection protocols determine which experiences are shared, how they are retrieved, and how they are injected into the agent’s current reasoning context.
- Prioritized Experience Replay: Selection based on high TD-error or novelty; only the most surprising/valuable experiences are relayed between agents to focus learning signals (Souza et al., 2019, Gerstgrasser et al., 2023).
- Semantic Similarity and Reward Scoring: Retrieval scores combine cosine distance in embedding space and historical step-wise rewards, such that the top-K highest-scoring experiences (from local and neighbor pools) are concatenated as few-shot exemplars (Li et al., 29 May 2025).
- Hybrid Lexical/Semantic Retrieval: Dual-index systems leverage BM25 for fast domain filtering and transformer embeddings for conceptual similarity, with a hybrid score used to shortlist cross-domain workflows (Tang et al., 8 Jul 2025).
- Pessimism-based Distribution Matching: CoPS leverages provable selection criteria, penalizing experiences that diverge significantly from the decoder-distribution of the current task, maximizing expected utility while guarding against out-of-distribution harm (Yang et al., 22 Oct 2024).
- Finite Automata and Workflow Reuse: ICE extracts deterministic workflow artifacts and pipelines, which are retrieved via nearest-neighbor embedding similarity and directly reused as planning or execution primitives (Qian et al., 25 Jan 2024).
- Skill Library Broadcasting: New skills validated by recognizability and positive reward are atomically added to centralized repositories and broadcast across the agent population through publish-subscribe mechanics (Du et al., 23 May 2025).
Empirical analyses consistently show that selective, task-relevant experience relay significantly outperforms naïve or indiscriminate sharing in both sample efficiency and final returns (Souza et al., 2019, Gerstgrasser et al., 2023).
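The similarity-plus-reward scoring used in pool-based retrieval can be sketched as follows; the function names, the pool layout (a list of embedding/reward pairs), and the weighting `alpha` are illustrative assumptions, not the MAEL implementation:

```python
import math

def cosine(a, b):
    # Cosine similarity between two embedding vectors.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def retrieve_top_k(query_emb, pool, k=2, alpha=0.5):
    # Score each experience by a weighted sum of semantic similarity
    # to the query and its normalized historical reward, then return
    # the top-k entries for use as few-shot exemplars.
    max_r = max((r for _, r in pool), default=1.0) or 1.0
    scored = [
        (alpha * cosine(query_emb, emb) + (1 - alpha) * (r / max_r), emb, r)
        for emb, r in pool
    ]
    scored.sort(key=lambda t: t[0], reverse=True)
    return scored[:k]
```

Hybrid lexical/semantic schemes follow the same template but add a lexical (e.g. BM25) term to the score before ranking.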
3. Inheritance Modalities: Local, Cross-Agent, Cross-Task, and Population-Level
Inheritance can be tightly localized, restricted to agent neighborhoods, or realized system-wide across frameworks and generational boundaries.
- Local and Neighborhood Pooling: MAEL permits retrieval from both an agent's local pool and the pools of its neighbors on the collaboration graph; a k-hop union of pools suffices for efficient inheritance, with small k typically optimal (Li et al., 29 May 2025).
- Centralized Evolutionary Pools: Genetic RL frameworks maintain population-level gene pools, with fragments updated via lifetime learning and intergenerational tournament selection, supporting Lamarckian or Darwinian inheritance (Feng et al., 2023).
- Collective Skill Evolution: Bottom-up paradigms and skill libraries allow dynamic, population-wide enrichment, with newly validated skills immediately available to all agents (Du et al., 23 May 2025).
- API-mediated Cross-Framework Memory: AGENT KB abstracts trajectories across agentic frameworks and supplies experiences through inference-time API calls, enabling true plug-and-play inheritance across heterogeneous systems (Tang et al., 8 Jul 2025).
- Distributed Contextual Grouping: Supervisor-directed concurrent sharing exploits online assignment of contextual similarity for efficient experience routing among statistically compatible agents (Garant et al., 2017).
In all cases, the underlying principle is to maximize the reuse of high-utility experiential traces while minimizing the risk from incompatible or out-of-distribution knowledge.
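Neighborhood pooling on a collaboration graph reduces to collecting the union of experience pools reachable within k hops. The sketch below illustrates this with a breadth-first traversal; the graph/pool layout and function name are assumptions for illustration:

```python
from collections import deque

def khop_pool_union(graph, pools, agent, k=1):
    # BFS up to k hops from `agent` on the collaboration graph
    # (adjacency dict), collecting the union of the local pool and
    # all neighbor pools encountered along the way.
    seen = {agent}
    frontier = deque([(agent, 0)])
    union = list(pools.get(agent, []))
    while frontier:
        node, depth = frontier.popleft()
        if depth == k:
            continue
        for nbr in graph.get(node, []):
            if nbr not in seen:
                seen.add(nbr)
                union.extend(pools.get(nbr, []))
                frontier.append((nbr, depth + 1))
    return union
```

Centralized and cross-framework modalities correspond to the limiting case where every agent reads from a single shared pool.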
4. Empirical Validation: Impact on Efficiency, Quality, and Generalization
Experience inheritance mechanisms consistently induce measurable improvements on established benchmarks in efficiency, solution quality, and generalization.
- Sample Efficiency and Convergence: MAEL yields up to 49% reduction in token consumption and 26% fewer dialogue rounds to convergence in collaborative reasoning tasks. On software repair, SWE-Exp reduces unnecessary MCTS expansion by 15–20% (Li et al., 29 May 2025, Chen et al., 31 Jul 2025).
- Absolute Performance Lifts: HumanEval and CommonGen tasks show monotonic gains as experience pools scale—accuracy climbs from 76.7%→90.0% and from 34.5%→46.5%, respectively (Li et al., 29 May 2025).
- Architectural Independence: FLEX demonstrates gradient-free inheritance, with frozen agents inheriting libraries and achieving +13 to +17 point absolute gains in math and chemistry benchmarks even without model finetuning (Cai et al., 9 Nov 2025).
- Collective Gains Across Frameworks: AGENT KB delivers 18.7pp pass@3 improvement on GAIA and up to 14pp improvement on SWE-bench, robustly across frameworks and backbone models. Hybrid retrieval consistently outperforms mono-modal alternatives (Tang et al., 8 Jul 2025, Chen et al., 31 Jul 2025).
- Hierarchical Team Acceleration: 360°REA’s dual-level experience pools drive continual performance increases across creative and planning tasks; completion and match rates rise with each assessment cycle (Gao et al., 8 Apr 2024).
- Scaling Law of Experiential Growth: FLEX empirically establishes a power-law scaling relationship between library size and accuracy, with experience library size growing sigmoidally with deployment epoch (Cai et al., 9 Nov 2025).
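A power-law relationship between library size and accuracy can be recovered from measurements by linear regression in log-log space. The sketch below fits accuracy ≈ c · size^b on synthetic data; the fitting routine and its names are illustrative, not taken from FLEX:

```python
import math

def fit_power_law(sizes, accuracies):
    # Fit accuracy ≈ c * size**b by ordinary least squares on
    # (log size, log accuracy) pairs; returns (c, b).
    xs = [math.log(s) for s in sizes]
    ys = [math.log(a) for a in accuracies]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    c = math.exp(my - b * mx)
    return c, b
```

On data generated exactly from a power law, the fit recovers the exponent and prefactor; on real benchmark curves it yields the empirical scaling coefficients.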
Taken together, these results consistently support experience inheritance as a fundamental mechanism for agentic self-evolution and generalization.
5. Challenges, Limitations, and Future Directions
While experience inheritance offers substantial gains, several challenges persist.
- Experience Selection and Overfitting: Indiscriminate sharing can degrade performance via off-policy variance or knowledge interference, necessitating careful scoring and disagreement gates (Gerstgrasser et al., 2023, Tang et al., 8 Jul 2025).
- Scalability and Memory Management: Large experience banks require efficient hierarchical indexing, context clustering, and adaptive evictions to maintain retrieval speed and quality (Qian et al., 25 Jan 2024, Cai et al., 9 Nov 2025).
- Task and Agent Heterogeneity: Methods such as focused experience sharing and context-directed relaying depend on agent homogeneity; extending these approaches to highly heterogeneous populations remains an open problem (Souza et al., 2019, Garant et al., 2017).
- Robustness to Distributional Shift: CoPS demonstrates theoretically that inheritance gain depends on matching between agent’s trial distributions and pretrained decoder output; pessimism-based selection is key to safe generalization (Yang et al., 22 Oct 2024).
- Integration with Self-Evolution and Failure Learning: The ICE framework notes that failed trajectories are often neglected; future directions include robust consolidation of negative experience (Qian et al., 25 Jan 2024).
- Transparency and Auditability: Modular experiences such as FLEX's libraries are fully auditable and human-readable; ensuring that all agent frameworks expose inherited experiences for inspection is a central design goal (Cai et al., 9 Nov 2025).
A plausible implication is that experience inheritance paradigms, when equipped with principled scoring, similarity matching, and adaptive memory management, provide core foundations for the scalable evolution of autonomous multi-agent intelligence.
6. Historical Progression and Theoretical Foundations
Experience inheritance across agents has evolved from classic shared replay buffers and supervisor-directed relaying (Garant et al., 2017), through prioritized experience sharing in decentralized RL contexts (Souza et al., 2019, Gerstgrasser et al., 2023), to more sophisticated graph-based memories, compositional skill libraries, automata-driven workflow reuse, and collective cross-framework knowledge bases (Xia et al., 11 Nov 2025, Du et al., 23 May 2025, Tang et al., 8 Jul 2025). Theoretical guarantees in distribution-matched selection and regret bounds substantiate the algorithmic foundations for safe and efficient inheritance (Yang et al., 22 Oct 2024). Recent lines foreground inheritance as a central, plug-and-play architectural module, decoupled from backbone model training and enabling transparent, auditable learning ecosystems (Cai et al., 9 Nov 2025).
The field is converging on experience inheritance as a critical driver for intelligent agent deployment at scale, with ongoing work focusing on dynamic adaptability, robust cross-framework integration, and generalization-preserving selection criteria.