
Autogenesis System (AGS): Self-Evolving Agents

Updated 19 April 2026
  • Autogenesis System (AGS) is a protocol-driven multi-agent architecture that separates resource evolution from upgrade mechanisms for robust adaptability.
  • It employs a layered design with the Resource Substrate Protocol Layer (RSPL) and Self-Evolution Protocol Layer (SEPL) to enable systematic, auditable, and iterative improvements.
  • The system demonstrates enhanced task accuracy, efficiency, and dynamic error management by leveraging closed-loop, data-driven update cycles across interconnected agents.

The Autogenesis System (AGS) is a protocol-driven, self-evolving multi-agent architecture that formalizes dynamic resource management and closed-loop improvement across LLM-based agent systems. AGS, as realized in both "AutoGenesisAgent: Self-Generating Multi-Agent Systems for Complex Tasks" (Harper, 2024) and "Autogenesis: A Self-Evolving Agent Protocol" (Zhang, 16 Apr 2026), decouples the entities subject to evolution (prompts, agents, tools, environments, memories) from the mechanisms governing their lifecycle, versioning, and upgrade. AGS implementations are distinguished by a layered design: (1) the Resource Substrate Protocol Layer (RSPL) defines and manages typed, versioned resources with machine- and human-usable metadata; (2) the Self-Evolution Protocol Layer (SEPL) implements a closed-loop operator algebra enabling systematic propose, assess, and commit cycles. This yields agents that adapt not only their behavioral policies but also their underlying components, enabling robust handling of long-horizon, tool-augmented tasks and facilitating empirical gains over monolithic or static agent architectures.

1. Protocol Abstraction and Motivations

Recent research highlights brittle adaptation and lack of interface standards as key failure modes in LLM-based agentic systems: manually wired agent graphs, monolithic code, and ad hoc self-modification obstruct systematic improvement and yield runtime instability (Zhang, 16 Apr 2026). AGS addresses these by separating "what evolves" (a suite of typed resources) from "how evolution occurs" (a closed-loop protocol with formally defined operators). The RSPL formalizes all fundamental entities as versioned resources with explicit state and contract representations, while the SEPL enforces iterative, auditable improvement cycles based on execution traces.

2. Resource Substrate Protocol Layer (RSPL)

RSPL models each resource type $T \in \{\text{PROMPT}, \text{AGENT}, \text{TOOL}, \text{ENV}, \text{MEM}\}$ as a tuple

$$C_{T,i} = (n_{T,i}, d_{T,i}, O_{T,i}, g_{T,i}, M_{T,i})$$

comprising a unique name, description, input-output mapping, trainability flag, and metadata dictionary (Zhang, 16 Apr 2026). Registration records extend this with semantic versioning, an implementation descriptor, instantiation parameters, and machine-usable schemas. Every lifecycle operation (init, register, retrieve, update, restore, run, etc.) is mediated by a context manager $M_T$ and surfaced over uniform RPC/HTTP servers, facilitating decoupled and hot-swappable access. Cross-cutting infrastructure modules ensure version lineage (diff, rollback, branching, dynamic update), a global resource registry $R = \bigcup_T R_T$, and trace logging for retrospective analysis (Zhang, 16 Apr 2026).
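As a rough illustration of the tuple and lifecycle operations described above, the following minimal Python sketch models a resource record and an in-memory registry with versioned `register`, `retrieve`, `update`, and `restore` operations. The class names `Resource` and `Registry`, the minor-version bump on update, and the single-step rollback are assumptions made for this example; they are not taken from the papers, which also expose these operations over RPC/HTTP rather than in-process.

```python
from dataclasses import dataclass, field, replace
from typing import Any, Callable, Dict, List, Tuple

RESOURCE_TYPES = {"PROMPT", "AGENT", "TOOL", "ENV", "MEM"}


@dataclass(frozen=True)
class Resource:
    """One resource C_{T,i} = (n, d, O, g, M), plus a semantic version string."""
    rtype: str                      # T
    name: str                       # n_{T,i}
    description: str                # d_{T,i}
    mapping: Callable[[Any], Any]   # O_{T,i}: input-output mapping
    trainable: bool                 # g_{T,i}
    metadata: Dict[str, Any] = field(default_factory=dict)  # M_{T,i}
    version: str = "0.1.0"


class Registry:
    """Hypothetical in-memory registry that keeps every version of each resource."""

    def __init__(self) -> None:
        self._history: Dict[Tuple[str, str], List[Resource]] = {}

    def register(self, res: Resource) -> None:
        if res.rtype not in RESOURCE_TYPES:
            raise ValueError(f"unknown resource type: {res.rtype}")
        self._history.setdefault((res.rtype, res.name), []).append(res)

    def retrieve(self, rtype: str, name: str) -> Resource:
        # The latest version is always the last entry in the lineage.
        return self._history[(rtype, name)][-1]

    def update(self, rtype: str, name: str, **changes: Any) -> Resource:
        current = self.retrieve(rtype, name)
        if not current.trainable:
            raise PermissionError(f"{name} is not marked trainable")
        major, minor, _ = (int(p) for p in current.version.split("."))
        new = replace(current, version=f"{major}.{minor + 1}.0", **changes)
        self._history[(rtype, name)].append(new)
        return new

    def restore(self, rtype: str, name: str) -> Resource:
        # Roll back one step, but never discard the initial version.
        versions = self._history[(rtype, name)]
        if len(versions) > 1:
            versions.pop()
        return versions[-1]


registry = Registry()
registry.register(
    Resource("PROMPT", "solver", "Math solver prompt", lambda q: f"Solve: {q}", True)
)
registry.update("PROMPT", "solver", mapping=lambda q: f"Solve step by step: {q}")
print(registry.retrieve("PROMPT", "solver").version)
```

Keeping the full version list per resource is what makes rollback a cheap list operation in this sketch; a production registry would presumably persist the lineage and support diffing and branching as well.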

3. Self-Evolution Protocol Layer (SEPL)

SEPL defines the space of evolvable variables,

$$V_{evo} = \bigcup_{T \in \{\text{PROMPT}, \text{AGENT}, \text{TOOL}, \text{ENV}, \text{MEM}\}} E_T \cup \{\text{execution artifacts}\},$$

where each $v \in V_{evo}$ has a learnability mask $\delta_v \in \{0,1\}$, yielding a trainable subspace $\Theta = \{v \mid \delta_v = 1\}$ (Zhang, 16 Apr 2026). SEPL's operator algebra consists of:

  • Reflect $p: Z \times V_{evo} \to H$: Infers failure hypotheses from execution traces.
  • Select $o: V_{evo} \times H \to D$: Generates concrete update proposals.
  • Improve: Applies modifications using RSPL operators.
  • Evaluate: Quantifies metrics and evaluates post-modification safety.
  • Commit: Accepts or rolls back changes, with full auditability.

A canonical loop (Algorithm 1 in Zhang, 16 Apr 2026) successively reflects, selects, improves, evaluates, and commits, rolling back changes that fail evaluation, until convergence or budget exhaustion. All state transitions pass through RSPL, ensuring auditable version trails and robust rollback.
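The reflect, select, improve, evaluate, commit cycle can be sketched on a toy problem. In the Python example below, the evolvable variables are plain numbers, the learnability mask restricts which of them may change, and "reflection" simply enumerates candidate directions; all of these are simplifying assumptions for illustration and do not reproduce Algorithm 1 of the paper. The function name `sepl_loop` and its parameters are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

State = Dict[str, float]  # toy stand-in for V_evo: variable name -> numeric value


def sepl_loop(
    state: State,
    mask: Dict[str, int],             # learnability mask delta_v in {0, 1}
    score: Callable[[State], float],  # evaluation metric, higher is better
    budget: int = 20,
    step: float = 0.5,
) -> Tuple[State, List[str]]:
    audit: List[str] = []
    best = score(state)
    for round_no in range(budget):
        # Trainable subspace Theta = {v | delta_v = 1}.
        trainable = [v for v in state if mask.get(v, 0) == 1]
        # Reflect (toy): one hypothesis per trainable variable and direction.
        hypotheses = [(v, d) for v in trainable for d in (+step, -step)]
        committed = False
        for var, delta in hypotheses:
            # Select: turn the hypothesis into a concrete update proposal.
            proposal = {var: state[var] + delta}
            # Improve: apply the proposal to a candidate copy of the state.
            candidate = {**state, **proposal}
            # Evaluate: measure the candidate.
            new_score = score(candidate)
            # Commit: accept on improvement, otherwise discard the candidate.
            if new_score > best:
                state, best = candidate, new_score
                audit.append(f"round {round_no}: commit {var} {delta:+} -> {new_score:.3f}")
                committed = True
                break
            audit.append(f"round {round_no}: rollback {var} {delta:+}")
        if not committed:
            break  # no proposal improved the score: treat as convergence
    return state, audit


final_state, log = sepl_loop(
    {"x": 0.0, "frozen": 5.0},
    {"x": 1, "frozen": 0},
    lambda s: -(s["x"] - 2.0) ** 2,
)
print(final_state)
print(len(log), "audit entries")
```

Because the candidate is built as a copy, a rejected proposal never touches the committed state, which mirrors the rollback-safe property the protocol attributes to routing all changes through RSPL.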

4. Multi-Agent System Architecture and Data Flows

Autogenesis System architectures instantiate these protocols as orchestrated ensembles of meta-agents or sub-agents managed via a shared agent bus (Harper, 2024; Zhang, 16 Apr 2026). For example, "AutoGenesisAgent" defines distinct agents (System Understanding, System Design, Agent Generator, Integration & Testing, Optimization & Tuning, Deployment, Documentation & Training, Feedback & Iteration, LLM Prompt Design, Hierarchy Agent), each with a single-responsibility function mapped to a lifecycle stage. All agent artifacts (code stubs, prompts, configurations) are generated, registered, and version-tracked through the RSPL substrate (Harper, 2024).

Internal representations progress from user prompt → JSON schema (specification) → adjacency-list/GraphML blueprints (architectural graphs) → containerized code modules and integration harnesses, with all inter-agent communication coordinated via a central asynchronous event bus (Python asyncio + Redis pub/sub) (Harper, 2024). Sub-agents retrieve required tools, prompts, or memories by semantic search over the RSPL registry, write execution and reasoning traces, and trigger SEPL updates where failure or suboptimality is detected (Zhang, 16 Apr 2026).
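To make the bus-mediated data flow concrete, the sketch below uses only `asyncio` queues as an in-process stand-in for Redis pub/sub; it is not the authors' implementation. One hypothetical agent (`design_agent`) consumes a specification message and publishes an adjacency-style blueprint. The topic names, message fields, and the chain-shaped graph are assumptions chosen for brevity.

```python
import asyncio
from collections import defaultdict
from typing import Any, DefaultDict, Dict, List


class EventBus:
    """In-process topic bus; a simplified stand-in for Redis pub/sub."""

    def __init__(self) -> None:
        self._topics: DefaultDict[str, List[asyncio.Queue]] = defaultdict(list)

    def subscribe(self, topic: str) -> asyncio.Queue:
        queue: asyncio.Queue = asyncio.Queue()
        self._topics[topic].append(queue)
        return queue

    async def publish(self, topic: str, message: Dict[str, Any]) -> None:
        # Fan the message out to every subscriber of the topic.
        for queue in self._topics[topic]:
            await queue.put(message)


async def design_agent(bus: EventBus, inbox: asyncio.Queue) -> None:
    """Hypothetical agent: turns a specification into a chain-shaped blueprint."""
    spec = await inbox.get()
    agents = spec["agents"]
    blueprint = {"nodes": agents, "edges": list(zip(agents, agents[1:]))}
    await bus.publish("blueprint", blueprint)


async def main() -> Dict[str, Any]:
    bus = EventBus()
    spec_inbox = bus.subscribe("spec")
    blueprint_inbox = bus.subscribe("blueprint")
    worker = asyncio.create_task(design_agent(bus, spec_inbox))
    await bus.publish("spec", {"agents": ["understand", "design", "generate"]})
    blueprint = await blueprint_inbox.get()
    await worker
    return blueprint


if __name__ == "__main__":
    print(asyncio.run(main()))
```

A single shared bus keeps agents decoupled from one another, at the cost of making the bus itself a potential throughput bottleneck as the number of agents grows.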

5. Closed-Loop Iterative Refinement and Optimization

Both AGS instantiations employ a generate–evaluate–refine loop; Harper (2024) presents high-level pseudocode for the AutoGenesisAgent variant.

Optimization is formalized over a set of tunable parameters (prompt templates, thresholds, allocations), with key metrics defined as the fraction of correct task completions, mean inter-agent latency, and normalized resource usage (Harper, 2024). Iterative improvement leverages both heuristic search (e.g., Bayesian optimization) and human-simulated or actual feedback from production logs.
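One way to picture a generate, evaluate, refine loop over the three metrics named above is the Python sketch below. The exact objective used by Harper (2024) is not reproduced here: the weighted-sum scalarization, the weights `lam_latency` and `lam_resource`, and the function names are all assumptions, and plain random perturbation is used in place of Bayesian optimization to keep the example dependency-free.

```python
import random
from typing import Callable, Dict, Tuple

Metrics = Dict[str, float]  # keys: accuracy, latency_s, resource (normalized to [0, 1])


def objective(m: Metrics, lam_latency: float = 0.1, lam_resource: float = 0.2) -> float:
    """Assumed scalarization: reward accuracy, penalize latency and resource use."""
    return m["accuracy"] - lam_latency * m["latency_s"] - lam_resource * m["resource"]


def refine(
    run_system: Callable[[Dict[str, float]], Metrics],
    theta: Dict[str, float],
    cycles: int = 6,
    seed: int = 0,
) -> Tuple[Dict[str, float], float]:
    rng = random.Random(seed)
    best_theta, best_score = dict(theta), objective(run_system(theta))
    for _ in range(cycles):
        # Generate: perturb each tunable parameter slightly.
        candidate = {k: v + rng.uniform(-0.1, 0.1) for k, v in best_theta.items()}
        # Evaluate: run the system and score the resulting metrics.
        score = objective(run_system(candidate))
        # Refine: keep the candidate only if it beats the incumbent.
        if score > best_score:
            best_theta, best_score = candidate, score
    return best_theta, best_score


def toy_system(theta: Dict[str, float]) -> Metrics:
    """Hypothetical system whose accuracy peaks at threshold = 0.7."""
    return {
        "accuracy": 1.0 - (theta["threshold"] - 0.7) ** 2,
        "latency_s": 1.2,
        "resource": 0.5,
    }


tuned, tuned_score = refine(toy_system, {"threshold": 0.3})
print(tuned, round(tuned_score, 4))
```

Since a candidate is accepted only when it improves the score, the returned score is never worse than the starting configuration's, regardless of the random seed.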

6. Empirical Validation and Comparative Results

AGS was benchmarked across science/math reasoning (GPQA-Diamond, AIME), the GAIA general agent task suite, and LeetCode coding problems (Zhang, 16 Apr 2026). Empirical results show:

  • GPQA/AIME: On AIME24, a gpt-4.1 backbone improved exact-match accuracy from 23.3% to 40.0% over three SEPL-driven rounds. Prompt+solution evolution consistently outperformed single-strategy adaptation.
  • GAIA: AGS with tool evolution improved average Pass@1 from 79.07% (vanilla) to 89.04%, with the hardest tier rising from 61.22% to 81.63%, outperforming ToolOrchestra and HALO.
  • LeetCode: Across all major languages, full-pass counts increased (C++ 82→99, Python3 79→87), error rates dropped sharply, and mean runtimes and memory usage decreased, indicating that iterative evolution yields both correctness and efficiency improvements.
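For readers comparing these figures, the short Python snippet below derives absolute gains (in percentage points) and relative gains (in percent of the baseline) from the numbers quoted in the list above. Only the before and after values come from the source; the derived gains are simple arithmetic and the helper name `gains` is introduced here for illustration.

```python
from typing import Dict, Tuple

# (before, after) percentages as quoted in the results above.
results: Dict[str, Tuple[float, float]] = {
    "AIME24 exact match (gpt-4.1)": (23.3, 40.0),
    "GAIA Pass@1 average": (79.07, 89.04),
    "GAIA Pass@1 hardest tier": (61.22, 81.63),
}


def gains(before: float, after: float) -> Tuple[float, float]:
    """Return (absolute gain in points, relative gain in percent of baseline)."""
    absolute = after - before
    relative = 100.0 * absolute / before
    return round(absolute, 2), round(relative, 1)


for name, (before, after) in results.items():
    absolute, relative = gains(before, after)
    print(f"{name}: +{absolute} points ({relative}% relative)")
```

The relative figures make clear that the largest proportional improvement is on AIME24, where the baseline was lowest, while the GAIA average had the least headroom.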

"AutoGenesisAgent" (Harper, 2024) reports generation of multi-agent systems for educational, software development, project management, and healthcare workflows, with task accuracy ranging from 0.6 to 0.9, mean inter-agent latency of 1.2–2.5 s, and convergence typically within 3–6 cycles. Failures in the healthcare scenario exposed limitations in the current safety and error-handling models.

7. Current Limitations and Observed Shortcomings

Several limitations are reported:

  • Conversational Deadlocks: Lack of a dedicated Conversation Management Agent led to unresolved loops; partial mitigation was achieved with timeouts, but further specialization is needed.
  • Error Handling: Unexpected payload schemas caused crashes; robust error-handling agents remain a future requirement.
  • Scalability: The event bus became a bottleneck, indicating the need for sharded or topic-based messaging and distributed tracing.
  • Security and Compliance: No current enforcement of encryption/access control; future work targets the addition of specialized security agents.
  • Generalization: All LLMs were used without fine-tuning, and pipeline adaptation is proposed for increased robustness.
  • Productionization: AGS prototypes, to date, are confined to laboratory deployments; production releases necessitate container hardening and automated deployment logic (Harper, 2024).
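Two of the limitations above, conversational deadlocks and crashes on unexpected payload schemas, lend themselves to a small defensive wrapper. The Python sketch below validates a message against a hypothetical schema (`REQUIRED_FIELDS`) and bounds each agent call with `asyncio.wait_for`; it illustrates the timeout mitigation the source mentions and one possible form of payload checking, and is not the error-handling agent the authors propose as future work.

```python
import asyncio
from typing import Any, Awaitable, Callable, Dict

# Hypothetical message schema: field name -> required Python type.
REQUIRED_FIELDS = {"sender": str, "topic": str, "body": dict}


def validate_payload(payload: Any) -> Dict[str, Any]:
    """Reject payloads that do not match the assumed schema."""
    if not isinstance(payload, dict):
        raise ValueError("payload must be a dict")
    for key, expected in REQUIRED_FIELDS.items():
        if not isinstance(payload.get(key), expected):
            raise ValueError(f"field {key!r} missing or not {expected.__name__}")
    return payload


async def guarded_turn(
    agent_call: Callable[[Dict[str, Any]], Awaitable[Any]],
    payload: Any,
    timeout_s: float = 1.0,
) -> Dict[str, Any]:
    """Run one agent turn, converting bad payloads and stalls into status records."""
    try:
        message = validate_payload(payload)
    except ValueError as exc:
        return {"status": "rejected", "error": str(exc)}
    try:
        # Bound the turn so a stalled conversation cannot block the pipeline.
        reply = await asyncio.wait_for(agent_call(message), timeout=timeout_s)
    except asyncio.TimeoutError:
        return {"status": "timeout", "error": f"no reply within {timeout_s}s"}
    return {"status": "ok", "reply": reply}


async def echo_agent(message: Dict[str, Any]) -> Any:
    return message["body"]


async def stalled_agent(message: Dict[str, Any]) -> Any:
    await asyncio.sleep(10)  # simulates an agent stuck in an unresolved loop


if __name__ == "__main__":
    good = {"sender": "design", "topic": "blueprint", "body": {"nodes": 3}}
    print(asyncio.run(guarded_turn(echo_agent, good)))
    print(asyncio.run(guarded_turn(stalled_agent, good, timeout_s=0.05)))
    print(asyncio.run(guarded_turn(echo_agent, {"sender": "design"})))
```

Returning a status record instead of raising lets an orchestrator decide whether to retry, reroute, or escalate, rather than crashing on the first malformed message.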

8. Structural Properties Underlying Performance Gains

AGS consistently improves performance on long-horizon, tool-augmented tasks due to (1) strong resource modularity and versioned contract control (RSPL); (2) closed-loop operator algebra (SEPL) that ensures all updates are data-driven, auditable, and rollback-safe; (3) multi-agent bus coordination enabling dynamic, parallel planning and recovery; and (4) persistent resources supporting cross-round knowledge transfer without brittle hardcoded logic. This layered, protocolized approach stands in contrast to monolithic agent pipelines and facilitates measurable, statistically significant improvements across reasoning, planning, tool use, and code generation (Zhang, 16 Apr 2026).

