- The paper introduces a novel two-layer protocol that decouples resource management from evolutionary control, enabling safe and modular agent self-improvement.
- It defines explicit lifecycle policies, versioned interfaces, and a closed-loop operator algebra (Reflect, Select, Improve, Evaluate, Commit) to drive systematic self-evolution.
- Empirical evaluations show significant performance gains on benchmarks, with stronger problem-solving, better tool integration, and reduced error rates.
Autogenesis: A Protocol Architecture for Self-Evolving Agentic Systems
Contemporary LLM-based agent systems have expanded the feasible scope of autonomous tool use and multi-step reasoning. However, their adaptation to dynamic environments remains bounded by protocol and tooling limitations. Existing interoperability protocols, notably Model Context Protocol (MCP) and Agent-to-Agent (A2A), focus on connectivity and messaging, but lack primitives for explicit resource lifecycle, version management, and safe, auditable self-modification. This absence results in agent architectures with brittle, monolithic compositions and restricts sustainable, composable self-evolution. The key technical gap is the lack of a protocol-level abstraction that decouples the “what” (resources to be evolved) from the “how” (evolutionary optimization logic and control flow).
Protocol Architecture: AGP
Autogenesis introduces a two-layer protocol stack—Autogenesis Protocol (AGP)—comprising (1) a Resource Substrate Protocol Layer (RSPL) and (2) a Self-Evolution Protocol Layer (SEPL). RSPL treats prompts, agents, tools, environments, and memory as protocol-registered resources with explicit state, lifecycle controls, and versioned interfaces. SEPL establishes an operator algebra that defines a closed-loop control process for self-evolution, composed of atomic reflect, select, improve, evaluate, and commit operators. Every resource modification is versioned and auditable, and every evolution cycle is grounded in operational trace data with rollback support.
Key architectural innovations include:
- Complete abstraction and registration of all agent components (prompt, agent, tool, environment, memory) external to the agent core, supporting both modular interoperation and safe dynamic refinement.
- Explicit context managers and server interfaces for each resource type, exposing lifecycle, retrieval, evaluation, update, and revert operators, and maintaining versioned artifact lineage.
- A general infrastructure layer (version manager, model manager, dynamic manager, tracer) supporting reproducibility, rollback, hot-swapping, and transparent auditing.
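To make the RSPL abstractions concrete, the following is a minimal sketch of a versioned resource registry. It is an illustrative assumption, not the paper's implementation: the class names (`Resource`, `ResourceRegistry`), the lifecycle states, and the lineage representation are all hypothetical, chosen only to show how registration, versioned update, and revert could compose.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Any

class Lifecycle(Enum):
    """Toy lifecycle states for a protocol-registered resource."""
    REGISTERED = "registered"
    ACTIVE = "active"
    DEPRECATED = "deprecated"

@dataclass
class Resource:
    """A protocol-registered resource (prompt, agent, tool, environment, or memory)."""
    kind: str
    name: str
    payload: Any
    version: int = 1
    state: Lifecycle = Lifecycle.REGISTERED
    lineage: list = field(default_factory=list)  # prior (version, payload) pairs

class ResourceRegistry:
    """Minimal RSPL-style registry: versioned updates with revert support."""
    def __init__(self):
        self._resources: dict[str, Resource] = {}

    def register(self, kind: str, name: str, payload: Any) -> Resource:
        res = Resource(kind=kind, name=name, payload=payload)
        self._resources[name] = res
        return res

    def update(self, name: str, new_payload: Any) -> Resource:
        res = self._resources[name]
        res.lineage.append((res.version, res.payload))  # keep auditable lineage
        res.version += 1
        res.payload = new_payload
        return res

    def revert(self, name: str) -> Resource:
        res = self._resources[name]
        res.version, res.payload = res.lineage.pop()  # roll back one version
        return res

registry = ResourceRegistry()
registry.register("prompt", "planner", "You are a planning agent.")
registry.update("planner", "You are a careful planning agent.")
registry.revert("planner")  # rolls back to version 1
```

The essential property is that every mutation appends to an auditable lineage, so any evolution step can be undone, which is what makes unattended self-modification tolerable.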
Operator Algebra and Evolution Loop
AGP’s SEPL layer models self-evolution as an optimization problem over a heterogeneous, explicitly trackable state space. The operator sequence—Reflect, Select, Improve, Evaluate, Commit—constitutes the atomic stages of self-evolution:
- Reflect: Diagnoses causal failure modes given operational traces, yielding natural language or formal hypotheses for improvement direction.
- Select: Translates hypotheses into explicit candidate modifications over evolvable variables (e.g., prompts, code, or tool schemas).
- Improve: Applies candidate mutations to resource states via RSPL, yielding a provisional candidate configuration.
- Evaluate: Examines candidate performance under predefined objectives, measuring quantitative metrics and safety invariants.
- Commit: Conditionally admits or rejects candidate updates, ensuring only strictly beneficial, safe modifications propagate, with all changes versioned.
This closed-loop cycle is agnostic to the underlying optimization paradigm, supporting reflection-driven, gradient-based (TextGrad), or RL-based (Reinforce++, GRPO) strategies, all funneled through unified resource manipulation and evaluation interfaces (2604.15034).
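The five-operator cycle above can be sketched as a single function. This is a toy stand-in under stated assumptions: the operator bodies here (a keyword-based `reflect`, a fixed prompt edit in `select`, a user-supplied scoring `objective`) are invented for illustration and are not the paper's operators; only the loop shape, and the rule that Commit admits strictly beneficial candidates, follows the text.

```python
def reflect(traces):
    """Reflect: diagnose a failure mode from operational traces (toy heuristic)."""
    return "add_constraint" if any("error" in t for t in traces) else None

def select(hypothesis, prompt):
    """Select: translate a hypothesis into a concrete candidate modification."""
    if hypothesis == "add_constraint":
        return prompt + " Always validate tool arguments."
    return prompt

def evolve_once(prompt, traces, objective):
    """One Reflect -> Select -> Improve -> Evaluate -> Commit cycle."""
    hypothesis = reflect(traces)
    candidate = select(hypothesis, prompt)   # Improve: provisional candidate state
    old_score = objective(prompt)
    new_score = objective(candidate)         # Evaluate: score against the objective
    if new_score > old_score:                # Commit: admit strict improvements only
        return candidate, new_score
    return prompt, old_score                 # reject: keep the prior version

# Toy objective: prefer prompts that mention argument validation.
objective = lambda p: 1.0 if "validate" in p else 0.0
prompt, score = evolve_once("You are a tool-using agent.",
                            ["trace: tool error at step 3"], objective)
```

Because the loop only touches the resource through `select`/`evolve_once` and scores through `objective`, swapping in a reflection-driven, gradient-based, or RL-based optimizer changes the operator internals but not the cycle, which is the paradigm-agnosticism the text describes.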
Systems Realization: AGS
The Autogenesis System (AGS) instantiates AGP in a multi-agent framework organized around a shared message bus. Orchestration agents decompose tasks and manage concurrent execution by specialized sub-agents, each retrieving and manipulating protocol-registered resources as required. Sub-agents can themselves be composed as tools within standardized schemas, supporting both agent-as-tool and loosely coupled multi-agent protocols. At every step, AGS interleaves task execution with SEPL-driven self-evolution triggered by observed errors or suboptimality, integrating the improvement mechanism directly into the agent's lifetime operation.
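The agent-as-tool composition can be illustrated with a small sketch. Everything here is a hedged assumption: the schema shape (loosely modeled on common JSON-Schema-style tool declarations), the `as_tool` wrapper, and the `Orchestrator` class are hypothetical names, not AGS interfaces.

```python
from typing import Callable

def as_tool(name: str, description: str, agent_fn: Callable[[str], str]) -> dict:
    """Wrap a sub-agent callable in a minimal, standardized tool schema."""
    return {
        "name": name,
        "description": description,
        "parameters": {"type": "object",
                       "properties": {"task": {"type": "string"}},
                       "required": ["task"]},
        "call": agent_fn,
    }

def math_subagent(task: str) -> str:
    """Stand-in for an LLM-backed sub-agent."""
    return f"[math-agent] solved: {task}"

class Orchestrator:
    """Decomposes tasks and dispatches sub-tasks to registered tools."""
    def __init__(self):
        self.tools: dict[str, dict] = {}

    def register(self, tool: dict):
        self.tools[tool["name"]] = tool

    def dispatch(self, tool_name: str, task: str) -> str:
        return self.tools[tool_name]["call"](task)

orch = Orchestrator()
orch.register(as_tool("math", "Solves math sub-tasks", math_subagent))
result = orch.dispatch("math", "compute 2+2")
```

Because the sub-agent is addressed only through its schema, the orchestrator needs no knowledge of whether a "tool" is a plain function or a full agent, which is what makes the agent-as-tool and multi-agent patterns interchangeable.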
Empirical Evaluation
Scientific and Mathematical Reasoning (GPQA, AIME)
Across GPQA and AIME benchmarks, AGS delivers significant and consistent gains over strong vanilla LLM baselines, especially for models with non-saturating performance. For weaker models, relative gains reach 71% in AIME24 and 100% in AIME25 under the combined prompt+solution evolution strategy. For higher-performing models, ceiling effects limit further improvement, but gains remain non-negligible (e.g., 12% gain on AIME benchmarks for gemini-3-flash-preview). Combined evolution (prompt plus solution) strictly dominates single-strategy variants in all cases.
The results support the claim that protocolized, closed-loop evolution is substantially more effective than ad hoc or static architectures, with the effect size modulated by available headroom and benchmark structure.
Generalized Agent Benchmark (GAIA)
On the tool-focused GAIA benchmark, AGS—employing tool evolution via the protocolized loop—achieves a new state-of-the-art Pass@1 (89.04%) and excels particularly at the hardest task tier (Level 3: 81.63%, more than 12 points above the vanilla baseline). Hierarchical resource management and dynamic tool refinement through SEPL operators let AGS mitigate planning complexity and synthesize runtime-adapted, context-specific tools without fragile glue logic, yielding strong task generalization and robustness.
Algorithmic Coding Benchmark (LeetCode)
On a multi-language LeetCode benchmark, self-evolving agents attain 10–26% relative pass-rate improvements, with the greatest gains in compiled languages. Coding solutions generated via in-loop evolution consistently improve runtime efficiency and often become more competitive relative to the human solution distribution. Error rates decrease across compile-time and runtime failure modes, and sustained, compounding improvement is observed over inference trajectories.
Implications and Future Directions
The shift to protocol-level modularization and evolution—decoupling the evolutionary substrate from optimization logic—marks a significant advance in agentic system design. This approach enables:
- Systematic context and state engineering; prompts, tools, and capabilities become composable, inspectable, and transferable artifacts with versioned governance.
- Automated, auditable agent self-repair and adaptation in response to environmental feedback, supporting long-term robustness and autonomy in heterogeneous environments.
- Compositional integration of optimization techniques (reflection, RL, gradient-based) under a single formal umbrella, maximizing the leverage of recent advances in agent learning.
The work highlights that closed-loop, protocol-driven self-evolution is crucial for escaping the scaling limits of static, monolithic pipelines. Future developments may include:
- Expanded entity type schemas to cover richer agent modalities (e.g., multimodal sensors, distributed actor memory).
- Ecosystem-level resource marketplaces, where evolved prompts, tools, and plans become exchangeable assets.
- Further integration with fine-grained safety, alignment, and interpretability modules through the standardized protocol interfaces.
Conclusion
Autogenesis offers a principled and extensible protocol for building modular, traceable, and self-improving agentic systems. By elevating evolution to the protocol layer and enforcing separation of resource management from optimization logic, it establishes a procedural and architectural foundation for robust, scalable autonomous agents (2604.15034). This protocol-centric paradigm is positioned to become a critical infrastructure for next-generation agent research, development, and deployment.