Resource Substrate Protocol Layer (RSPL)
- Resource Substrate Protocol Layer (RSPL) is a formal abstraction that defines and manages key resource types such as PROMPT, AGENT, TOOL, ENV, and MEM within AGS.
- RSPL employs a uniform RPC/HTTP interface to ensure consistent registration, version control, and dynamic reconfiguration of agentic resources.
- RSPL integrates with the Self-Evolution Protocol Layer (SEPL) to facilitate closed-loop self-improvement and enhance multi-agent coordination across complex tasks.
Autogenesis System (AGS) is a self-evolving multi-agent framework with formalized protocols for resource management, closed-loop self-improvement, and compositional agent architecture. The foundational abstraction separates agent substrate (prompts, agent definitions, tools, environments, memory) from evolution mechanisms, enabling robust, versioned, and auditable lifecycle operations. AGS is instantiated atop the Autogenesis Protocol (AGP), which specifies both how evolutionary updates are orchestrated (via a Self-Evolution Protocol Layer, SEPL) and how agentic resources are abstracted (via a Resource Substrate Protocol Layer, RSPL). AGS demonstrates marked improvements over baseline agent systems on benchmarks requiring long-horizon planning, multi-agent coordination, and dynamic tool use (Harper, 2024, Zhang, 16 Apr 2026).
1. Motivation and Core Objectives
AGS addresses critical bottlenecks in LLM-based agent systems, which are typically brittle in the face of novel tasks and lack formal support for component evolution, version management, and safe updates. Conventional protocols (e.g., A2A, MCP) omit explicit lifecycle tracking, resulting in monolithic compositions susceptible to runtime error and instability. AGS introduces decoupling between “what evolves” (resource level: prompts, agents, tools, envs, memory) and “how evolution occurs” (protocol level: closed-loop reflection, proposal, commitment) (Zhang, 16 Apr 2026).
2. Autogenesis Protocol: RSPL and SEPL
2.1 Resource Substrate Protocol Layer (RSPL)
RSPL abstracts five core resource types: PROMPT, AGENT, TOOL, ENV, and MEM. Each instance of a resource type is a five-component tuple comprising a unique name, a short description, an input-output mapping, a trainable flag, and a metadata dictionary.
Registration records additionally include version strings and implementation descriptors, with all versions managed in a type-indexed registry. Lifecycle and server operations include: init, register, unregister, list, get, retrieve (e.g., semantic search), update, restore, copy, run, and serialization calls (save/load to JSON). A uniform RPC/HTTP server interface isolates clients from substrate changes.
Supporting services include a model manager (for LLM routing/cost), a version manager (diffs, rollback, branching), a dynamic manager (hot-swap), and a tracer module to capture execution traces.
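The registry behaviour described above can be illustrated with a minimal in-memory sketch. It is not the AGS implementation: the class names, field names, the `v1`, `v2`, ... version scheme, and the choice to make `restore` append a copy are all assumptions made for illustration, and only a subset of the listed operations is covered.

```python
from dataclasses import dataclass, field, replace
from typing import Any, Callable, Dict, List, Tuple


@dataclass(frozen=True)
class ResourceRecord:
    """One registered version of a resource (field names are illustrative)."""
    name: str                      # unique name
    description: str               # short description
    mapping: Callable[..., Any]    # input-output mapping
    trainable: bool                # trainable flag
    metadata: Dict[str, Any] = field(default_factory=dict)
    version: str = "v1"            # assumed version-string scheme


class Registry:
    """Type-indexed registry that keeps every version of every resource."""

    def __init__(self) -> None:
        # (resource_type, name) -> list of versions, oldest first
        self._store: Dict[Tuple[str, str], List[ResourceRecord]] = {}

    def register(self, rtype: str, record: ResourceRecord) -> None:
        key = (rtype, record.name)
        if key in self._store:
            raise KeyError(f"{rtype}/{record.name} is already registered")
        self._store[key] = [record]

    def get(self, rtype: str, name: str) -> ResourceRecord:
        """Return the latest version."""
        return self._store[(rtype, name)][-1]

    def update(self, rtype: str, name: str, **changes: Any) -> ResourceRecord:
        """Append a new version with the given fields changed."""
        history = self._store[(rtype, name)]
        new = replace(history[-1], version=f"v{len(history) + 1}", **changes)
        history.append(new)
        return new

    def restore(self, rtype: str, name: str, version: str) -> ResourceRecord:
        """Re-activate an old version by appending a copy, so lineage is never rewritten."""
        history = self._store[(rtype, name)]
        old = next(r for r in history if r.version == version)
        restored = replace(old, version=f"v{len(history) + 1}")
        history.append(restored)
        return restored

    def list_names(self, rtype: str) -> List[str]:
        return sorted(n for (t, n) in self._store if t == rtype)


registry = Registry()
registry.register("PROMPT", ResourceRecord("solver", "Math solver prompt", lambda q: f"Solve: {q}", True))
registry.update("PROMPT", "solver", mapping=lambda q: f"Solve step by step: {q}")
registry.restore("PROMPT", "solver", "v1")
latest = registry.get("PROMPT", "solver")
```

After these calls the history holds three versions, and the latest one carries the original mapping again. Appending on restore, rather than truncating, is one way to keep the audit trail intact.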
2.2 Self-Evolution Protocol Layer (SEPL)
SEPL casts resource self-improvement as an iterative closed-loop operator algebra over a unified set of evolvable variables. Each variable is associated with a learnability mask, and the trainable subspace is the subset of variables that the mask marks as learnable.
SEPL defines five typed operators. All state changes are made through RSPL, guaranteeing auditability and version lineage. The SEPL loop (Algorithm 1 in Zhang, 16 Apr 2026) orchestrates reflection, proposal, mutation, evaluation, and gating phases until convergence or resource budget exhaustion.
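The loop's control flow can be sketched on a toy problem. This is an illustrative stand-in, not Algorithm 1: the function name `sepl_loop`, the numeric state, the fixed step size, and the scoring function are assumptions, and reflection is reduced here to checking whether any proposal improved the score.

```python
from typing import Callable, Dict

State = Dict[str, float]


def sepl_loop(
    state: State,
    trainable: Dict[str, bool],
    score: Callable[[State], float],
    step: float = 1.0,
    budget: int = 20,
) -> State:
    """Toy closed loop: propose, mutate, evaluate, gate, until converged or out of budget."""
    best = score(state)
    for _ in range(budget):                  # resource budget
        improved = False
        for name in list(state):
            if not trainable[name]:          # mask: frozen variables are skipped
                continue
            for delta in (step, -step):      # proposal: a small change to one variable
                candidate = dict(state)      # mutation is applied to a copy
                candidate[name] += delta
                value = score(candidate)     # evaluation
                if value > best:             # gating: commit only strict improvements
                    state, best, improved = candidate, value, True
                    break
        if not improved:                     # nothing left to improve: converged
            break
    return state


def toy_score(s: State) -> float:
    # Hypothetical objective with its maximum at x = 3, y = 1.
    return -((s["x"] - 3.0) ** 2) - ((s["y"] - 1.0) ** 2)


final = sepl_loop({"x": 0.0, "y": 0.0}, {"x": True, "y": False}, toy_score)
```

Because `y` is masked as non-trainable, only `x` moves; the loop stops once neither direction improves the score. In AGS the committed change would go through RSPL as a new resource version rather than a dictionary rebinding.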
3. AGS System Architecture and Multi-Agent Flow
AGS realizes AGP as a multi-agent orchestration with dynamic resource retrieval, parallel agent subtasking, and evolution integration. The architecture includes three principal elements:
- Orchestrator Agent: Decomposes tasks into subtasks, plans via a versioned document (e.g., plan.md in RSPL), and broadcasts to sub-agents over a shared Agent Bus.
- Sub-Agents: Specialized workers (e.g., deep researcher, browser-use, tool-calling, tool generator) that fetch prompts and tools from RSPL (either by name or semantically), perform their operation, and store outputs and reasoning traces to memory resources.
- SEPL Evolution Loop: Whenever suboptimal outcomes or errors are detected (via trace analysis), the SEPL loop is invoked to reflect, propose, and (if accepted by evaluation and gating) commit updates—yielding new versions of relevant prompts, tools, or even the plan itself.
The Agent Bus enables fine-grained composition, parallelism, and isolation: planning decisions flow downward, while data, results, and error traces feed upward. Hot-swappability (runtime resource replacement) and semantic registry search decouple agent roles from fixed toolkits, enabling emergent specialization and adaptation (Zhang, 16 Apr 2026, Harper, 2024).
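The downward flow of plans and the upward flow of results can be sketched with a thread pool standing in for the Agent Bus. The agent roles, the `orchestrate` function, and the use of `concurrent.futures` are assumptions for illustration; the source does not specify how the bus is implemented.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Tuple

# Hypothetical sub-agents: each maps a subtask string to a result string.
SUB_AGENTS: Dict[str, Callable[[str], str]] = {
    "researcher": lambda task: f"notes on {task}",
    "tool_caller": lambda task: f"tool output for {task}",
}


def orchestrate(plan: List[Tuple[str, str]]) -> Dict[str, str]:
    """Send each (role, subtask) pair to its sub-agent in parallel and gather the results."""
    with ThreadPoolExecutor(max_workers=max(1, len(plan))) as pool:
        # Planning decisions flow downward: one submitted job per subtask.
        futures = {subtask: pool.submit(SUB_AGENTS[role], subtask) for role, subtask in plan}
        # Results flow upward: block until every sub-agent has answered.
        return {subtask: fut.result() for subtask, fut in futures.items()}


results = orchestrate([("researcher", "GAIA task 1"), ("tool_caller", "unit conversion")])
```

Looking agents up by role in a dictionary is a crude analogue of the registry lookup: swapping the callable stored under a role changes behaviour without touching the orchestrator.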
4. Algorithmic and Deployment Workflows
The application of AGS is a “generate-evaluate-refine” cycle that transforms user intent into operational multi-agent systems:
- Parsing Requirements: System Understanding Agent parses the user’s prompt, extracting task structures, constraints, and stakeholder patterns into a JSON schema.
- Design and Generation: System Design Agent synthesizes a directed graph (nodes = agent types, edges = communication) and produces both UML diagrams and code blueprints.
- Agent Instantiation: Agent Generator creates concrete stub artifacts, interface definitions, and container specs for each agent type.
- Integration and Testing: Integration and Testing Agent assembles all modules, executes both unit and integration tests, and flags protocol or interface violations.
- Optimization and Tuning: Performance metrics (e.g., latency, accuracy, resource use) are evaluated, with Bayesian optimization or heuristic search applied to prompt, threshold, and configuration parameters.
The tuning step is formalized as an optimization objective over the tunable system parameters (prompts, thresholds, and configuration values).
- Deployment: Agent modules are packaged into Docker images, deployed via Kubernetes manifests, and subjected to CI/CD pipelines with shadow and staging rollouts.
- Documentation and Training: Documentation and Training Agent synthesizes guides, API docs, and tutorials.
- Feedback and Iteration: Production logs, user feedback, and KPIs are collected; these trigger evolution cycles via updated prompts or direct SEPL invocation (Harper, 2024).
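The Optimization and Tuning step in the workflow above can be illustrated with an exhaustive grid search, used here as the simplest heuristic stand-in rather than Bayesian optimization. The parameter names, the search space, and the objective (a made-up trade-off between accuracy and latency) are assumptions, not values from the source.

```python
from itertools import product
from typing import Any, Callable, Dict, List, Tuple

Params = Dict[str, Any]


def grid_search(
    space: Dict[str, List[Any]],
    objective: Callable[[Params], float],
) -> Tuple[Params, float]:
    """Evaluate every combination in the space and return the best one."""
    keys = list(space)
    best_params: Params = {}
    best_value = float("-inf")
    for values in product(*(space[k] for k in keys)):
        params = dict(zip(keys, values))
        value = objective(params)
        if value > best_value:               # keep the first strictly better setting
            best_params, best_value = params, value
    return best_params, best_value


def toy_objective(p: Params) -> float:
    # Hypothetical metrics: accuracy peaks at threshold 0.5, latency grows with retries.
    accuracy = 0.9 - abs(p["threshold"] - 0.5)
    latency = 0.2 * p["max_retries"]
    return accuracy - 0.1 * latency


best_params, best_value = grid_search(
    {"threshold": [0.3, 0.5, 0.7], "max_retries": [1, 2, 3]},
    toy_objective,
)
```

Grid search scales poorly with the number of parameters, which is one reason a surrogate-based method such as Bayesian optimization is attractive when each evaluation means running the whole agent system.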
5. Internal Data Structures and Knowledge Propagation
Resource specifications, agent graphs, and execution traces form the internal representational backbone:
- Specification Schema: JSON definitions of task decompositions and constraints
- Blueprint Graphs: GraphML or adjacency-list representations of agent roles and data flows
- Resource Records: Versioned tuples (name, description, input-output mapping, trainable flag, metadata) express logical and operational resource identities
- Testing Harness: Integrated Docker/pytest stacks for simulation and verification
- Time-Series Logs: PostgreSQL stores used by the optimization agent, enabling surrogate-model fitting for parameter tuning
- Memory Artifacts: Persistent storage of traces, outcomes, and prompt histories, accessible to all agents via the registry
Knowledge transfer across agent lifecycle stages is facilitated by semantic search, versioned registry queries, and context manager contracts, enabling both horizontal (across agents) and vertical (across iterations) information flow (Harper, 2024, Zhang, 16 Apr 2026).
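As a small illustration of the blueprint-graph structure listed above, the sketch below stores agent roles as an adjacency list, walks the data flow from one role, and round-trips the graph through JSON. The role names and the `downstream` helper are hypothetical.

```python
import json
from typing import Dict, List

# Adjacency list: each agent role maps to the roles it sends data to.
blueprint: Dict[str, List[str]] = {
    "orchestrator": ["researcher", "tool_caller"],
    "researcher": ["memory"],
    "tool_caller": ["memory"],
    "memory": [],
}


def downstream(graph: Dict[str, List[str]], start: str) -> List[str]:
    """Return every role reachable from `start`, in breadth-first order."""
    seen: List[str] = []
    queue = list(graph[start])
    while queue:
        node = queue.pop(0)
        if node not in seen:
            seen.append(node)
            queue.extend(graph[node])
    return seen


reachable = downstream(blueprint, "orchestrator")

# Plain dicts and lists survive a JSON save/load unchanged.
restored = json.loads(json.dumps(blueprint))
```

An adjacency list keeps the blueprint trivially serializable, which fits the JSON save/load calls mentioned for RSPL; GraphML would be the choice when interoperability with graph tools matters more.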
6. Empirical Evaluation
AGS has been validated on diverse benchmarks including graduate-level scientific QA (GPQA-Diamond), symbolic math (AIME 24/25), general agentic tasks (GAIA), and LeetCode algorithmic coding challenges. Key findings include:
- Science and Math (GPQA/AIME): Prompt+solution evolution consistently yields the highest exact-match accuracy. Improvement is especially pronounced for weaker LLMs, e.g., 23.3% → 40.0% (+71.4% relative) for gpt-4.1 on AIME24 (Zhang, 16 Apr 2026).
- GAIA Benchmark: AGS with dynamic tool evolution achieves 89.04% average completion (vanilla 79.07%), with a gain of more than 20 percentage points on the hardest tier (Level 3 Pass@1: 61.22% → 81.63%), outperforming strong baselines such as ToolOrchestra (87.38%) (Zhang, 16 Apr 2026).
- LeetCode Coding: Significant increases in problem resolution rates, e.g., C++ pass count rises from 82 to 99; errors (e.g., compile/runtime) drop sharply. Runtime and memory usage improvements yield “human-beating” rates for solutions in C++, Java, and Go (Zhang, 16 Apr 2026).
- General Automated System Generation: Median generation times range from 15–22 minutes across domains (educational content, project management, simple dev pipelines), with task completion accuracy spanning 0.6–0.9 and iteration convergence in 3–6 design cycles (Harper, 2024).
| Benchmark/Domain | Metric | AGS Outcome |
|---|---|---|
| AIME24 (gpt-4.1) | Accuracy | 23.3% → 40.0% |
| GAIA (tool evolution) | Pass@1 (Level 3) | 61.22% → 81.63% |
| LeetCode (C++ pass) | Problem resolution | 82 → 99 |
| Educational content | Generation time, accuracy | ~15 min, 90% |
| Project management | Plan quality | 4/5 (SME-rated) |
See (Harper, 2024, Zhang, 16 Apr 2026).
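The headline deltas follow from simple arithmetic on the reported figures. The fractions for AIME24 assume a 30-problem set, which is an assumption here rather than something the text states; it is what makes the rounded percentages and the quoted relative gain agree.

```latex
\begin{align*}
\text{AIME24 (gpt-4.1):}\quad & \tfrac{7}{30} \approx 23.3\% \;\to\; \tfrac{12}{30} = 40.0\%,
  \qquad \frac{12 - 7}{7} = \frac{5}{7} \approx 71.4\%\ \text{relative gain} \\
\text{GAIA (average):}\quad & 89.04 - 79.07 = 9.97\ \text{percentage points} \\
\text{GAIA (Level 3):}\quad & 81.63 - 61.22 = 20.41\ \text{percentage points} \\
\text{LeetCode (C++):}\quad & 99 - 82 = 17\ \text{additional problems passed}
\end{align*}
```

Computing the relative gain from the rounded percentages instead gives $(40.0 - 23.3)/23.3 \approx 71.7\%$, so the quoted 71.4% is consistent with the exact fractions rather than the rounded values.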
7. Limitations and Prospective Directions
AGS exposes several challenges:
- Conversational Stalling: Without dedicated conversation management, agents can deadlock on repeated queries (Harper, 2024). Dedicated agents and improved loop detection are needed.
- Error Handling: Payload schema variation causes early crashes. Specialized error recovery and rerouting agents are required.
- Scalability: Monolithic event bus approaches bottleneck under load; scalable, sharded, or topic-focused pub/sub topologies are necessary (Harper, 2024).
- Security and Compliance: No inbuilt enforcement of encryption or access control. A Security and Compliance Agent is marked for future enhancement.
- Adaptivity: Out-of-the-box LLMs without fine-tuning limit adaptability; future directions include learning agents capable of prompt/model adaptation during runtime.
- Deployment Robustness: Existing instantiations are laboratory-only; production deployment necessitates hardened containers, network policy enforcement, and automation for rolling updates.
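The loop detection called for under Conversational Stalling could take many forms; the sketch below shows one simple possibility, flagging a message that recurs too often within a sliding window. The class name, window size, repeat threshold, and whitespace/case normalization are all assumptions and are not drawn from either cited system.

```python
from collections import deque
from typing import Deque


class StallDetector:
    """Flags a conversation as stalled when the same message keeps recurring."""

    def __init__(self, window: int = 5, max_repeats: int = 3) -> None:
        self.recent: Deque[str] = deque(maxlen=window)  # only the last `window` messages count
        self.max_repeats = max_repeats

    def observe(self, message: str) -> bool:
        """Record a message; return True once it has appeared `max_repeats` times in the window."""
        normalized = " ".join(message.lower().split())  # ignore case and extra whitespace
        self.recent.append(normalized)
        return self.recent.count(normalized) >= self.max_repeats


detector = StallDetector()
flags = [detector.observe(m) for m in ["Which file?", "which  file?", "Status?", "Which file?"]]
```

A detector like this only catches verbatim repeats; paraphrased loops would need a semantic similarity check, which is outside this sketch.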
A plausible implication is that AGS’s context-managed, versioned, and hot-swappable resource model—combined with formalized closed-loop evolution—constitutes a distinct paradigm shift for LLM-agent compositionality, benchmarking, and safe deployment in complex, long-horizon domains (Harper, 2024, Zhang, 16 Apr 2026).