Stable Agentic Control: Tool-Mediated LLM Architecture for Autonomous Cyber Defense

Published 4 May 2026 in cs.AI, cs.CR, and eess.SY | (2605.03034v1)

Abstract: Agentic systems involved in high-stake decision-making under adversarial pressure need formal guarantees not offered by existing approaches. Motivated by the operational needs of security operations centers (SOCs) that must configure endpoint detection and response (EDR) policies under adversarial pressure, we present a tool-mediated architecture: LLM agents use deterministic tools (Stackelberg best-response, Bayesian observer updates, attack-graph primitives) and select from finite action catalogs enforced at the tool-output interface. A composite Lyapunov function machine-checked in Lean 4 with zero sorry certifies controllability, observability from asymmetric sensor data, and Input-to-State Stability (ISS) robustness under intelligent adversarial disturbance, with two corollaries extending the certificate to any controller or adversary from the catalogs. On 282 real enterprise attack graphs, the claims hold with margin. On paired offensive/defensive telemetry, a tool-mediated Claude Sonnet 4 controller reduces the attacker's expected payoff (game value) by 59% relative to a deterministic greedy baseline, with zero variance across 40 runs at four temperatures. A Claude Haiku 4.5 controller converges to suboptimal game values but stays catalog-bounded over an additional 40 runs, demonstrating that architectural stability is not dependent on the controller capability. The LLM agent's non-determinism furthers creative exploration of strategies, while the tool-mediated architecture ensures system stability.


Authors (8)

Summary

The paper introduces a tool-mediated LLM architecture that formally guarantees closed-loop controllability and stability even under adversarial conditions.
It employs deterministic primitives such as Bayesian belief updates and attack-graph computations to mediate LLM reasoning, ensuring predictable and bounded outcomes.
Empirical validation across enterprise attack graphs demonstrates 100% monotonicity, ISS robustness, and significant reduction in defender uncertainty.


Problem Formulation and Architectural Innovation

The paper "Stable Agentic Control: Tool-Mediated LLM Architecture for Autonomous Cyber Defense" (2605.03034) introduces a formal architecture for closed-loop, LLM-driven cyber defense under adversarial pressure with machine-checked stability guarantees. The central contribution is an agentic system where LLMs act as controllers for enterprise cybersecurity operations, mediated by deterministic tools that enforce catalog constraints on both defender and attacker action spaces. In this architecture, the LLM agent leverages (but does not directly execute) deterministic primitives—including Stackelberg best-response solvers, Bayesian belief-updating observers, and attack-graph computations—to select actions from finite, catalog-enforced sets.
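As a rough illustration of what a deterministic Stackelberg best-response tool might compute over finite catalogs, the sketch below picks the defender action whose worst-case payoff (under the attacker's best response) is smallest. The payoff table, the action names, and the function names are all hypothetical and are not taken from the paper.

```python
from typing import Dict, Hashable, Sequence, Tuple

# Hypothetical payoff table: (defender_action, attacker_action) -> attacker payoff.
Payoff = Dict[Tuple[Hashable, Hashable], float]


def attacker_best_response(payoff: Payoff, defender_action: Hashable,
                           attacker_catalog: Sequence[Hashable]) -> Hashable:
    """Return the attacker action that maximizes payoff against a fixed defense."""
    return max(attacker_catalog, key=lambda a: payoff[(defender_action, a)])


def stackelberg_defender_choice(payoff: Payoff,
                                defender_catalog: Sequence[Hashable],
                                attacker_catalog: Sequence[Hashable]):
    """Leader (defender) minimizes the follower's (attacker's) best-response payoff."""
    best_action, best_value = None, float("inf")
    for d in defender_catalog:
        a = attacker_best_response(payoff, d, attacker_catalog)
        value = payoff[(d, a)]
        if value < best_value:
            best_action, best_value = d, value
    return best_action, best_value


# Illustrative catalogs and payoffs (assumed values, not from the paper).
defender_catalog = ["isolate_host", "enable_edr_rule"]
attacker_catalog = ["lateral_move", "credential_dump"]
payoff = {
    ("isolate_host", "lateral_move"): 0.25,
    ("isolate_host", "credential_dump"): 0.75,
    ("enable_edr_rule", "lateral_move"): 0.5,
    ("enable_edr_rule", "credential_dump"): 0.25,
}
choice = stackelberg_defender_choice(payoff, defender_catalog, attacker_catalog)
```

Because both catalogs are finite, the search is exhaustive and deterministic: the same inputs always yield the same choice, which is the property the architecture relies on.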

This tool mediation fundamentally alters the agentic loop. The architecture explicitly separates stochastic LLM reasoning from plant transitions and game dynamics, allowing the system to bound non-determinism, control catalog exhaustiveness, and render the stability of outcomes a property of the system structure—independent of agent implementation details or LLM backbone.
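One way to picture catalog enforcement at the tool-output interface is a small gate that accepts a proposed action only if it belongs to a fixed, finite catalog. This is a minimal sketch under that assumption; the class name `CatalogGate`, the `GateResult` record, and the example action names are hypothetical and do not describe the paper's implementation.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class GateResult:
    accepted: bool
    action: Optional[str]  # the action to execute, or None if rejected
    reason: str


class CatalogGate:
    """Hypothetical actuator-side filter: only catalog actions pass through."""

    def __init__(self, catalog: Iterable[str]) -> None:
        self._catalog = frozenset(catalog)  # immutable for the gate's lifetime

    def check(self, proposed: str) -> GateResult:
        if proposed in self._catalog:
            return GateResult(True, proposed, "in catalog")
        # Off-catalog proposals never reach the plant, whatever the LLM emits.
        return GateResult(False, None, "rejected: off-catalog")


# Illustrative usage with assumed action names.
gate = CatalogGate({"isolate_host", "enable_edr_rule"})
ok = gate.check("enable_edr_rule")
bad = gate.check("delete_all_logs")
```

The point of the design is that the controller's output is treated as a proposal, so any non-determinism in the proposal step is filtered before it can affect system state.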

Formal Results: Lean 4-Verified Stability Guarantees

At the theoretical core, the paper introduces a composite Lyapunov function $V(k) = S(k) + \lambda \theta(k)$, where $S(k)$ is the attacker's expected game value (network interdiction payoff reflecting maximum attacker benefit after defense) and $\theta(k)$ is the mean uncertainty over defender belief about attack graph edges. This Lyapunov structure provides the vehicle for formal verification of closed-loop controllability, ISS robustness against adaptive attacks, and observer convergence.
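The composite function can be evaluated directly from its definition. The sketch below assumes that per-edge uncertainty is available as a number per edge and that $\theta(k)$ is their arithmetic mean; the edge names, the numeric values, and the choice of $\lambda$ are illustrative assumptions only.

```python
from statistics import fmean
from typing import Mapping


def mean_uncertainty(edge_uncertainty: Mapping[str, float]) -> float:
    """theta(k): mean uncertainty over attack-graph edges (0.0 if no edges)."""
    return fmean(edge_uncertainty.values()) if edge_uncertainty else 0.0


def lyapunov_value(game_value: float,
                   edge_uncertainty: Mapping[str, float],
                   lam: float) -> float:
    """V(k) = S(k) + lambda * theta(k)."""
    return game_value + lam * mean_uncertainty(edge_uncertainty)


# Illustrative numbers (assumed): S(k) = 1.0, two edges, lambda = 0.5.
uncertainty = {"edge_a": 0.5, "edge_b": 0.25}
theta = mean_uncertainty(uncertainty)      # (0.5 + 0.25) / 2
v = lyapunov_value(1.0, uncertainty, 0.5)  # 1.0 + 0.5 * theta
```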

All proofs are machine-checked in Lean 4 with zero "sorry" placeholders (no unproven steps), establishing the following guarantees:

Controllability: Under no adversarial disturbance, every defender policy deployment yields a monotone decrease in $S(k)$ and contracts the Lyapunov function.
Input-to-State Stability (ISS): Under best-responding, intelligent adversarial disturbance, $V(k)$ remains ISS-bounded; defense actions and Bayesian updates compensate for any attacker graph expansion within finite catalogs, with margins directly depending on catalog parameters, budget, and observer contraction rate.
Observability: The estimator contracts edge-level uncertainties geometrically as evidence accumulates, ensuring practical system identification even under asymmetric and partial observation scenarios.
Generalization: These guarantees hold controller- and adversary-agnostically—any agentic controller and adversary confined to their respective catalogs inherit the same formal system-level bounds.
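For readers unfamiliar with ISS, a generic discrete-time form of such a bound is sketched below. This is a textbook-style template stated as an assumption, not the paper's exact theorem: the contraction rate $\rho$, the gain $\gamma$, and the disturbance signal $d(k)$ are placeholder symbols, and the paper's actual constants depend on catalog parameters, budget, and observer contraction rate.

```latex
% Assumed generic template; \rho, \gamma, and d(k) are illustrative symbols.
V(k+1) \le \rho\, V(k) + \gamma\, \lVert d(k) \rVert,
\qquad 0 \le \rho < 1,\ \gamma \ge 0
% Unrolling the recursion and summing the geometric series gives
V(k) \le \rho^{k}\, V(0) + \gamma \sum_{j=0}^{k-1} \rho^{\,k-1-j}\, \lVert d(j) \rVert
     \le \rho^{k}\, V(0) + \frac{\gamma}{1-\rho}\, \sup_{0 \le j < k} \lVert d(j) \rVert
```

With $d \equiv 0$ the bound reduces to geometric decay of $V(k)$, which matches the disturbance-free contraction described under controllability.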

Empirical Validation on Real-World Cyber Graphs

Experimental results operationalize the formal framework across two axes:

1. Architectural Properties (Claims i–iii) on 282 Enterprise Attack Graphs

Evaluation on 282 industry-scale attack graphs, derived from real pentesting data spanning 161 enterprise organizations and 25 sectors, demonstrates:

Controllability (monotonicity): In all cases, every defender action either reduces $S(k)$ or leaves it unchanged (monotonicity pass rate: 100%, $n=282$).
ISS robustness: Under adversarial graph expansion, the system never violates the formal disturbance bounds; maximal single-round worst-case increases of $S(k)$ are well below the theoretical limit, with substantial empirical safety margin (e.g., 0.74 observed, 1.0 bound).
Observability acceleration: Adversarial pressure paradoxically tightens defender belief-truth alignment; belief-to-ground-truth gaps decay geometrically, and adversary-induced reveals decrease residual estimation error by $4.7\times$ over no-disturbance conditions.
Figure 1: Experiment 1 results on 282 graphs: (a) defender-only monotone plant trajectory; (b) bounded ISS gain under adversarial disturbance; (c) $4.7\times$ reduction in belief-truth value gap due to adversarial reveals.
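The two trajectory-level checks described above (monotone $S(k)$ without disturbance, and bounded single-round increase under disturbance) can be expressed as simple predicates. The sketch below is an assumed formulation for illustration; the function names and the sample trajectories are hypothetical, and only the 0.74 observed increase and the 1.0 bound come from the text.

```python
from typing import Sequence


def is_monotone_nonincreasing(values: Sequence[float], tol: float = 1e-12) -> bool:
    """True if every step leaves the value unchanged or lower (within tol)."""
    return all(b <= a + tol for a, b in zip(values, values[1:]))


def max_single_round_increase(values: Sequence[float]) -> float:
    """Largest one-step increase along the trajectory (0.0 if it never rises)."""
    return max([0.0] + [b - a for a, b in zip(values, values[1:])])


def iss_margin(observed_increase: float, bound: float) -> float:
    """Headroom between the observed worst-case increase and the bound."""
    return bound - observed_increase


# Hypothetical trajectories for illustration.
defender_only = [2.0, 1.5, 1.5, 1.0]
with_adversary = [2.0, 1.5, 2.0, 1.25]

monotone = is_monotone_nonincreasing(defender_only)
worst = max_single_round_increase(with_adversary)
margin = iss_margin(0.74, 1.0)  # figures quoted in the text
```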

2. Generality with LLM Controllers (Corollaries 1–2) on Paired Telemetry

On the GOAD Active Directory environment, the system was instantiated with Claude Sonnet 4 and Claude Haiku 4.5 LLMs as controllers:

Sonnet 4 (40 runs): All runs converged to the same game value $S(k)$, with zero variance across all four temperatures, representing a 59% reduction in attacker game value compared to a greedy deterministic baseline. Action-level diversity (measured by Jaccard similarity) did not translate to outcome variability, demonstrating architectural decoupling of exploration and stability.
Haiku 4.5 (40 runs): The same tool-mediated architecture yielded capability-bound stability: all runs stayed within the catalog-based bounds, but only about half reached the optimal game value (matching Sonnet), while the rest plateaued at suboptimal values typical of the greedy baseline. All off-catalog action proposals were rejected, reinforcing the guarantee that catalog and system, not the backbone, determine safety and stability.
Figure 2: Within-family scaling: Sonnet 4 achieves zero outcome variance, while Haiku 4.5 converges to a suboptimal game value in roughly half of runs. Both controllers remain catalog-bounded.
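Jaccard similarity is a standard way to compare two sets, and it is a natural fit for measuring how much the actions chosen in two runs overlap. The sketch below shows the computation on hypothetical action sets; how the paper groups actions into sets for this metric is not stated in this summary, so the set construction is an assumption.

```python
from typing import Hashable, Iterable


def jaccard(a: Iterable[Hashable], b: Iterable[Hashable]) -> float:
    """|A intersect B| / |A union B|; two empty sets count as identical."""
    set_a, set_b = set(a), set(b)
    if not set_a and not set_b:
        return 1.0
    return len(set_a & set_b) / len(set_a | set_b)


# Hypothetical action sets from two runs (names are illustrative).
run_1 = {"isolate_host", "block_smb", "rotate_creds"}
run_2 = {"isolate_host", "block_smb", "enable_edr_rule"}
similarity = jaccard(run_1, run_2)  # 2 shared actions out of 4 distinct
```

A similarity below 1 alongside identical final game values is what the text means by action diversity not translating into outcome variability.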

Technical and Operational Implications

This work makes several claims, each backed by the machine-checked proofs or the experiments summarized above:

System-level rather than agent-level guarantees: All stability and observability properties are enforced by the closed-loop structure and actuator interface; they do not depend on the reasoning logic or backbone capability of the LLM. This shifts the unit of safety from component-level (agent) to architecture-level (system).
Non-determinism is strictly contained: LLM stochasticity is leveraged for search and exploration but cannot violate catalog boundaries or destabilize outcomes; outcome variance observed in vanilla LLM agents is absent under this architecture.
Adversarial pressure is informative: Paradoxically, adversarial expansion accelerates defender estimation convergence, functioning as an implicit informant by triggering observations otherwise inaccessible.
Certification is dual-use: The same formal bounds apply to attackers as to defenders; a malevolent agent confined to the same actionable catalog cannot destabilize the system or evade the disturbance envelope.
Reasoning depth and integration: Empirical evidence (Sonnet vs. Haiku) shows that architectural safety (no off-catalog action, ISS bounds) does not equate to optimality; poor integration of observer evidence (as in Haiku) can yield suboptimal system-level defense, motivating runtime monitoring of the belief-truth gap as a diagnostic.
Scalability and reproducibility: All guarantees and results hold without training or adaptation—convergence occurs in a single analysis cycle, and the Lean 4 verification is reproducible.
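The suggestion to monitor the belief-truth gap at runtime could take the form of a simple contraction check. The sketch below assumes a reference ("truth") value is available for comparison, for example in replayed telemetry, and flags rounds where the gap fails to shrink by a chosen factor. The function names, the `min_ratio` threshold, and the sample values are hypothetical.

```python
from typing import List, Sequence


def gap_series(belief_values: Sequence[float],
               truth_values: Sequence[float]) -> List[float]:
    """Absolute belief-truth gap per round."""
    return [abs(b - t) for b, t in zip(belief_values, truth_values)]


def stalled_rounds(gaps: Sequence[float],
                   min_ratio: float = 0.9,
                   floor: float = 1e-6) -> List[int]:
    """Rounds k where the gap did not contract to below min_ratio * previous gap.

    Gaps at or below `floor` are treated as converged and never flagged.
    """
    return [
        k for k in range(1, len(gaps))
        if gaps[k - 1] > floor and gaps[k] > min_ratio * gaps[k - 1]
    ]


# Illustrative values (assumed): the gap halves twice, then stops shrinking.
belief = [1.0, 0.75, 0.625, 0.625]
truth = [0.5, 0.5, 0.5, 0.5]
gaps = gap_series(belief, truth)
flagged = stalled_rounds(gaps)
```

A flagged round does not indicate a safety violation, since the catalog bounds still hold; it only signals that the controller may not be integrating observer evidence, as in the suboptimal Haiku runs.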

Future Directions

The authors identify several extension points of significant research interest:

Relaxing action monotonicity and rollback: Current (A4) persistence assumptions can be weakened to admit more dynamic catalogs or reversible actions, broadening applicability to settings with rolling policy windows.
Dynamic catalog expansion and open-world action sets: Extending the formal ISS and controllability bounds to allow for incremental catalog augmentation under systematic validation procedures.
Generalization domains: The formalism applies wherever agentic systems operate under adversarial pressure and catalog-bounded actions (e.g., financial compliance, physical security, safety-critical robotics).

Conclusion

The paper presents a complete architectural and verification framework for LLM-mediated cyber defense, elevating system-level safety and robustness above agent-specific guarantees. The Lyapunov-based, Lean 4-checked approach ensures that closed-loop controllability, ISS robustness, and observability are provably maintained even when agents are non-deterministic, adversarial, or of variable reasoning capability. The empirical margin observed across real-world security graphs, with precise statistical treatment, supports the architecture’s deployment in operational environments demanding auditable guarantee envelopes. The approach generalizes to any setting with finite-catalog, tool-mediated agentic control under adversarial disturbance, marking a paradigm shift in certifiable agentic system design.
