From Admission to Invariants: Measuring Deviation in Delegated Agent Systems

Published 19 Apr 2026 in cs.AI and cs.CR | (2604.17517v2)

Abstract: Autonomous agent systems are governed by enforcement mechanisms that flag hard constraint violations at runtime. The Agent Control Protocol identifies a structural limit of such systems: a correctly-functioning enforcement engine can enter a regime in which behavioral drift is invisible to it, because the enforcement signal operates below the layer where deviation is measurable. We show that enforcement-based governance is structurally unable to determine whether an agent behavior remains within the admissible behavior space A0 established at admission time. Our central result, the Non-Identifiability Theorem, proves that A0 is not in the sigma-algebra generated by the enforcement signal g under the Local Observability Assumption, which every practical enforcement system satisfies. The impossibility arises from a fundamental mismatch: g evaluates actions locally against a point-wise rule set, while A0 encodes global, trajectory-level behavioral properties set at admission time. An agent can therefore drift -- systematically shifting its behavioral distribution away from admission-time expectations -- while every individual action remains within the permitted action space. We define the Invariant Measurement Layer (IML), which bypasses this limitation by retaining direct access to the generative model of A0, restoring observability precisely in the region where enforcement is structurally blind. We prove an information-theoretic impossibility for enforcement-based monitoring and show IML detects admission-time drift with provably finite detection delay. Validated across four settings: three drift scenarios (300 and 1000 steps), a live n8n webhook pipeline, and a LangGraph StateGraph agent -- enforcement triggers zero violations while IML detects each drift type within 9-258 steps of drift onset.


Authors (1)

Marcelo Fernandez

Summary

The paper demonstrates that local enforcement mechanisms cannot recover admission-time invariants, proving a structural compliance–invariance gap.
The paper introduces the Invariant Measurement Layer (IML) that combines temporal, constraint, and lineage deviation measures to reliably detect hidden drift.
The paper validates IML through theoretical proofs and empirical experiments, confirming its effectiveness in bridging observability gaps in multi-agent systems.

Formal Analysis of Deviation Measurement in Delegated Agent Systems

Introduction

"From Admission to Invariants: Measuring Deviation in Delegated Agent Systems" (2604.17517) rigorously investigates the foundational limits of enforcement-based governance in multi-agent systems and establishes the necessity of explicit deviation measurement mechanisms. Whereas classical enforcement mechanisms (e.g., permission checkers, policy engines, guardrails) operate on individual actions to flag local violations, this paper shows that such local mechanisms are structurally incapable of monitoring global behavioral drift from the admissible behavior space ( $A$ ) established at agent admission time. The authors formalize and prove this impossibility (Non-Identifiability Theorem), then introduce the Invariant Measurement Layer (IML), which overcomes the observability limitations by anchoring monitoring to admission-time invariants. Comprehensive theoretical and empirical analysis demonstrates that only with admission-time snapshots as reference can behavioral drift be reliably detected, even when enforcement consistently registers zero violations.

Background and Formal Problem Setting

Delegated agent systems partition behavior into an admission-time invariant set $A_0$ (a function of initial constraints, context, and delegation lineage) and rely on an enforcement signal $g$ that operates by flagging hard constraint violations at the action level. The Local Observability Assumption captures the structure of all practical enforcement: action-wise evaluation according to a statically defined rule set, independent of $A_0$ . The contrast between the trajectory-level nature of $A_0$ and the pointwise design of $g$ is the crux of the observability gap. This framework generalizes beyond specific enforcement architectures and subsumes standard runtime verification, guardrail, and policy enforcement regimes.

Impossibility Results: The Compliance–Invariance Gap

Existence and Non-Identifiability

The central theoretical results consist of:

Existence of Compliance-Invariance Gap: There always exist traces $\tau$ that are labeled as compliant by enforcement ( $g(\tau)=0$ ) but lie outside the admission-time behavior set ( $\tau\notin A_0$ ), provided $A_0$ is nontrivial (T1).
Non-Identifiability Theorem: No function of $g$ (including all conceivable risk scores and aggregators) can reconstruct $A_0$-membership; formally, $A_0 \notin \sigma(g)$, where $\sigma(g)$ is the minimal $\sigma$-algebra generated by the enforcement signal (T2).

These results are supported by both existential and constructive witnesses based on actual agent runs. Empirically, the paper demonstrates scenarios in which post-drift behavior diverges from the admission snapshot while enforcement remains silent throughout, confirming the theorem's applicability.
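The gap between a pointwise check and a trajectory-level property can be illustrated with a toy witness. The sketch below is not the paper's construction: the tool names, the snapshot frequencies, the total-variation membership test, and the tolerance are all hypothetical choices made for illustration.

```python
from collections import Counter

# Hypothetical rule set: every tool listed here is individually permitted.
PERMITTED_TOOLS = {"search", "read_file", "send_email"}

# Hypothetical admission-time snapshot: expected tool-usage frequencies.
ADMISSION_SNAPSHOT = {"search": 0.6, "read_file": 0.3, "send_email": 0.1}


def enforcement_signal(action):
    """Local check g: 1 if this single action violates the rule set, else 0."""
    return 0 if action in PERMITTED_TOOLS else 1


def in_admissible_set(trace, tolerance=0.2):
    """Toy trajectory-level membership test standing in for A_0.

    A trace counts as admissible when its empirical tool distribution stays
    within `tolerance` total-variation distance of the admission snapshot.
    """
    counts = Counter(trace)
    total = len(trace)
    tv_distance = 0.5 * sum(
        abs(counts.get(tool, 0) / total - expected)
        for tool, expected in ADMISSION_SNAPSHOT.items()
    )
    return tv_distance <= tolerance


# Both traces use only permitted tools; only their proportions differ.
baseline = ["search"] * 6 + ["read_file"] * 3 + ["send_email"] * 1
drifted = ["search"] * 1 + ["read_file"] * 1 + ["send_email"] * 8

baseline_flags = [enforcement_signal(a) for a in baseline]
drifted_flags = [enforcement_signal(a) for a in drifted]

print("violations (baseline, drifted):", sum(baseline_flags), sum(drifted_flags))
print("admissible (baseline, drifted):",
      in_admissible_set(baseline), in_admissible_set(drifted))
```

Both traces produce the same all-zero enforcement record, so no function of that record can separate them, yet only the baseline passes the trajectory-level test.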

Figure 1: Per-component deviation trajectories illustrating that only the lineage deviation component grows during delegation drift, while the temporal drift and constraint proximity components capture tool/context drift; the drift onset step is marked.

Figure 2: The measured deviation remains undetected by the enforcement signal in all three drift scenarios, exemplifying the compliance-invariance gap.

Consequences

The information-theoretic corollary (the mutual information between $A_0$-membership and $g$ is strictly less than the entropy of $A_0$-membership) shows that even augmenting $g$ with stochastic or continuous scoring offers no recourse: structurally, the local view cannot resolve invariant-level properties. The monotonic hidden drift theorem (T3) guarantees that a sequence of agent traces can diverge arbitrarily from $A_0$ with $g = 0$ always.
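The corollary can be written out in a few lines. This is a sketch rather than the paper's proof: the indicator $Z = \mathbb{1}[\tau \in A_0]$ is a symbol introduced here for illustration, and the argument assumes that $Z$ fails to be a function of $g$ on a set of positive probability.

```latex
\begin{align*}
I(Z; g) &= H(Z) - H(Z \mid g)
  && \text{(definition of mutual information)} \\
H(Z \mid g) &> 0
  && \text{(since } Z \text{ is not almost surely a function of } g\text{)} \\
\Longrightarrow \quad I(Z; g) &< H(Z)
\end{align*}
```

In the extreme case where $g = 0$ on every observed trace, $g$ is constant, so $I(Z; g) = 0$ and the enforcement record carries no information about $A_0$-membership at all.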

Invariant Measurement Layer: Structure and Guarantees

To bridge the fundamental observability shortcoming, the Invariant Measurement Layer (IML) is defined as a lightweight estimator operating over the agent's full behavior trajectory with direct access to the admission-time distributional snapshot.

IML Construction

The IML combines three deviation subcomponents:

Temporal Drift: JS divergence between the current tool-action distribution and the admission snapshot.
Constraint Proximity: Mean risk score based on a pre-specified risk mapping.
Lineage Deviation: Normalized deviation in mean delegation depth.

The composite score is maintained via an exponential moving average to provide low-latency yet stable deviation detection. Notably, the snapshot reference is frozen at admission and never adapts, unlike historical anomaly detection methods.
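The three components and the smoothed composite can be sketched as follows. This is a minimal illustration, not the paper's implementation: the class name, the equal component weights, the smoothing factor `alpha`, the depth normalization, and all tool names and risk values are assumptions.

```python
import math
from collections import Counter


def empirical_distribution(tools):
    """Relative frequency of each tool in a non-empty list of tool names."""
    counts = Counter(tools)
    return {tool: n / len(tools) for tool, n in counts.items()}


def js_divergence(p, q):
    """Jensen-Shannon divergence in bits, so the value lies in [0, 1]."""
    keys = set(p) | set(q)
    mix = {k: 0.5 * (p.get(k, 0.0) + q.get(k, 0.0)) for k in keys}

    def kl_to_mix(dist):
        return sum(v * math.log2(v / mix[k]) for k, v in dist.items() if v > 0.0)

    return 0.5 * kl_to_mix(p) + 0.5 * kl_to_mix(q)


class InvariantMeasurementLayer:
    """Toy IML: three deviation components smoothed by an EMA."""

    def __init__(self, snapshot_tools, snapshot_depth, risk_map, alpha=0.1):
        # The reference is frozen at admission and never updated afterwards.
        self.snapshot_dist = empirical_distribution(snapshot_tools)
        self.snapshot_depth = snapshot_depth
        self.risk_map = risk_map      # assumed tool -> risk in [0, 1]
        self.alpha = alpha            # assumed EMA smoothing factor
        self.score = 0.0

    def update(self, window_tools, window_depths):
        temporal = js_divergence(
            empirical_distribution(window_tools), self.snapshot_dist
        )
        constraint = sum(
            self.risk_map.get(t, 0.0) for t in window_tools
        ) / len(window_tools)
        mean_depth = sum(window_depths) / len(window_depths)
        # Assumed normalization: relative change in mean depth, capped at 1.
        lineage = min(
            1.0,
            abs(mean_depth - self.snapshot_depth) / max(self.snapshot_depth, 1.0),
        )
        raw = (temporal + constraint + lineage) / 3.0  # equal weights (assumption)
        self.score = self.alpha * raw + (1.0 - self.alpha) * self.score
        return self.score


admitted = ["search"] * 6 + ["read_file"] * 3 + ["send_email"]
iml = InvariantMeasurementLayer(
    snapshot_tools=admitted,
    snapshot_depth=1.0,
    risk_map={"search": 0.0, "read_file": 0.2, "send_email": 0.8},
)
stable_score = iml.update(admitted, [1] * 10)
drifted_score = iml.update(["send_email"] * 8 + ["search", "read_file"], [3] * 10)
print(stable_score, drifted_score)
```

Because the snapshot fields are written once in the constructor and only read afterwards, later windows cannot pull the reference toward drifted behavior.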

Detection Guarantees

Theoretically, the IML is a consistent estimator for the natural instantiation of the true deviation function and guarantees finite expected detection delay under a bounded drift regime. For any chosen detection threshold, detection occurs within a window determined by the empirical drift rate, even as enforcement remains blind ( $g = 0$ throughout). The detection guarantee and explicit delay bounds are derived via large deviation inequalities over the sample trajectories.
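To make "finite detection delay" concrete, the following sketch works through a deliberately simplified case: a constant raw deviation after onset, smoothed by an exponential moving average that starts at zero. It is not the paper's large-deviation bound; the function names and the numbers used are illustrative assumptions.

```python
import math
from typing import Optional


def ema_detection_delay(raw_deviation: float, threshold: float,
                        alpha: float) -> Optional[int]:
    """Steps after onset until the EMA score first reaches `threshold`.

    Assumes the raw deviation is constant after onset and the EMA starts at 0.
    Returns None when the EMA can never reach the threshold.
    """
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie strictly between 0 and 1")
    if raw_deviation <= threshold:
        return None  # the EMA converges to raw_deviation from below
    score, steps = 0.0, 0
    while score < threshold:
        score = alpha * raw_deviation + (1.0 - alpha) * score
        steps += 1
    return steps


def closed_form_delay(raw_deviation: float, threshold: float,
                      alpha: float) -> int:
    """Same quantity from score_k = d * (1 - (1 - alpha)**k) >= threshold."""
    return math.ceil(
        math.log(1.0 - threshold / raw_deviation) / math.log(1.0 - alpha)
    )


simulated = ema_detection_delay(raw_deviation=0.5, threshold=0.25, alpha=0.1)
predicted = closed_form_delay(raw_deviation=0.5, threshold=0.25, alpha=0.1)
print(simulated, predicted)  # both are 7 for these inputs
```

In this simplified setting the delay grows as the threshold approaches the raw deviation level and shrinks as `alpha` increases, which matches the qualitative statement that the detection window is governed by the drift rate.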

Figure 3: Comparison of IML and anomaly detector; IML remains robust while the anomaly detector's score decays due to reference contamination after drift onset.

Figure 4: Detection delay is minimized for delegation drift, reflecting the dominance and rapid growth of the lineage deviation component; tool and context drift converge at higher thresholds.

Figure 5: Long-horizon drift shows that the IML deviation score continues to grow monotonically in all drift types while the enforcement signal provides no warning.

Figure 6: In a LangGraph agent, the IML detects deviation and component breakdown is visible, whereas the enforcement signal is zero across the entire drift period.

Empirical Validation

Controlled simulation (300- and 1000-step horizons), a live n8n webhook workflow, and a deterministic agent framework (LangGraph) validate the theoretical claims:

Drifted agent traces accrue monotonic deviation in IML, all while $g = 0$ uniformly.
IML detects all realistic forms of drift (tool usage rebalancing, delegation-depth increase, contextual shift) with finite, predictable delay.
In contrast, an anomaly detector based on rolling historical windows loses sensitivity due to reference contamination, and is incapable of capturing lineage-level drift (B2 fails to register increases in delegation depth).
Real workflow (n8n) and LangGraph stack experiments display prompt, consistent deviation detection, confirming IML's stack-agnostic utility.
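Reference contamination in a rolling-window detector can be reproduced with a single scalar feature. The sketch below is an illustration under stated assumptions, not the paper's baseline: the feature (say, the fraction of calls to one tool per step), the window length, and the drift magnitude are all hypothetical.

```python
from collections import deque


def frozen_scores(observations, reference):
    """Deviation of each observation from a fixed admission-time reference."""
    return [abs(x - reference) for x in observations]


def rolling_scores(observations, window):
    """Deviation of each observation from the mean of the last `window` values.

    The reference adapts to recent history, so it absorbs drifted behavior.
    """
    history = deque(maxlen=window)
    scores = []
    for x in observations:
        reference = sum(history) / len(history) if history else x
        scores.append(abs(x - reference))
        history.append(x)
    return scores


# 20 steps at the admitted level, then a persistent shift (hypothetical values).
observations = [0.1] * 20 + [0.8] * 40

frozen = frozen_scores(observations, reference=0.1)
rolling = rolling_scores(observations, window=10)

print("at onset:  frozen=%.2f rolling=%.2f" % (frozen[20], rolling[20]))
print("at end:    frozen=%.2f rolling=%.2f" % (frozen[-1], rolling[-1]))
```

Both detectors react at onset, but once the window fills with post-drift values the rolling score falls back toward zero, while the frozen-reference score stays at the full deviation.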

Theoretical, Practical, and Architectural Implications

Architectural Minimality

The layered agent governance architecture emerging from this work is not a matter of engineering convenience but a formal necessity. Admission defines invariants ( $A_0$ ), enforcement addresses hard, local constraints, and IML covers the strictly necessary region between them: drift within the compliant set is structurally hidden from enforcement but detectable via invariant referencing. This trichotomy is reinforced by composition results in related work, which indicate that omitting IML sacrifices invariant-level oversight.

Applications and Limitations

In practice, enforcement-only designs (even with adaptive risk scoring) will allow gradual behavioral drift in deployed agent systems, undermining policy guarantees. This poses particular risks in LLM-based agent architectures, where changes in behavioral context or delegation lineage are common and potentially insidious. The empirical findings underscore the need for explicit admission-time snapshots and invariant measurement as first-class governance mechanisms. Key limitations involve the expressivity of $A_0$ ; richer behavioral contracts may require complex summaries, with associated computation and storage overhead. Adversarial drift, where agents actively minimize the measured deviation, is identified as future work.

Conclusion

This paper establishes the formal impossibility of recovering admission-time behavioral invariants via enforcement-based governance alone and introduces IML as a provably necessary and empirically validated complement. Theoretical analysis, constructive witnesses, and real-agent experiments substantiate the structural separation between compliance and invariance. Effective drift monitoring in multi-agent systems therefore mandates direct reference to admission-time agent snapshots; IML is a scalable, compositional instantiation of this requirement. Systems relying solely on enforcement are, by design, incapable of detecting long-horizon behavioral drift. The results have immediate implications for the governance of deployed agent systems and set the foundation for further advances in compositional, multi-layered agent oversight.
