Alignment Contracts for Agentic Security Systems

Published 30 Apr 2026 in cs.CR and cs.LO | (2605.00081v1)

Abstract: Agentic security systems increasingly combine LLM planners with tools that can discover, validate, and report vulnerabilities. This creates an asymmetric control problem: the system should retain strong offensive capability inside an authorized engagement, while the same capabilities must be denied outside scope. Existing guardrails provide useful policy controls, but they do not make this boundary a first-class formal contract over observable effects. We introduce alignment contracts, a framework for specifying and enforcing behavioral constraints over observable effect traces. A contract defines scope, allowed and forbidden effects, resource budgets, and disclosure policies. We give the language finite-trace semantics, characterize satisfaction as a safety property with finite violation witnesses, develop refinement and one-way composition rules for modular contract engineering, and show that admissibility checking is decidable. We instantiate the framework for web-focused agentic security workflows and show how the same structure extends to other effect profiles. Under an explicit Effect Observability Assumption, where all $\Sigma_{\mathsf{Eff}}$-effects are mediated, the soundness theorem quantifies over the agent model and gives guarantees for mediated $\Sigma_{\mathsf{Eff}}$-effects, including enforcement soundness for monitor-realized traces. We also state an assumption-lifted adaptation result and formalize limits through undecidability transfer and observability-boundary theorems. A Lean 4 artifact checks the formal core theorems used by the paper.


Authors (3)

Summary

The paper introduces alignment contracts that formally specify and enforce effect-level security policies for offensive LLM-based agents.
It uses finite-trace semantics and modular composition to enforce operator-defined constraints, with a decidable admissibility check.
The formal core is mechanized in Lean 4, and the paper states explicit limits on what can be enforced, so that deployments remain auditable and controlled.

Formal Foundations and Guarantees of Alignment Contracts for Agentic Security Systems

Introduction and Motivation

"Alignment Contracts for Agentic Security Systems" (2605.00081) establishes a rigorous framework for specifying and enforcing behavioral constraints on LLM-based agentic security systems, addressing the asymmetric control needs inherent in offensive security workflows. Agentic systems combining LLM planners with tool invocation possess capabilities for vulnerability discovery, exploitation, and reporting, producing security-relevant effects that require tightly-scoped authorization. The fundamental problem addressed is enforcing boundaries over observable effects, ensuring that offensive capabilities are retained only within operator-sanctioned engagements while preventing unauthorized actions and disclosures, particularly in adversarial environments with indirect prompt injection and tool interface compromise.

Alignment Contracts: Formal Specification and Enforcement

The paper introduces alignment contracts as tuples $C = \langle S, E_{\mathsf{allow}}, E_{\mathsf{forbid}}, B, D, \mathsf{Res}, \mathsf{cost}, \mathsf{flows}\rangle$, which formalize scope predicates, effect-level allow/forbid rules, resource budgets, disclosure policies, and modular accounting functions. The language adopts finite-trace semantics:

Integrity: Realized effects must remain within operator-specified scope and constraints.
Modeled Disclosure: Sensitive outputs are restricted to authorized sinks per declared flows.

The monitor admits effects if they satisfy these contract clauses relative to the history of mediated events, abstracting away from internal agent intent and focusing exclusively on overt effect traces.
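The admission rule described above can be sketched in a few lines of Python. This is an illustrative reading of the tuple, not the paper's definition: the `Effect` fields, the choice to model allow and forbid rules as sets of effect kinds, and the `.example.test` scope are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Callable, Dict, FrozenSet, Sequence


@dataclass(frozen=True)
class Effect:
    kind: str                    # e.g. "http_request", "file_write", "report"
    target: str                  # host, path, or sink name
    resource: str = "requests"   # budget this effect draws on
    sensitive: bool = False      # True if the effect carries sensitive output


@dataclass(frozen=True)
class Contract:
    scope: Callable[[Effect], bool]   # S: is the target inside the engagement?
    allow: FrozenSet[str]             # E_allow, modeled here as effect kinds
    forbid: FrozenSet[str]            # E_forbid, modeled here as effect kinds
    budgets: Dict[str, int]           # B: cap per resource in Res
    sinks: FrozenSet[str]             # D / flows: authorized disclosure sinks
    cost: Callable[[Effect], int]     # cost: budget units one effect consumes


def admissible(c: Contract, history: Sequence[Effect], e: Effect) -> bool:
    """Decide one proposed effect from the contract and the mediated history only."""
    if e.kind in c.forbid or e.kind not in c.allow:
        return False
    if e.sensitive:
        if e.target not in c.sinks:      # modeled disclosure: authorized sinks only
            return False
    elif not c.scope(e):                 # integrity: stay inside operator scope
        return False
    spent = sum(c.cost(h) for h in history if h.resource == e.resource)
    return spent + c.cost(e) <= c.budgets.get(e.resource, 0)


# Hypothetical engagement: two requests against *.example.test, one report.
contract = Contract(
    scope=lambda e: e.target.endswith(".example.test"),
    allow=frozenset({"http_request", "report"}),
    forbid=frozenset({"file_write"}),
    budgets={"requests": 2, "reports": 1},
    sinks=frozenset({"operator_report"}),
    cost=lambda e: 1,
)
```

Note that `admissible` never inspects why the agent proposed an effect. It reads only the effect's observable parameters and the history, which is the sense in which the check abstracts away from intent.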

A critical premise is the Effect Observability Assumption (EOA): all externalized agent actions (network requests, file accesses, process executions) are correctly mapped, intercepted, and mediated, forming the basis for soundness guarantees.

Enforcement Soundness and Security Property Characterization

The enforcement barrier, typically instantiated as a reference monitor, operates deterministically on intercepted events, enforcing contract admissibility solely on observable parameters and trace history. Theorem-level soundness is established: for any sequence of agent-proposed actions, the monitor-realized trace satisfies the contract, irrespective of agent internal policy, adversarial nature, or prompt injection (Corollary: agent-behavior independence).
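A toy reference monitor shows why the realized trace satisfies the contract regardless of what the agent proposes. The sketch assumes every effect passes through `propose` (the Effect Observability Assumption); the `(kind, target)` effect encoding and the `in_scope_admit` policy are hypothetical stand-ins for a real contract.

```python
from typing import Callable, List, Sequence, Tuple

Effect = Tuple[str, str]  # hypothetical encoding: (kind, target)
Admit = Callable[[Sequence[Effect], Effect], bool]


class ReferenceMonitor:
    """Deterministic gate: an effect is realized only if the contract admits it."""

    def __init__(self, admit: Admit) -> None:
        self._admit = admit
        self.trace: List[Effect] = []   # monitor-realized trace

    def propose(self, effect: Effect) -> bool:
        if self._admit(self.trace, effect):
            self.trace.append(effect)
            return True
        return False                    # denied effects never reach the trace


def satisfies(admit: Admit, trace: Sequence[Effect]) -> bool:
    """A finite trace satisfies the contract if every event was admissible
    given the prefix before it."""
    return all(admit(trace[:i], trace[i]) for i in range(len(trace)))


def in_scope_admit(history: Sequence[Effect], effect: Effect) -> bool:
    """Hypothetical policy: at most three HTTP requests, all to *.example.test."""
    kind, target = effect
    return (kind == "http_request"
            and target.endswith(".example.test")
            and len(history) < 3)


# An adversarial or prompt-injected agent can propose anything it likes.
monitor = ReferenceMonitor(in_scope_admit)
for proposal in [("file_write", "/etc/passwd"),
                 ("http_request", "bank.invalid"),
                 ("http_request", "app.example.test")]:
    monitor.propose(proposal)
assert satisfies(in_scope_admit, monitor.trace)
```

The final assertion holds by construction: the only way an event enters `trace` is through the same predicate that `satisfies` re-checks on each prefix. That is the structure behind agent-behavior independence, and it also shows why a tool that bypasses `propose` falls outside the guarantee.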

The guarantee is deliberately scoped: soundness holds for overt, mediated effects but does not cover payload-based covert channels, steganography, timing channels, or intent inference. The framework formalizes these limits through an undecidability-transfer result for forbidden-effect absence and through observability-boundary theorems.

Alignment contracts are amenable to algebraic operations:

Refinement ($C' \sqsubseteq C$): Contract $C'$ imposes stricter constraints than $C$, ensuring that satisfaction of $C'$ implies satisfaction of $C$ (refinement soundness).
Compatible Composition ($\mathsf{compose}(C_1, C_2)$): Supports modular aggregation of enforcement clauses, with one-way implication soundness proved for compatible contracts.
Decidability: The admissibility predicate is computable with bounded complexity, enabling practical runtime enforcement with $O(|\mathsf{Res}|)$ per-event cost given cached counters.
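The three operations above can be illustrated on a stripped-down contract with only allow/forbid sets and budgets. The definitions below are one plausible reading chosen for the example (intersect allows, union forbids, take the tighter budget); the paper's own `compose` and its compatibility condition are not reproduced here, and `refines` is only a sufficient structural check.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet


@dataclass(frozen=True)
class Contract:
    allow: FrozenSet[str]
    forbid: FrozenSet[str]
    budgets: Dict[str, int]   # cap per resource; a missing key means "no cap"


def refines(c1: Contract, c2: Contract) -> bool:
    """Sufficient check that c1 is at least as strict as c2."""
    kinds_ok = (c1.allow - c1.forbid) <= (c2.allow - c2.forbid)
    budgets_ok = all(r in c1.budgets and c1.budgets[r] <= cap
                     for r, cap in c2.budgets.items())
    return kinds_ok and budgets_ok


def compose(c1: Contract, c2: Contract) -> Contract:
    """Combine two contracts so the result refines both (one-way implication)."""
    resources = set(c1.budgets) | set(c2.budgets)
    budgets = {r: min(c.budgets[r] for c in (c1, c2) if r in c.budgets)
               for r in resources}
    return Contract(c1.allow & c2.allow, c1.forbid | c2.forbid, budgets)


class CountingMonitor:
    """Keeps one cached counter per resource, so each decision touches
    every budget once: O(|Res|) work per event."""

    def __init__(self, contract: Contract) -> None:
        self.c = contract
        self.spent = {r: 0 for r in contract.budgets}

    def admit(self, kind: str, usage: Dict[str, int]) -> bool:
        if kind in self.c.forbid or kind not in self.c.allow:
            return False
        for r, cap in self.c.budgets.items():
            if self.spent[r] + usage.get(r, 0) > cap:
                return False
        for r in self.c.budgets:
            self.spent[r] += usage.get(r, 0)   # update counters only on admit
        return True


engagement = Contract(frozenset({"scan", "exploit"}), frozenset(), {"requests": 10})
org_policy = Contract(frozenset({"scan", "report"}), frozenset({"exploit"}),
                      {"requests": 5, "bytes": 100})
both = compose(engagement, org_policy)
```

Here `both` refines `engagement` and `org_policy`, but neither input refines `both`, which is what "one-way" means in this sketch. The counters replace a scan over the whole history, so the per-event cost does not grow with trace length.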

The formal core is mechanized in Lean 4, allowing machine-checkable proofs and correspondence manifests to guarantee theorem-level claims.

Assumption Schemas and Adaptation in Dynamic Architectures

The framework abstracts contract satisfaction in dynamic agent architectures via an extensibility schema:

Safe Architecture State: Formal predicate $\mathsf{Safe}(G)$ for admissible deployments.
Adaptation Rules: Sequences of contract-preserving adaptation rules ensure continued contract satisfaction across state evolutions, conditional on explicit mediation and trace bridge assumptions.
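The shape of this guarantee is a standard invariant induction, sketched below. The notation is assumed for illustration and may differ from the paper's: $G_0 \to G_1 \to \dots \to G_n$ is a sequence of architecture states produced by adaptation rules, and $\mathsf{A}$ stands for the conjunction of the mediation and trace-bridge assumptions.

```latex
\begin{align*}
&\text{Base:}  && \mathsf{Safe}(G_0) \\
&\text{Step:}  && \mathsf{A} \wedge \mathsf{Safe}(G_i) \wedge (G_i \to G_{i+1})
                  \implies \mathsf{Safe}(G_{i+1}) \\
&\text{Hence:} && \mathsf{A} \implies \mathsf{Safe}(G_n)
                  \quad \text{for every } n \ge 0
\end{align*}
```

The conclusion is conditional on $\mathsf{A}$ at every step: if a single adaptation breaks mediation, the induction stops there and nothing is claimed about later states.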

A concrete model is instantiated, clarifying obligations for mediation, event extraction, and policy application in practical deployments.

Model Limits, Threat Characterization, and Impossibility Results

The paper is explicit about its boundaries:

Undecidability Transfer: Static certification of forbidden-effect absence is undecidable for Turing-complete tool languages under supplied reduction conditions.
Mediation Bypass: Tools that evade mediation invalidate soundness premises; enforcement must be confined via sandboxing and capability gating.
Observation Boundary Schema: Only $\mathsf{obs}$-observable behaviors with finite bad-prefix properties are enforceable; payload content, timing channels, semantic intent, and liveness remain outside proof scope.

These results provide precise boundaries on the enforceability of contracts and reinforce the need for defense-in-depth against semantic and covert threats.
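A small example makes the observation boundary concrete. The observation function `obs` below is hypothetical: it keeps an effect's kind and host and drops the payload, standing in for whatever a real mediation layer exposes. Any monitor that decides on `obs(e)` alone must give the same answer for two effects with equal observations.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class RawEffect:
    kind: str
    host: str
    payload: bytes   # not visible to the monitor in this sketch


def obs(e: RawEffect) -> Tuple[str, str]:
    """Hypothetical observation function: keep kind and host, drop the payload."""
    return (e.kind, e.host)


def monitor_decision(observation: Tuple[str, str]) -> bool:
    """The monitor sees only the observation, never the raw effect."""
    kind, host = observation
    return kind == "http_request" and host.endswith(".example.test")


benign = RawEffect("http_request", "app.example.test", b"GET /health")
covert = RawEffect("http_request", "app.example.test", b"GET /?q=<encoded secret>")
out_of_scope = RawEffect("http_request", "other.invalid", b"GET /")

# Equal observations force equal decisions, whatever the payload carries.
assert obs(benign) == obs(covert)
assert monitor_decision(obs(benign)) == monitor_decision(obs(covert))
# A difference that obs does expose (the host) remains enforceable.
assert not monitor_decision(obs(out_of_scope))
```

Widening `obs` to include payload bytes would move the boundary but not remove it, since intent and timing would still be invisible. This is why the text pairs effect-level enforcement with separate defenses against semantic and covert threats.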

The paper situates its contribution within literature on agentic security systems [deng2024pentestgpt, shen2025pentestagent, gervais2025a1], agentic benchmarks [liu2024agentbench, debenedetti2024agentdojo], information-flow enforcement [balunovic2025fides], runtime policy DSLs [agentspec2026], and classic monitor-enforcement traditions [ligatti2005edit, schneider2000enforceable]. In contrast to intent-centric or heuristic approaches, the framework contributes explicit effect-level contract semantics, a formal safety characterization, modular contract engineering, and auditable assumption schemas.

Implications and Prospects for Agentic Security Systems

Practically, alignment contracts enable auditable, modular, and formally verified enforcement for agentic security deployments with offensive capability. They delineate a minimal trusted computing base (the reference monitor, mediation boundary, and contract specification), shifting trust away from opaque learned components. Separating effect-level enforcement from semantic mitigation makes residual risks and operational limits easier to audit.

Speculation for Future Developments: The framework provides a basis for quantifying retained offensive utility under strict contracts, mechanizing controlled contract updates, and integrating best-effort semantic defenses within layered architectures. Future work should instrument contract authoring, review, and update workflows, and empirically evaluate the intersection of formal guarantees and operational security in real deployments.

Conclusion

"Alignment Contracts for Agentic Security Systems" presents a formalized, decidable, modular enforcement framework for constraining agentic systems at the effect boundary under explicit mediation assumptions. It mechanizes key security properties, architectures, and negative results, rendering guarantees auditable and assumption-dependent. The approach enables robust enforcement for offensive-capable agents while clarifying domains of theoretical and practical limits, advocating explicit contracts, trusted mediation, and operator oversight as prerequisites for secure deployment of agentic systems.
