AI Alignment Contracts: Specification & Enforcement
- Alignment contracts are explicit specifications that define acceptable computational behavior, detailing inputs, outputs, and obligations to guide AI systems.
- They employ formal models like neurosymbolic frameworks and finite-trace contracts to ensure semantic correctness and enforce runtime mediation.
- These contracts integrate technical, legal, and economic perspectives, addressing incomplete contracting and institutional governance in AI alignment.
Alignment contracts are explicit specifications that bind computational behavior to acceptable inputs, outputs, actions, or obligations. In the recent literature, the term spans several related but non-identical constructions: reward- or objective-level contracts in principal–agent accounts of AI alignment (Hadfield-Menell et al., 2018), alignment as a contract-design problem situated in social, economic, and legal institutions (Stańczak et al., 27 Feb 2025), “Requirements Contracts” that give propositions the role of requirements through enacted rights and obligations (Jureta, 2021), Design-by-Contract (DbC)-inspired neurosymbolic contracts that mediate every LLM call through types and semantic predicates (Leoveanu-Condrei, 5 Aug 2025), finite-trace contracts over observable effects in agentic security systems (David et al., 30 Apr 2026), and conformance checks between natural-language e-contracts and derived smart contracts via knowledge-graph comparison (Godboley et al., 27 Apr 2025). Across these lines, the common theme is that alignment is treated not as an implicit property of prompts or weights, but as a specification-and-enforcement problem.
1. Contractual conceptions of alignment
A central origin of the idea is the analogy between AI alignment and incomplete contracting. In that view, a complete contingent contract would specify, for every relevant state of the world and every relevant action, what the agent must do and what it receives. Real contracts are incomplete because some states are non-contractible, parties are boundedly rational, drafting and enforcement are costly, and many contingencies are left to renegotiation or external completion. The parallel claim for AI is that reward functions are necessarily incomplete and misspecified: a specified reward function cannot fully capture the richer human welfare function it stands in for, so misalignment is structural rather than merely accidental (Hadfield-Menell et al., 2018).
The 2025 societal-alignment literature generalizes this point by embedding LLM alignment inside broader “societal alignment frameworks.” It distinguishes social alignment, economic alignment, and contractual alignment, and treats current alignment methods such as RLHF and constitutions as incomplete contracts between principal and agent. In that framing, a contract is a pair $(a, r)$, where $a$ is an action by the agent and $r$ is a reward function over prompt–response pairs. The principal may be a user, developer, or deploying firm, while the agent is the model. Because the output space and relevant contexts are too large to specify exhaustively, reward hacking, fake alignment, and context-sensitive failures arise as contract-theoretic consequences of incompleteness (Stańczak et al., 27 Feb 2025).
Requirements engineering introduces a different but closely related formulation. A proposition counts as a requirement if and only if a Requirements Contract exists, is enacted, and is exercised so that the proposition is requested under the contract. The contract minimally defines the right to give propositions the role of requirements, the obligation to satisfy requirements, the obligation to validate satisfaction, the obligation to remunerate satisfaction and validation, and the corresponding rights to request remuneration. Under this account, requirements are institutional roles sustained by contractual, economic, and engineering relations rather than merely optative sentences or speech acts (Jureta, 2021).
These strands converge on a shared shift in perspective. Alignment contracts are not only statements of desired behavior; they are mechanisms for assigning authority, specifying acceptable conduct, defining evidence of compliance, and allocating the consequences of failure. This suggests that the literature uses “contract” both descriptively, to model principal–agent relationships, and operationally, to engineer interfaces, monitors, and validation layers.
2. Formal models and semantic domains
The most explicit LLM-oriented formalization appears in the DbC-inspired neurosymbolic framework, where a contract surrounds every LLM-mediated component and is written in Hoare-style form as $\{P\}\, C \,\{Q\}$, with precondition $P$, component $C$, and postcondition $Q$. Preconditions apply to inputs, postconditions to outputs, and both are defined over well-typed data structures in a type system $\mathcal{T}$ instantiated by Pydantic-based LLMDataModel subclasses. The type-theoretic interpretation is Curry–Howard-based: a type corresponds to a proposition and a well-typed value to a constructive proof. Contract satisfaction is probabilistic because the underlying component is stochastic. The intended judgment for an agent function $f$ is therefore that $f$ satisfies $\{P\}\, f \,\{Q\}$ with probability $p$, with $p$ measured empirically over a sliding window of invocations (Leoveanu-Condrei, 5 Aug 2025).
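The Hoare-style reading of a contract over typed values can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the framework's API: plain dataclasses stand in for the Pydantic-based models, and `Article`, `Summary`, `Contract`, `triple_holds`, and `first_sentence` are hypothetical names introduced only for this example.

```python
from dataclasses import dataclass
from typing import Callable, Generic, TypeVar

In = TypeVar("In")
Out = TypeVar("Out")


@dataclass(frozen=True)
class Article:
    """Hypothetical input type (stand-in for a typed data model)."""
    body: str


@dataclass(frozen=True)
class Summary:
    """Hypothetical output type."""
    text: str


@dataclass(frozen=True)
class Contract(Generic[In, Out]):
    pre: Callable[[In], bool]         # predicate over the input
    post: Callable[[In, Out], bool]   # predicate over input and output


def triple_holds(contract, component, value) -> bool:
    """Check {pre} component {post} for one input.

    As in Hoare logic, the triple holds vacuously when the precondition fails.
    """
    if not contract.pre(value):
        return True
    return contract.post(value, component(value))


def first_sentence(article: Article) -> Summary:
    """Deterministic stub standing in for an LLM-backed component."""
    return Summary(article.body.split(".")[0].strip() + ".")


summary_contract = Contract(
    pre=lambda a: isinstance(a, Article) and len(a.body) > 0,
    post=lambda a, s: isinstance(s, Summary) and 0 < len(s.text) <= len(a.body),
)

article = Article("Contracts bind behavior. They are checked at runtime.")
print(triple_holds(summary_contract, first_sentence, article))
```

With a stochastic component, a single call to `triple_holds` is one sample; the probabilistic judgment described above corresponds to the fraction of such samples that succeed.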
The agentic-security formulation moves from typed semantic objects to observable effect traces. There, an alignment contract is a tuple
$$\mathcal{C} = (S, A, F, B, D, R, c, \phi)$$
where $S$ is a scope predicate over targets, $A$ and $F$ are allowed and forbidden effect-descriptor predicates, $B$ is a resource budget, $D$ is a disclosure policy, $R$ is the tracked resource set, $c$ gives per-event resource costs, and $\phi$ extracts modeled disclosures from events. Satisfaction is defined over finite traces: an event is admissible relative to a prior prefix if it is in scope, allowed, not forbidden, within budget, and disclosure-compliant; a trace satisfies the contract if every position is admissible. Because violations admit finite bad-prefix witnesses, the property is a safety property in the finite-trace sense (David et al., 30 Apr 2026).
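The admissibility check over finite traces can be sketched as follows. This is a simplified illustration, assuming string-valued targets and effects, integer budgets, and a per-event disclosure check; the class and function names (`Event`, `Contract`, `admissible`, `first_violation`) and the example policy are hypothetical and are not taken from the cited work.

```python
from dataclasses import dataclass
from typing import Callable, Dict, FrozenSet, List, Optional, Sequence


@dataclass(frozen=True)
class Event:
    target: str
    effect: str
    payload: str = ""


@dataclass(frozen=True)
class Contract:
    in_scope: Callable[[str], bool]           # scope predicate over targets
    allowed: Callable[[str], bool]            # allowed effect descriptors
    forbidden: Callable[[str], bool]          # forbidden effect descriptors
    budget: Dict[str, int]                    # resource budget
    disclosure_ok: Callable[[str], bool]      # disclosure policy
    resources: FrozenSet[str]                 # tracked resource set
    cost: Callable[[Event], Dict[str, int]]   # per-event resource costs
    flows: Callable[[Event], List[str]]       # modeled disclosures of an event


def spent(contract: Contract, events: Sequence[Event]) -> Dict[str, int]:
    """Total cost of the given events, restricted to tracked resources."""
    totals = {r: 0 for r in contract.resources}
    for e in events:
        for r, amount in contract.cost(e).items():
            if r in totals:
                totals[r] += amount
    return totals


def admissible(contract: Contract, prefix: Sequence[Event], event: Event) -> bool:
    """In scope, allowed, not forbidden, within budget, disclosure-compliant."""
    if not contract.in_scope(event.target):
        return False
    if not contract.allowed(event.effect) or contract.forbidden(event.effect):
        return False
    totals = spent(contract, [*prefix, event])
    if any(totals[r] > contract.budget[r] for r in contract.resources):
        return False
    return all(contract.disclosure_ok(d) for d in contract.flows(event))


def first_violation(contract: Contract, trace: Sequence[Event]) -> Optional[int]:
    """Index of the first inadmissible position (a finite bad-prefix witness)."""
    for i, event in enumerate(trace):
        if not admissible(contract, trace[:i], event):
            return i
    return None


def satisfies(contract: Contract, trace: Sequence[Event]) -> bool:
    return first_violation(contract, trace) is None


# Hypothetical policy: lab hosts only, read-only effects, two requests in total.
lab = Contract(
    in_scope=lambda t: t.endswith(".lab.example"),
    allowed=lambda e: e in {"scan", "read"},
    forbidden=lambda e: e == "delete",
    budget={"requests": 2},
    disclosure_ok=lambda d: "secret" not in d,
    resources=frozenset({"requests"}),
    cost=lambda e: {"requests": 1},
    flows=lambda e: [e.payload] if e.payload else [],
)

ok = [Event("a.lab.example", "scan"), Event("a.lab.example", "read")]
over = ok + [Event("a.lab.example", "read")]
print(satisfies(lab, ok), first_violation(lab, over))  # True 2
```

Returning the index of the first inadmissible event makes the safety-property reading concrete: any violating trace has a finite prefix that already witnesses the violation.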
A third formalization treats alignment as cross-representation conformance. The e-contract is represented as a knowledge graph $G_E$, and the smart contract as a knowledge graph $G_S$. Matching functions over entities and relations produce sets of matched entities and matched relations, while discrepancies are the entities and relations of either graph that remain unmatched.
In this framework, “fulfilling all conditions” means graph-level conformance of entities and relations extracted from the legal text and the executable code (Godboley et al., 27 Apr 2025).
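A graph-level conformance check of this kind can be approximated with set operations over triples. The sketch below is an assumption-laden simplification: graphs are reduced to (subject, relation, object) triples, and exact matching after string normalization stands in for the matching functions of the cited work. The triples themselves are made up for illustration.

```python
from typing import Dict, Iterable, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, relation, object)


def normalize(term: str) -> str:
    """Crude canonical form so that 'Security_Deposit' matches 'security deposit'."""
    return term.strip().lower().replace("_", " ")


def canonical(graph: Iterable[Triple]) -> Set[Triple]:
    return {(normalize(s), normalize(p), normalize(o)) for s, p, o in graph}


def conformance_diff(
    e_contract: Iterable[Triple], smart_contract: Iterable[Triple]
) -> Dict[str, Set[Triple]]:
    """Split triples into matched, missing from the code, and extra in the code."""
    spec, impl = canonical(e_contract), canonical(smart_contract)
    return {
        "matched": spec & impl,
        "missing_in_code": spec - impl,
        "extra_in_code": impl - spec,
    }


# Hypothetical triples extracted from a legal text and from contract code.
e_contract = [
    ("Tenant", "pays", "Rent"),
    ("Tenant", "pays", "Security Deposit"),
    ("Tenant", "gives", "One-Month Written Notice"),
]
smart_contract = [
    ("tenant", "pays", "rent"),
    ("tenant", "pays", "security_deposit"),
    ("landlord", "withdraws", "rent"),
]

report = conformance_diff(e_contract, smart_contract)
for kind, triples in report.items():
    print(kind, sorted(triples))
```

Under this reading, the code fulfills all conditions only when `missing_in_code` is empty; a non-empty `extra_in_code` flags behavior the agreement does not mention.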
At the requirements-engineering level, the formal object is neither a reward function nor a trace contract nor a graph, but a network of roles, expectations, actions, rights, and obligations. The key formal move is that a proposition becomes a requirement only inside an enacted contract structure containing the rights and obligations introduced in Section 1, abbreviated RtR, OtR, OtV, OtRS, OtRV, RtRS, and RtRV. The semantic domain is therefore institutional and economic as much as logical (Jureta, 2021).
3. Runtime mediation and enforcement
In the neurosymbolic LLM setting, the contract layer is an execution wrapper around each forward method. Inputs are type-checked against the type system, preconditions are evaluated, an optional act method may transform data, the LLM generates a structured output guided by prompts and type descriptions, postconditions are validated, and remediation may be invoked if validation fails. Remediation is implemented through a ValidationFunction component that iteratively refines outputs by incorporating validation error messages into corrective prompts. Error history is accumulated across retries to avoid cycles, and the system always executes the underlying forward in a finally block so that contract failure degrades from verified to best-effort behavior rather than halting operation. Success at each invocation is treated as a Bernoulli random variable $X_i \in \{0, 1\}$, with the empirical success rate
$$\hat{p} = \frac{1}{n} \sum_{i=1}^{n} X_i$$
computed over a sliding window of the $n$ most recent invocations (Leoveanu-Condrei, 5 Aug 2025).
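The generate, validate, and repair loop with a windowed success estimate can be sketched as below. This is not the framework's implementation: `ContractedCall`, `Outcome`, and the stub models are hypothetical, checks return an error message or `None`, and the best-effort degradation is modeled with a plain unverified fallback call rather than the `finally`-block structure described above.

```python
from collections import deque
from dataclasses import dataclass
from typing import Callable, Deque, List, Optional

# A check returns an error message, or None when it passes.
Check = Callable[[str], Optional[str]]


@dataclass(frozen=True)
class Outcome:
    value: str
    verified: bool  # True only if the postcondition passed


class ContractedCall:
    def __init__(
        self,
        generate: Callable[[str, List[str]], str],  # (prompt, error history) -> output
        pre: Check,
        post: Check,
        max_retries: int = 2,
        window: int = 50,
    ) -> None:
        self.generate = generate
        self.pre = pre
        self.post = post
        self.max_retries = max_retries
        self.outcomes: Deque[bool] = deque(maxlen=window)  # sliding window

    def __call__(self, prompt: str) -> Outcome:
        errors: List[str] = []  # accumulated across retries
        if self.pre(prompt) is None:
            for _ in range(self.max_retries + 1):
                candidate = self.generate(prompt, list(errors))
                error = self.post(candidate)
                if error is None:
                    self.outcomes.append(True)
                    return Outcome(candidate, verified=True)
                errors.append(error)
        # Contract not met: degrade to an unverified, best-effort result.
        self.outcomes.append(False)
        return Outcome(self.generate(prompt, []), verified=False)

    def success_rate(self) -> Optional[float]:
        """Empirical success rate over the window, or None before any call."""
        if not self.outcomes:
            return None
        return sum(self.outcomes) / len(self.outcomes)


def non_empty(prompt: str) -> Optional[str]:
    return None if prompt.strip() else "prompt must not be empty"


def digits_only(text: str) -> Optional[str]:
    return None if text.isdigit() else "output must contain only digits"


def flaky_model(prompt: str, errors: List[str]) -> str:
    """Stub model that only complies after seeing validation feedback."""
    return "42" if errors else "forty-two"


call = ContractedCall(flaky_model, pre=non_empty, post=digits_only)
print(call("What is 6 * 7?"), call.success_rate())
```

Because `deque(maxlen=window)` discards the oldest outcome automatically, `success_rate` always reflects only the most recent invocations.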
In the effect-trace framework, enforcement is performed by a reference monitor. Given a proposed event and the realized prefix so far, the monitor allows the event if and only if the admissibility predicate holds; otherwise it denies the event. Under the Effect Observability Assumption, every relevant effect in the chosen profile is mediated before execution and blocked events cannot bypass the monitor. The enforcement soundness theorem then states that for any finite proposed trace $\tau$, the realized trace that the monitor produces from $\tau$ satisfies the contract. The guarantee is intentionally behavioral: it quantifies over the agent model and depends only on mediation completeness at the observable boundary, not on the model’s internal reasoning, intent, or prompt state (David et al., 30 Apr 2026).
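The monitor's behavior can be illustrated with a small filter over proposed events. This is a toy sketch, assuming string-valued events and a hypothetical admissibility predicate (`make_admissible`) that allows a fixed effect set and caps the number of `send` events; it only illustrates why the realized trace satisfies the contract by construction.

```python
from typing import Callable, List, Sequence

Admissible = Callable[[Sequence[str], str], bool]  # (realized prefix, event) -> bool


def monitor(admissible: Admissible, proposed: Sequence[str]) -> List[str]:
    """Allow each proposed event iff it is admissible after the realized prefix."""
    realized: List[str] = []
    for event in proposed:
        if admissible(realized, event):
            realized.append(event)  # event is executed
        # otherwise the event is denied and never reaches the environment
    return realized


def satisfies(admissible: Admissible, trace: Sequence[str]) -> bool:
    """A trace satisfies the contract if every position is admissible."""
    return all(admissible(trace[:i], trace[i]) for i in range(len(trace)))


def make_admissible(allowed: frozenset, max_sends: int) -> Admissible:
    """Hypothetical policy: effect must be allowed, and at most max_sends sends."""
    def admissible(prefix: Sequence[str], event: str) -> bool:
        if event not in allowed:
            return False
        sends_so_far = sum(1 for e in prefix if e == "send")
        return not (event == "send" and sends_so_far >= max_sends)
    return admissible


policy = make_admissible(frozenset({"read", "send"}), max_sends=2)
proposed = ["read", "delete", "send", "send", "send", "read"]
realized = monitor(policy, proposed)
print(realized)                     # ['read', 'send', 'send', 'read']
print(satisfies(policy, realized))  # True
```

Note that admissibility is evaluated against the realized prefix, not the proposed one, so denied events consume no budget. The guarantee also says nothing about effects that never pass through `monitor`, which is the role of the observability assumption.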
The contrast between these two enforcement regimes is consequential. The DbC-inspired layer validates semantic properties of structured values and uses the model itself in a generate–validate–repair loop, whereas the effect-trace monitor blocks inadmissible external actions at a mediated boundary. This suggests a division between contracts that regulate semantic correctness of outputs and contracts that regulate world-facing capability. The former accept stochasticity and quantify compliance probabilistically; the latter define safety properties over realized effects.
4. Institutions, authority, and incomplete specification
The societal-alignment literature argues that technical contracts are only one part of alignment. Human societies cope with incomplete contracts through norms, laws, dispute resolution, welfare principles, and governance mechanisms, and LLM alignment should likewise rely on supporting institutional infrastructure rather than only on better reward modeling. Social alignment contributes norms, values, and normative competence; economic alignment contributes welfare functions, Pareto efficiency, and pluralistic aggregation; contractual alignment contributes incomplete contracting, law, regulation, and internal and external governance mechanisms such as constitutional AI and debate. This line of work also distinguishes unwanted epistemic uncertainty from essential normative and contextual uncertainty, and proposes participatory alignment interface designs that expose uncertainty, permit feedback, and support ongoing revision of alignment objectives (Stańczak et al., 27 Feb 2025).
The incomplete-contracting account from economics and law sharpens this institutional thesis. Human contracting works because written terms are supplemented by courts, culture, social norms, and other sources of implied terms. The familiar “vase and boxes” example makes the point: a human worker paid per box will typically avoid breaking an unforeseen vase even if the written contract is silent, because the effective contract includes legal, reputational, and moral sanctions. The conjecture for AI is that alignment requires systems able to connect local task specifications to broader normative environments and to internalize predicted sanctions for wrongful actions in context (Hadfield-Menell et al., 2018).
Requirements Contracts make authority and incentives explicit. They distinguish the Requester, Maker, and Evaluator, each with its own expectations, and model contract participation through each party's expected benefits and costs. The Requester’s expected cost must cover the Maker’s and Evaluator’s expected benefits, expressed as $EC_R \ge EB_M + EB_E$, where $EC$ denotes expected cost and $EB$ expected benefit. Because each party is assumed to make decisions that maximize its own expected value, misalignment appears when marginal incentives diverge, for example when the Maker and Evaluator benefit from increased costs while the Requester does not. In this setting, an alignment contract is inseparable from role allocation, remuneration, and validation authority (Jureta, 2021).
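A small numeric sketch makes the coverage condition concrete. It rests on assumptions that go beyond the text: expected value is taken to be expected benefit minus expected cost, a party is taken to participate when that value is non-negative, and all figures are invented for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Party:
    name: str
    expected_benefit: float
    expected_cost: float

    @property
    def expected_value(self) -> float:
        # Assumption of this sketch: value = benefit - cost.
        return self.expected_benefit - self.expected_cost


def requester_covers(requester: Party, maker: Party, evaluator: Party) -> bool:
    """Requester's expected cost covers Maker's and Evaluator's expected benefits."""
    return requester.expected_cost >= maker.expected_benefit + evaluator.expected_benefit


def all_participate(*parties: Party) -> bool:
    """Assumed participation rule: nobody expects a net loss."""
    return all(p.expected_value >= 0 for p in parties)


requester = Party("Requester", expected_benefit=150.0, expected_cost=100.0)
maker = Party("Maker", expected_benefit=70.0, expected_cost=40.0)
evaluator = Party("Evaluator", expected_benefit=30.0, expected_cost=10.0)

print(requester_covers(requester, maker, evaluator))  # True: 100 >= 70 + 30
print(all_participate(requester, maker, evaluator))   # True

# Diverging incentives: the Maker gains from extra work the Requester did not budget.
costlier_maker = Party("Maker", expected_benefit=90.0, expected_cost=50.0)
print(requester_covers(requester, costlier_maker, evaluator))  # False: 100 < 90 + 30
```

In the last case the Maker's expected value rises from 30 to 40 while the coverage condition fails, which is the kind of marginal-incentive divergence the paragraph above describes.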
Taken together, these works present alignment as an institutional design problem. Specifications are incomplete; authority to define them is contested; and compliance depends not only on formal semantics but also on monitoring, interpretation, remuneration, and revision mechanisms.
5. Modularity, equivalence, and conformance
A notable property of contract-based design is modular substitutability. In the neurosymbolic LLM framework, agents satisfying the same probabilistic contracts are said to be “functionally equivalent with respect to those contracts.” If two agents share the same type system and the same contract set, then contexts that observe behavior only through those contracts cannot distinguish them except through success probability, operational cost, and expressivity. The same framework treats multi-step systems compositionally: each Expression has its own contract, intermediate values are checked for type validity and semantic validity before being passed forward, and violations are localized to the step where they occur (Leoveanu-Condrei, 5 Aug 2025).
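Substitutability and step-local violation reporting can be sketched with a tiny pipeline. The names here (`Step`, `StepViolation`, `run_pipeline`) are hypothetical, and plain Python callables stand in for LLM-backed Expressions; the point is only that two implementations behind the same contract are interchangeable from the pipeline's point of view.

```python
import re
from dataclasses import dataclass
from typing import Any, Callable, Sequence


@dataclass(frozen=True)
class Step:
    name: str
    run: Callable[[Any], Any]
    out_type: type                # type validity of the intermediate value
    post: Callable[[Any], bool]   # semantic validity of the intermediate value


class StepViolation(Exception):
    """Raised with the name of the step whose contract failed."""

    def __init__(self, step: str, reason: str) -> None:
        super().__init__(f"{step}: {reason}")
        self.step = step
        self.reason = reason


def run_pipeline(steps: Sequence[Step], value: Any) -> Any:
    for step in steps:
        value = step.run(value)
        if not isinstance(value, step.out_type):
            raise StepViolation(step.name, "type")
        if not step.post(value):
            raise StepViolation(step.name, "postcondition")
    return value


def has_tokens(tokens: list) -> bool:
    return len(tokens) > 0


# Two implementations behind the same contract (list output, non-empty).
tokenize_split = Step("tokenize", lambda s: s.split(), list, has_tokens)
tokenize_regex = Step("tokenize", lambda s: re.findall(r"\S+", s), list, has_tokens)
count = Step("count", len, int, lambda n: n >= 0)

print(run_pipeline([tokenize_split, count], "a b c"))  # 3
print(run_pipeline([tokenize_regex, count], "a b c"))  # 3

# A faulty replacement is caught at the step where it occurs.
broken = Step("tokenize", lambda s: s, list, has_tokens)
try:
    run_pipeline([broken, count], "a b c")
except StepViolation as violation:
    print(violation.step, violation.reason)  # tokenize type
```

Downstream steps never see the malformed value, so the failure is attributed to `tokenize` rather than surfacing later as a confusing error in `count`.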
The agentic-security calculus provides a more explicit algebra for modular engineering. Refinement makes a contract stricter by narrowing scope, reducing allowed effects, increasing forbidden effects, tightening budgets, and strengthening disclosure policy; refinement soundness states that if $\mathcal{C}'$ refines $\mathcal{C}$ and a trace satisfies $\mathcal{C}'$, then it also satisfies $\mathcal{C}$. Composition forms a combined contract by intersecting scopes and allowed sets, unioning forbidden sets, minimizing budgets, and intersecting disclosure policies, subject to compatibility of resource sets, cost functions, and flow extractors. The resulting soundness theorem is one-way: if a trace satisfies the composed contract, it satisfies each constituent contract (David et al., 30 Apr 2026).
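Refinement and composition can be illustrated on a deliberately reduced contract. The sketch assumes finite sets in place of predicates, a single budget counting events, and no disclosure policy; `Contract`, `compose`, `refines`, and `satisfies` are hypothetical names, and the example only exercises the direction of soundness stated above.

```python
from dataclasses import dataclass
from typing import FrozenSet, Sequence, Tuple

Event = Tuple[str, str]  # (target, effect)


@dataclass(frozen=True)
class Contract:
    scope: FrozenSet[str]
    allowed: FrozenSet[str]
    forbidden: FrozenSet[str]
    budget: int  # maximum number of events in a trace


def compose(a: Contract, b: Contract) -> Contract:
    """Intersect scopes and allowed sets, union forbidden sets, take the minimum budget."""
    return Contract(
        scope=a.scope & b.scope,
        allowed=a.allowed & b.allowed,
        forbidden=a.forbidden | b.forbidden,
        budget=min(a.budget, b.budget),
    )


def refines(strict: Contract, loose: Contract) -> bool:
    """True when `strict` is at least as restrictive as `loose` in every component."""
    return (
        strict.scope <= loose.scope
        and strict.allowed <= loose.allowed
        and strict.forbidden >= loose.forbidden
        and strict.budget <= loose.budget
    )


def satisfies(contract: Contract, trace: Sequence[Event]) -> bool:
    if len(trace) > contract.budget:
        return False
    return all(
        target in contract.scope
        and effect in contract.allowed
        and effect not in contract.forbidden
        for target, effect in trace
    )


network_policy = Contract(
    scope=frozenset({"a.lab", "b.lab"}),
    allowed=frozenset({"scan", "read", "write"}),
    forbidden=frozenset({"delete"}),
    budget=10,
)
tool_policy = Contract(
    scope=frozenset({"a.lab", "c.lab"}),
    allowed=frozenset({"read", "write"}),
    forbidden=frozenset({"write"}),
    budget=3,
)

combined = compose(network_policy, tool_policy)
trace = [("a.lab", "read"), ("a.lab", "read")]

print(refines(combined, network_policy), refines(combined, tool_policy))  # True True
print(satisfies(combined, trace))        # True
print(satisfies(network_policy, trace))  # True
print(satisfies(tool_policy, trace))     # True
```

In this reduced model the composed contract refines each constituent, so the composition result follows from refinement soundness; the full calculus additionally requires compatible resource sets, cost functions, and flow extractors.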
Knowledge-graph validation offers another notion of compositionality, this time across artifacts rather than agent steps. The e-contract functions as a specification, the smart contract as an implementation, and both are projected into a shared abstract space before comparison. Nodes or edges of the e-contract graph with no counterpart in the smart-contract graph identify e-contract conditions not implemented in code, while unmatched structures in the smart-contract graph identify extra behavior not authorized by the agreement. In the rental-agreement case study, core entities such as landlord, tenant, rent, security deposit, and term dates align, while clauses such as one-month written notice or utilities responsibility may remain unmatched on the smart-contract side (Godboley et al., 27 Apr 2025).
These constructions support a broader interpretation of alignment contracts as interface specifications. A contract can delimit when one LLM-backed component may replace another, how multiple policy layers combine, or whether executable code preserves the semantics of a higher-level agreement. This suggests that alignment, in contract-centric systems, is often less about introspecting internals than about preserving observable and validated relations across modules, traces, and representations.
6. Limits, disputes, and open research directions
Contract-based alignment does not eliminate the limits of present models. In the DbC-inspired LLM framework, semantic validation remains bounded by LLM capability and stochasticity; low temperature may prune valid solutions, high temperature increases variance, some contracts may be unachievable for current models, and poorly specified contracts may either over-constrain agents into brittleness or remain too permissive to provide meaningful guarantees. The implementation relies on runtime checks rather than a fully verified pipeline, although future work is explicitly directed toward Lean4 formalization of type safety and contract-satisfaction properties (Leoveanu-Condrei, 5 Aug 2025).
The effect-trace framework states its own boundary conditions with unusual precision. Admissibility checking is decidable when the relevant predicates are decidable and the accounting functions computable, but static certification that a dynamically acquired tool never attempts a forbidden effect is generally undecidable via reduction. The observability-boundary theorem further shows that enforceability depends on what the monitor can observe: payload steganography, timing channels, intent-level properties, and liveness properties fall outside the guarantee when they are not represented in the observation model. The formal core theorems are mechanized in a Lean 4 artifact, but the central soundness result still depends on the Effect Observability Assumption and the integrity of the mediation boundary (David et al., 30 Apr 2026).
The broader contractual metaphor is itself contested. One caution is that treating AI alignment fundamentally as incomplete contracting can obscure socio-political dynamics and power structures, including the question of who sets objectives and whose norms prevail. Another is that societal frameworks imported from economics or law may reproduce their own idealizations, including Western-centric assumptions, simplified rationality models, and difficulties handling cross-cultural norm conflict. The same literature therefore couples its contract-theoretic framing with calls for pluralistic alignment, contestability, uncertainty communication, and participatory interface design rather than a once-and-for-all specification of the correct reward or constitution (Stańczak et al., 27 Feb 2025, Hadfield-Menell et al., 2018).
Cross-representation validation work exposes a different set of limits. Knowledge-graph comparison is a semantic-diff mechanism, not a theorem-proving framework: it does not provide soundness or completeness theorems relating graph-level matches to full legal–code equivalence, does not model deontic or temporal logic, and does not capture low-level smart-contract behaviors such as reentrancy or gas semantics. The authors accordingly propose future work on converting e-contracts into AST-like structures and comparing ASTs directly, which would deepen the structural relation between legal clauses and executable logic (Godboley et al., 27 Apr 2025).
The resulting research agenda is therefore dual. One direction seeks stronger formalization: mechanized proofs, richer contract calculi, and more precise mediation models. The other seeks broader legitimacy and coverage: participatory alignment tooling, pluralistic and welfare-centric reward design, internal governance architectures, and methods for handling incomplete, evolving, and contested norms. Alignment contracts, in this combined sense, define a program in which alignment is specified, monitored, revised, and institutionally situated rather than assumed.