AI Governance under Political Turnover: The Alignment Surface of Compliance Design

Published 22 Apr 2026 in cs.AI and econ.GN | (2604.21103v1)

Abstract: Governments are increasingly interested in using AI to make administrative decisions cheaper, more scalable, and more consistent. But for probabilistic AI to be incorporated into public administration it must be embedded in a compliance layer that makes decisions reviewable, repeatable, and legally defensible. That layer can improve oversight by making departures from law easier to detect. But it can also create a stable approval boundary that political successors learn to navigate while preserving the appearance of lawful administration. We develop a formal model in which institutions choose the scale of automation, the degree of codification, and safeguards on iterative use. The model shows when these systems become vulnerable to strategic use from within government, why reforms that initially improve oversight can later increase that vulnerability, and why expansions in AI use may be difficult to unwind. Making AI usable can thus make procedures easier for future governments to learn and exploit.


Authors (1)

Andrew J. Peterson

Summary

The paper introduces a formal three-stage model that reveals a sharp alignment threshold where compliance design becomes vulnerable to exploitation after political turnover.
The paper identifies a codification dilemma where increased standardization initially reduces overt abuse but ultimately stabilizes the alignment surface, increasing risk of within-form erosion.
The paper shows that crisis-driven expansions lead to irreversible administrative changes with enduring empirical signatures that challenge traditional oversight measures.

Institutional Problem Statement

"AI Governance under Political Turnover: The Alignment Surface of Compliance Design" (2604.21103) addresses the interaction between procedural oversight architectures in democratic governments and the adoption of AI-mediated administrative systems. The central problem is that reforms enhancing auditability, traceability, and codification to make AI usable in administration also increase "learnability": the ability of successors, especially following political turnover, to probe and strategically exploit inherited compliance surfaces. The paper departs from standard accounts in public administration and AI governance by emphasizing that the very instruments that make AI governable and reviewable also constitute a stable "alignment surface," a procedural boundary that can be adaptively exploited after turnover.

Formal Model and Theoretical Contributions

The author presents a formal three-stage model applicable to democratic institutional settings that deploy probabilistic AI for administrative tasks:

Stage 0 (T=0): Upstream safeguard design (statute, organizational constraints, remedy speed).
Stage 1 (T=1): AI adoption, with institutional choice of automation scale, codification degree, and the embedding of safeguards.
Stage 2 (T=2): Political turnover; governance is inherited by either democratic or autocratic successors with exogenous probability. The new principal seeks either to comply or to erode democratic procedures (overtly or within-form).

The space of administrative actions $\mathcal{A}$ is partitioned into permissible, impermissible, and ambiguous sets. The compliance layer operationalizes screening through a Pass/Flag gate, which induces the "alignment surface." The model features two abuse channels:

Overt Abuse: Success probability $F_0$ is reduced by codification and oversight safeguards.
Within-Form Erosion: Success probability $p_{\mathrm{wf}}$ increases with scale and with codification (the stability and repeatability of review), and is countered by contestation and access constraints.

The intensity of within-form risk is summarized by $\mu(x,s,r) = \mu_0(r) + \eta(r)\,x s$, where $x$ is scale (share of decisions), $s$ is codification, and $r$ parameterizes safeguards. The probability of at least one effective subversive move (passing but erosive) is $1 - \exp(-\mu(x,s,r))$.
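The two formulas above can be evaluated directly. The following is a minimal sketch, not the paper's code: the function names are hypothetical, and `mu0` and `eta` stand for $\mu_0(r)$ and $\eta(r)$ already evaluated at a fixed safeguard level $r$. The numeric values are illustrative assumptions only.

```python
import math


def within_form_intensity(x: float, s: float, mu0: float, eta: float) -> float:
    """Intensity mu(x, s, r) = mu0(r) + eta(r) * x * s.

    mu0 and eta are assumed to be pre-evaluated at the safeguard level r.
    """
    return mu0 + eta * x * s


def prob_effective_move(x: float, s: float, mu0: float, eta: float) -> float:
    """Probability of at least one passing-but-erosive move: 1 - exp(-mu)."""
    return 1.0 - math.exp(-within_form_intensity(x, s, mu0, eta))


# Illustrative (assumed) parameter values: a small pilot versus a
# large-scale, highly codified deployment under the same safeguards.
pilot = prob_effective_move(x=0.2, s=0.3, mu0=0.05, eta=1.0)
scaled = prob_effective_move(x=0.8, s=0.9, mu0=0.05, eta=1.0)
print(f"pilot: {pilot:.3f}, scaled: {scaled:.3f}")
```

Because scale and codification enter only through the product $x s$, raising either one increases the probability, and the effect of each is larger when the other is already high.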

Main Results

Alignment Surface Threshold: There is a sharply defined threshold in $(x, s, r)$-space above which the administrative stack becomes systematically vulnerable to within-form exploitation by insiders. This threshold delineates systemic robustness from exploitability.
Codification Dilemma: While initial codification reduces overt abuse, further increases in codification (when auditability and standardization are bundled) eventually switch from being protective to risk-increasing, as they stabilize the alignment surface and make the abuse surface more legible.
Irreversibility and Path Dependency: Modernization pressure drives increases in scale and codification that can cross the vulnerability threshold. Post-crisis, rolling back only partially reduces the additional risk; the structure is persistently altered because installed procedures are selectively repaired, not comprehensively unwound.
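
To illustrate what a threshold in $(x, s, r)$-space can look like under the intensity index $\mu(x,s,r) = \mu_0(r) + \eta(r)\,x s$, suppose (as an assumption made here for illustration, not the paper's stated definition) that the stack counts as vulnerable once the probability of at least one effective within-form move reaches some level $\bar{p} \in (0,1)$, and that $\eta(r) > 0$. Then:

$$
1 - \exp(-\mu(x,s,r)) \ge \bar{p}
\;\Longleftrightarrow\;
\mu_0(r) + \eta(r)\,x s \ge -\ln(1-\bar{p})
\;\Longleftrightarrow\;
x s \ge \frac{-\ln(1-\bar{p}) - \mu_0(r)}{\eta(r)}.
$$

Under this reading, the boundary for fixed $r$ is a hyperbola in the $(x, s)$ plane: scale and codification substitute for one another in reaching it, and safeguards that lower $\mu_0(r)$ or $\eta(r)$ push the boundary outward.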

Robustness and Analytical Depth

The model distinguishes oversight-facing auditability from insider-facing standardization and shows that their effects on risk diverge: only increases in the latter consistently make exploitation easier. The author checks the robustness of these claims under alternative aggregation mechanisms (strategic channel choice vs. additive risk) and alternative microfoundations for ambiguity reduction, and extends the results to nonlinear intensity indices.
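
The codification dilemma under strategic channel choice can be sketched numerically. The snippet below is an illustration under stated assumptions, not the paper's specification: the exponential form of `overt_success`, every parameter value, and the use of `max` to represent an insider choosing the more promising channel are all hypothetical choices made here.

```python
import math


def overt_success(s: float, f_base: float = 0.6, decay: float = 3.0) -> float:
    """Assumed form: overt-abuse success falls exponentially in codification s."""
    return f_base * math.exp(-decay * s)


def within_form_success(s: float, x: float = 0.8,
                        mu0: float = 0.05, eta: float = 1.0) -> float:
    """Within-form success 1 - exp(-mu), with mu = mu0 + eta * x * s."""
    return 1.0 - math.exp(-(mu0 + eta * x * s))


def strategic_risk(s: float) -> float:
    """Assumed aggregation: the insider picks whichever channel is likelier to succeed."""
    return max(overt_success(s), within_form_success(s))


# Scan codification levels from 0 to 1 and locate the lowest-risk point.
grid = [i / 10 for i in range(11)]
best_s = min(grid, key=strategic_risk)
for s in grid:
    print(f"s={s:.1f}  overt={overt_success(s):.3f}  "
          f"within_form={within_form_success(s):.3f}  risk={strategic_risk(s):.3f}")
print("lowest-risk codification level on this grid:", best_s)
```

With these assumed parameters, risk first falls as codification suppresses overt abuse and then rises once the within-form channel dominates, so the lowest-risk level of codification is interior rather than maximal.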

Empirical and Practical Implications

The framework yields several testable empirical implications:

Post-Turnover Patterns: The likelihood of procedurally compliant yet substantively abusive administrative actions increases with scale and codification of AI-mediated compliance systems, especially after turnover.
Review-Deployment Coupling: Risk is highest where compliance feedback can easily be iterated against and is not institutionally decoupled from deployment; it is mitigated where AI systems are advisory or access-controlled, or where audit and deployment responsibilities are separated.
Persistence of Vulnerability: Expansions driven by crisis or modernization are not fully reversible; partial selective repair is the norm, especially in high-scale, high-codification environments.
Observational Signatures: Increased routine scope, proliferation of reusable compliance artifacts, and traces of systematic probing (internal iterative boundary-testing) are distinctive signatures of increased within-form vulnerability.

These implications contradict strong-form claims that more oversight is always protective, and they challenge frameworks that treat automation as inherently illegitimate or unidirectionally dangerous.

Theoretical and Policy Discussion

The model's core finding is that the coupling of auditability and standardization, while enhancing transparency and oversight, is a double-edged institutional adaptation. Going beyond classical principal-agent formulations, the findings connect to AI safety concepts such as specification gaming and reward hacking but highlight a distinctive institutional vulnerability: procedural regularity can enable insiders to iteratively learn and exploit inherited review boundaries, especially under conditions of political adversity.

Standard technical solutions such as logging, transparency, and increased contestation may become insufficient or even counterproductive if they also make approval processes more stable and more transferable across domains and over time. Design priorities should instead include controlling access to iterative feedback, decoupling evaluation from deployment, and investing in fast, reliable remedy mechanisms that limit the window for within-form abuse. Reforms should be evaluated on their capacity to reduce the alignment surface's legibility to potentially adversarial insiders, not merely on improvements in auditability for external overseers.

Future Directions

The analysis points to future research on endogenous safeguard evolution, the co-evolution of within-form and overt abuse channels, and the long-run dynamics of AI-mediated administrative architectures under repeated political turnover. Extending the model to account for strategic anticipation by both designers and potential abusers, together with empirical studies of installed administrative stacks, would further clarify the limits and possibilities of robust, democratic AI governance.

Conclusion

The argument establishes that AI governance, especially under democratic turnover, must recognize the institutional risks created by making probabilistic AI systems usable and reviewable. The same mechanisms that make AI outputs defensible and traceable can also leave behind stable, reusable approval boundaries that successors can exploit. The main policy implication is that auditability alone is not sufficient; the learnability of the compliance surface by future insiders must be actively constrained. Reform strategies must balance oversight visibility against procedural legibility to adversarial actors, addressing the full life cycle and inheritance structure of AI-enabled administrative systems.