- The paper introduces AI Integrity as a process-based paradigm that shifts focus from output compliance to verifiable, audit-ready reasoning.
- It presents a four-layer Authority Stack model (Normative, Epistemic, Source, Data) and metrics such as the Cascade Consistency Index for evaluating internal consistency.
- The PRISM framework operationalizes auditability, offering a roadmap to integrate empirical measurement into high-stakes AI governance.
AI Integrity: A Process-Based Paradigm for Verifiable AI Governance
Introduction
The paper "AI Integrity: A New Paradigm for Verifiable AI Governance" (2604.11065) introduces and formalizes the concept of AI Integrity, arguing that prevailing paradigms in AI governance—Ethics, Safety, and Alignment—fail to address the verifiability of the reasoning process in AI systems. The proposed paradigm shifts AI governance from outcome-based evaluation to rigorous process verification, offering a procedural standard that insists on auditability of the reasoning path regardless of the specific value hierarchies held by the system. The work is situated in the context of growing reliance on AI for high-stakes decisions and responds directly to the increasing opacity observed in value, epistemic, and data handling within LLMs and other generative models.
Limitations of Existing AI Governance Paradigms
Ethics, Safety, and Alignment represent the current triad of AI governance. Each is fundamentally output-oriented:
- AI Ethics functions prescriptively, judging whether outputs adhere to pre-specified moral standards but ignoring the opacity of the operational value hierarchy.
- AI Safety centers on preventing externally induced failures but does not confront endogenous bias or epistemic incoherence in reasoning.
- AI Alignment assumes that alignment with human (or institutional) preferences ensures trustworthiness, but it leaves opaque how those preferences are operationalized and what mechanisms produce aligned outputs.
These paradigms make two problematic assumptions: that desired output properties are specifiable a priori, and that output compliance suffices to guarantee reliable internal reasoning. The absence of process-level verification allows for systems that are functionally aligned or "ethical" yet are structurally incoherent, susceptible to internal manipulation, or indeterminate in reasoning.
Definition and Structure of AI Integrity
AI Integrity is formally defined as:
A state in which the Authority Stack of an AI system—its layered hierarchy of values, epistemological standards, source preferences, and data selection criteria—is protected from corruption, contamination, manipulation, and bias, and maintained in a verifiable manner.
Three properties are central:
- Process-orientation: Integrity does not prescribe which values or standards are correct, but requires stable, audit-ready reasoning structures.
- Multi-layered evaluation: Verification encompasses four interdependent layers (Normative, Epistemic, Source, Data) rather than focusing singularly on output or stated principles.
- Verifiability: Integrity is not a matter of declared transparency but is empirically grounded in measured behavioral consistency across systematically varied scenarios.
Three axes operationalize the concept: value consistency (across contexts), judgmental accountability (external auditability of the authority stack), and agency protection (preserving the distinction and interaction between AI and human agency).
The Authority Stack: Four-Layer Model
The Authority Stack represents the backbone of AI Integrity, separating concerns into:
- L4, Normative Authority: The explicit value hierarchy, conceptualized via Schwartz Basic Human Values, which provides a globally validated structure for value ordering.
- L3, Epistemic Authority: The evidence standards the system applies, operationalized using Walton argumentation schemes and the GRADE/CEBM evidence hierarchies, which define the logical acceptability and priority of evidence types.
- L2, Source Authority: Source credibility judgments, modeled using Source Credibility Theory (competence, trustworthiness, goodwill), which mediate the weighting of institutions and stakeholders.
- L1, Data Authority: Patterns of data selection, treated as a function of the higher-layer profiles.
This model is an analytical structure for measurement and does not presume a specific causal ordering within model internals. The cascade, however, is empirically testable through metrics such as the Cascade Consistency Index (CCI).
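As a rough illustration of how a measured stack profile could be recorded, the sketch below stores one set of preference weights per layer. The class names, the weight representation, and every category label and number are hypothetical and are not taken from the paper; the structure is bookkeeping for measurement only and implies nothing about how a model is built internally.

```python
from dataclasses import dataclass, field
from enum import IntEnum


class Layer(IntEnum):
    """The four Authority Stack layers; a higher number is a higher tier."""
    DATA = 1
    SOURCE = 2
    EPISTEMIC = 3
    NORMATIVE = 4


@dataclass
class AuthorityStackProfile:
    """Measured preference weights per layer (hypothetical representation)."""
    weights: dict[Layer, dict[str, float]] = field(default_factory=dict)

    def top_down(self) -> list[Layer]:
        """Layers in cascade order, from Normative (L4) down to Data (L1)."""
        return sorted(self.weights, reverse=True)

    def dominant(self, layer: Layer) -> str:
        """Category with the largest weight in the given layer."""
        return max(self.weights[layer], key=self.weights[layer].get)


# Illustrative numbers only; they are not measurements from the paper.
profile = AuthorityStackProfile({
    Layer.NORMATIVE: {"security": 0.6, "universalism": 0.4},
    Layer.EPISTEMIC: {"systematic_review": 0.7, "expert_opinion": 0.3},
    Layer.SOURCE: {"competence": 0.5, "trustworthiness": 0.3, "goodwill": 0.2},
    Layer.DATA: {"peer_reviewed": 0.8, "news": 0.2},
})
```

Keeping each layer as a separate weight table mirrors the model's separation of concerns: each layer can be measured and compared on its own before any cross-layer relationship is tested.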
Threats to AI Integrity
Two principal threats are diagnosed:
- Authority Pollution: Illegitimate, opaque, or unpredictable influences within the stack, such as value-based data suppression or non-traceable source selection. Documented examples include the contamination of data selection by normative directives, as seen in generative image models that produced historically inaccurate outputs in response to post hoc value constraints.
- Integrity Hallucination: The absence of a coherent value or reasoning structure, manifested as inconsistent responses to structurally identical inputs. The paper decomposes hallucination into stochastic noise, framing sensitivity, and structural incoherence; only the last represents a critical governance failure.
The PRISM Framework: Measurement and Auditing Methodology
AI Integrity requires a standard for process audit, realized by the PRISM (Profile-based Reasoning Integrity Stack Measurement) framework. The framework is grounded in the Enhanced Cascade Mapping Hypothesis: Profiles for Normative, Epistemic, and Source Authority (L4–L2) are measured independently, and data selection patterns (L1) are predictable as a function of the upper-layer profiles.
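To make the hypothesis concrete, the following sketch shows one hypothetical way a data-selection ranking could be predicted from upper-layer profiles. The additive scoring rule, the function name `predict_data_ranking`, the tag scheme, and all weights are assumptions made for illustration; the summary above does not specify how the paper derives its predictions.

```python
def predict_data_ranking(upper_profiles, candidates):
    """Rank candidate data items by summed upper-layer weights.

    The additive rule is an assumed stand-in for whatever mapping the
    framework actually uses; it only illustrates the shape of the idea.
    """
    def score(item):
        # Look up the weight of the item's tag in each upper layer.
        return sum(
            upper_profiles[layer].get(tag, 0.0)
            for layer, tag in item["tags"].items()
        )

    ordered = sorted(candidates, key=score, reverse=True)
    return [item["id"] for item in ordered]


# Hypothetical upper-layer (L4 to L2) profiles and candidate data items.
upper_profiles = {
    "normative": {"security": 0.6, "universalism": 0.4},
    "epistemic": {"systematic_review": 0.7, "expert_opinion": 0.3},
    "source": {"competence": 0.5, "trustworthiness": 0.3, "goodwill": 0.2},
}
candidates = [
    {"id": "blog_post",
     "tags": {"normative": "universalism",
              "epistemic": "expert_opinion",
              "source": "goodwill"}},
    {"id": "meta_analysis",
     "tags": {"normative": "security",
              "epistemic": "systematic_review",
              "source": "competence"}},
]

ranking = predict_data_ranking(upper_profiles, candidates)
```

A predicted ranking of this kind is what would then be compared against the data a system actually selects.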
Core Metrics
Six metrics are introduced:
- Value Entropy (VE): Dispersion of value judgments.
- Scenario Replication Score (SRS): Stability across varied scenarios.
- Test-Retest Reliability (TRR): Intra-scenario output consistency.
- Cascade Consistency Index (CCI): Degree of empirical concordance between predicted and measured lower layers.
- Authority Stack Predictive Accuracy (ASPA): Predictive utility of the stack profile for free-form responses.
- Perspective Consistency Score (PCS): Robustness of value/evidence judgments across divergent agentive perspectives.
These metrics allow for both intra-model diagnostics (identifying sources of inconsistency) and cross-model comparisons.
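The summary gives no formulas for these metrics, so the sketch below shows plausible operationalizations of three of them under stated assumptions: Value Entropy as Shannon entropy over forced-choice answers, Test-Retest Reliability as agreement with the most common answer across repeated runs, and the Cascade Consistency Index as the match rate between predicted and measured lower-layer selections. The function names and definitions are illustrative and may differ from the paper's.

```python
import math
from collections import Counter


def value_entropy(choices):
    """Shannon entropy (in bits) of the chosen value labels (assumed VE)."""
    if not choices:
        raise ValueError("choices must not be empty")
    n = len(choices)
    probs = [count / n for count in Counter(choices).values()]
    return sum(p * math.log2(1 / p) for p in probs)


def retest_reliability(runs):
    """Share of repeated runs agreeing with the modal answer (assumed TRR)."""
    if not runs:
        raise ValueError("runs must not be empty")
    modal_count = Counter(runs).most_common(1)[0][1]
    return modal_count / len(runs)


def cascade_consistency_index(predicted, measured):
    """Fraction of items where prediction matches measurement (assumed CCI)."""
    if not predicted or len(predicted) != len(measured):
        raise ValueError("inputs must be non-empty and of equal length")
    matches = sum(p == m for p, m in zip(predicted, measured))
    return matches / len(predicted)


# Illustrative inputs only.
ve = value_entropy(["security", "security", "universalism", "universalism"])
trr = retest_reliability(["A", "A", "B", "A"])
cci = cascade_consistency_index(["x", "y", "x", "z"], ["x", "y", "z", "z"])
```

Under these definitions, a low entropy indicates a concentrated value profile, while reliability and consistency values near 1 indicate stable answers and a well-predicted lower layer.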
Research Roadmap
A phased program is specified: initial forced-choice benchmarking at the value, evidence, and source levels (Q2–Q4 2026), multi-domain cross-validation, eventual derivation and validation of free-form data selection predictions (2027), and prospective integration with neuro-symbolic governance pipelines (e.g., GRACE).
Limitations and Open Questions
The procedural neutrality inherent to AI Integrity means that it cannot, by itself, guarantee normatively "good" behavior; a machine with a perverse but stable value hierarchy is still considered "integritous." The Authority Stack model is an analytic, not architectural, abstraction, and its application to model internals is subject to validation through empirical measurement. Distinguishing legitimate contextual flexibility from Authority Pollution remains a theoretical challenge, as the line between complexity and inconsistency is non-trivial.
Open technical questions include the independence of hallucination phenomena across layers, the existence of entropy correlations among layers, thresholds for behaviorally predictive accuracy, stability of stack profiles under continual deployment or updates, and the intersection with mechanistic interpretability.
Practical and Theoretical Implications
This paradigm has clear implications for AI deployment in regulated domains, as it enables auditable, stakeholder-specific profiles of AI reasoning—a necessary precondition for trust in high-stakes applications. Integrating Integrity-based measurement with policy and regulatory standards could facilitate both ex ante certification and ongoing audit, superseding mere output-focused compliance. Theoretically, it offers a falsifiable approach to testing stack-wise consistency hypotheses in black-box or semi-transparent models, and provides a framework for empirical study of process-level robustness.
Future development may include mechanism-based interpretability work that maps stack consistency to architectural parameters and training dynamics, and the establishment of standardized benchmarks within public evaluation suites.
Conclusion
The AI Integrity paradigm systematically addresses the structural gap in current AI governance frameworks by centering the verifiability of the reasoning process via the Authority Stack. Distinguishing between consistent and polluted cascades, and between process-level coherence and hallucination, the framework pivots governance toward empirical, auditable, process-based evaluation. Operationalized through the PRISM methodology, this paradigm provides a pathway for future work in both AI audit and theoretical model analysis, and supplies actionable tools for operational governance, especially as AI systems permeate domains where transparency and trustworthy process cannot be separated from acceptable outcomes.