Governed Reasoning for Institutional AI

Published 12 Apr 2026 in cs.AI, cs.CY, and cs.MA | (2604.10658v1)

Abstract: Institutional decisions -- regulatory compliance, clinical triage, prior authorization appeal -- require a different AI architecture than general-purpose agents provide. Agent frameworks infer authority conversationally, reconstruct accountability from logs, and produce silent errors: incorrect determinations that execute without any human review signal. We propose Cognitive Core: a governed decision substrate built from nine typed cognitive primitives (retrieve, classify, investigate, verify, challenge, reflect, deliberate, govern, generate), a four-tier governance model where human review is a condition of execution rather than a post-hoc check, a tamper-evident SHA-256 hash-chain audit ledger endogenous to computation, and a demand-driven delegation architecture supporting both declared and autonomously reasoned epistemic sequences. We benchmark three systems on an 11-case balanced prior authorization appeal evaluation set. Cognitive Core achieves 91% accuracy against 55% (ReAct) and 45% (Plan-and-Solve). The governance result is more significant: CC produced zero silent errors while both baselines produced 5-6. We introduce governability -- how reliably a system knows when it should not act autonomously -- as a primary evaluation axis for institutional AI alongside accuracy. The baselines are implemented as prompts, representing the realistic deployment alternative to a governed framework. A configuration-driven domain model means deploying a new institutional decision domain requires YAML configuration, not engineering capacity.

Authors (1)

Summary

  • The paper introduces a novel AI architecture embedding governed reasoning that enforces transparent auditability and structural oversight in institutional decision-making.
  • It employs nine atomic cognitive primitives and a four-tier governance model to ensure decision traceability and prevent silent errors.
  • Empirical results show significantly improved accuracy and error mitigation compared to traditional agent-based and prompt-engineered models.

Governed Reasoning for Institutional AI: Architectural Foundations and Empirical Analysis

Problem Context and Motivation

This work systematically argues for a paradigmatic shift in the architecture of AI deployed for high-consequence institutional decisions. Its central claim is that current agent-based and prompt-engineered architectures (as exemplified by ReAct (Shinn et al., 2023) and Plan-and-Solve (Wang et al., 2023)) are fundamentally misaligned with the requirements of institutional trust, auditability, and persistent governance. Drawing on theoretical foundations in organizational behavior and systems engineering (Simon, 1997; March and Olsen, 1989; Zeigler, 2000), the paper delineates formal decision domains—such as regulatory compliance, clinical triage, and permit review—as problem classes where conversational authority, monolithic reasoning loops, and output-only supervision are strictly inadequate. The work both asserts and empirically supports the hypothesis that governability—the structural capacity to know and signal when a system's reasoning is not fit for autonomous execution—is a prerequisite for trust in institutional AI, distinct from conventional accuracy metrics.

Theoretical Commitments and Design Tenets

The architecture is directly derived from four formal commitments:

  1. Distinctiveness of Institutional Decisions: Institutional tasks are characterized by consequential durability, obligatory step-level explanation, role-bounded authority, and session-transcending accountability. These properties disallow architectures in which the authority, provenance, and granularity of reasoning steps are emergent or implicit.
  2. Governance Must Attach to Reasoning Structure: Governance is enforced at epistemic operation granularity; each typed cognitive primitive (e.g., classify, investigate, verify) produces artifacts with explicit, inspectable epistemic boundaries, not merely an opaque scalar of model self-confidence.
  3. Coordinated Specialization and Adaptive Sequencing: The architecture enables both strict workflow and adaptive agentic reasoning, with structural guardrails ensuring that delegation, suspension, and resumption are recorded and governed, not left as ad-hoc or implicit transitions.
  4. Endogenous Accountability: The audit trail is not a post-hoc artifact but a tamper-evident, hash-chained, fully endogenous product of execution, encoding both reasoning and the orchestration rationale at every step.

Architectural Realization: Cognitive Core

The Cognitive Core system operationalizes the preceding commitments with several interlinked innovations:

  • Cognitive Primitives: Reasoning is decomposed into a stable set of nine atomic, typed operations (retrieve, classify, investigate, verify, challenge, reflect, deliberate, govern, generate), each with domain-configurable input/output schemas. Stability of the vocabulary is demonstrated across seven institutional domains, with empirical sufficiency but without claims of completeness.
  • Reflect Primitive (Metacognitive Oversight): Uniquely, the reflect primitive operates exclusively on the accumulated epistemic state, not direct case evidence. It provides enforceable metacognitive guardrails for adjudicating whether a challenge exposes true epistemic vulnerability versus mere authority pressure or adversarial framing. This mechanism is structurally distinct from post-hoc "confidence" reporting and is shown to reduce the sycophantic capitulation and critical minority suppression failure modes highlighted by Gu et al. (11 Oct 2025).
  • Four-Tier Governance Model: Execution is gated by one of four governance tiers (AUTO, SPOT CHECK, GATE, HOLD), with tier escalation strictly upward and irreducible per instance. Tier assignment is a first-class computational output, not an administrative overlay.
  • Tamper-Evident Audit Chain: All primitive executions, governance actions, and orchestrator choices are recorded in a SHA-256 hash chain, forming an append-only, verifiable ledger.
  • Agentic and Workflow Modes: The architecture supports both strict workflow declaration and agentic, demand-driven delegation, constrained by hard substrate-level rules rather than soft prompts. The orchestrator’s decisions themselves are hash-chained and auditable.
  • Three-Layer Epistemic State: The overall epistemic state comprises deterministic mechanical signals, decomposed LLM-reported judgment signals, and cross-step coherence flags; critical flags can force a non-warranted status irrespective of mean confidence scores.
  • Human-in-the-Loop (HITL) State Machine: Mandatory, role-bounded reviewer intervention is encoded as part of the execution state machine, enforcing conditional authority as a substrate invariant—not merely a monitoring overlay.
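The tamper-evident audit chain described above can be sketched in a few lines of Python. The class and field names below are illustrative assumptions for exposition, not the framework's actual API; only the SHA-256 hash-chaining scheme itself is from the paper.

```python
import hashlib
import json

class AuditLedger:
    """Minimal sketch of an append-only SHA-256 hash-chain ledger.
    Entry fields ('record', 'prev', 'hash') are illustrative, not the paper's schema."""

    def __init__(self):
        self.entries = []

    def append(self, record: dict) -> str:
        # Each entry's hash covers both the record and the previous hash,
        # so any later modification of an earlier entry breaks the chain.
        prev = self.entries[-1]["hash"] if self.entries else "0" * 64
        payload = json.dumps({"record": record, "prev": prev}, sort_keys=True)
        digest = hashlib.sha256(payload.encode()).hexdigest()
        self.entries.append({"record": record, "prev": prev, "hash": digest})
        return digest

    def verify(self) -> bool:
        # Recompute every link; tampering with any record or hash fails verification.
        prev = "0" * 64
        for e in self.entries:
            payload = json.dumps({"record": e["record"], "prev": prev}, sort_keys=True)
            if e["prev"] != prev or hashlib.sha256(payload.encode()).hexdigest() != e["hash"]:
                return False
            prev = e["hash"]
        return True
```

Because each digest covers the preceding digest, the ledger is verifiable end to end: altering any earlier primitive execution or governance record invalidates all subsequent hashes.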

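The strictly-upward tier rule from the governance model above reduces to a small invariant. The tier names come from the paper; the helper function itself is a sketch, not the framework's API.

```python
# Governance tiers in increasing restrictiveness (names from the paper).
TIERS = ["AUTO", "SPOT_CHECK", "GATE", "HOLD"]

def escalate(current: str, proposed: str) -> str:
    """Illustrative sketch: tier changes are strictly upward, so a proposed
    tier can raise but never lower the tier already assigned to an instance."""
    return max(current, proposed, key=TIERS.index)
```

Under this invariant, once any signal routes a case to GATE, no later step can quietly demote it back to autonomous execution.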
Implementation and Configuration

The reference implementation (Cognitive Core v0.1.0) encapsulates these architectural requirements in a modular Python framework underpinned by DEVS event-system semantics. Domain deployment is strictly configuration-driven (workflow YAML, domain YAML, and per-case JSON), eliminating the need for new code at the framework layer when adapting to new decision problems. LLM backend selection is pluggable and per-primitive model allocation is dynamically configurable.
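A hypothetical fragment of such a domain configuration might look like the following. All field names and values here are illustrative assumptions; the paper describes the YAML-driven mechanism but does not publish its schema.

```yaml
# Hypothetical domain YAML sketch — field names are illustrative, not the actual schema
domain: prior_authorization_appeal
workflow:
  - step: retrieve        # gather policy and case documents
    tier: AUTO
  - step: classify        # classify the grounds of appeal
    tier: SPOT_CHECK
  - step: verify
    tier: GATE            # mandatory human review before execution
models:
  classify: model-a       # per-primitive model allocation (pluggable backend)
  verify: model-b
```

The point of the pattern is that adapting to a new decision domain changes only declarations like these, not framework code.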

Prompt engineering for primitives and governance is rigorously separated from orchestration logic: the prompt for each cognitive operation is statically registered with explicit JSON output schemas and input-structure coupling, supporting robust schema validation, forced re-execution on invalid output, and substrate-level override of LLM selections when required.
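The validate-and-retry loop implied above can be sketched as follows. The schema contents, function names, and retry policy are assumptions for illustration; only the pattern of statically registered JSON schemas with forced re-execution is from the paper.

```python
import json

# Hypothetical registered output schema for a "classify" primitive (illustrative).
CLASSIFY_SCHEMA = {"label": str, "rationale": str, "confidence": float}

def validate_output(raw: str, schema: dict):
    """Parse LLM output as JSON and check it against the registered schema.
    Returns the parsed dict, or None if parsing or validation fails."""
    try:
        parsed = json.loads(raw)
    except json.JSONDecodeError:
        return None
    for key, typ in schema.items():
        if key not in parsed or not isinstance(parsed[key], typ):
            return None
    return parsed

def run_primitive(call_llm, schema, max_retries=2):
    """Invoke the model, forcing re-execution until its output validates."""
    for _ in range(max_retries + 1):
        result = validate_output(call_llm(), schema)
        if result is not None:
            return result
    raise RuntimeError("schema validation failed after retries")
```

Validation failures never propagate downstream as silent malformed state; they either resolve through re-execution or surface as an explicit error.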

Empirical Evaluation

Benchmark Construction

An 11-case prior authorization appeal benchmark was constructed with balanced representation across OVERTURN, UPHOLD, PARTIAL, and REMAND outcomes. The cases are selected specifically to probe key failure modes (e.g., approval prior bias, procedural defect blindness, per-level asymmetry, authority sycophancy).

Results

The comparative results are as follows:

  • Cognitive Core (CC): 91% accuracy (10/11) with zero silent errors—all errors are escalated to GATE for mandatory human review. Both the sole reasoning failure in CC (G003, a contested UPHOLD/REMAND ground-truth case) and the universally hard case (B001) were routed to GATE, preventing unreviewed execution.
  • ReAct Baseline: 55% accuracy; five silent errors—incorrect determinations executed without review or signal.
  • Plan-and-Solve Baseline: 45% accuracy; six silent errors.
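The silent-error distinction in these results can be made concrete with a small accounting sketch. The case records below are invented toy data for illustration; only the counts they reproduce (10/11 accuracy, zero vs. five silent errors) mirror the reported results.

```python
def silent_errors(cases):
    """Count determinations that were both incorrect and executed without
    any escalation signal — the paper's 'silent error'."""
    return sum(1 for c in cases if not c["correct"] and not c["escalated"])

def accuracy(cases):
    return sum(c["correct"] for c in cases) / len(cases)

# Invented toy records: a governed system escalates its one failure to review...
governed = [{"correct": True, "escalated": False}] * 10 + \
           [{"correct": False, "escalated": True}]        # wrong, but routed to GATE

# ...while an ungoverned baseline executes its failures without signal.
ungoverned = [{"correct": True, "escalated": False}] * 6 + \
             [{"correct": False, "escalated": False}] * 5  # wrong and silent
```

Note that the two metrics are independent: a system can match another's accuracy while differing entirely in how many of its errors reach a human reviewer.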

The accuracy gap is most pronounced in UPHOLD, REMAND, and PARTIAL cases, an artifact of approval-prior bias and disposition-commitment failures that persist despite explicit prompt-based countermeasures. Plan-and-Solve, despite its verbose (~19K character) and structurally rich planning phase, fails to eliminate this systematic disposition error.

Analysis

The architecture's critical contribution is not only higher accuracy but the structural absence of silent errors—misjudgments that are never flagged for review. The distinction between accuracy and governability is essential; the former is necessary but not sufficient for institutional trust. CC demonstrates that substrate-enforced epistemic separation, metacognitive reflection, and structurally attached governance are required to reliably surface conditions where deferment to human authority is necessary.

Limitations and Prospective Work

The primitive set, while empirically stable, is not formally proven minimal or complete; further cross-domain studies may reveal a need for refinement or subdivision (particularly regarding distinctions within the reflect class). The empirical benchmark, while sufficient for demonstrating target failure modes and architectural claims, is small and domain-specific; scaling case numbers and domain heterogeneity is a requirement for future statistical generalization.

Engineering scalability (asynchronous execution) and extension to more heterogeneous LLM and hybrid toolchains are identified as future technical directions.

Theoretical and Practical Implications

Theoretical implication: Institutional AI is not a special case of general narrow or agentic AI; it requires a distinct substrate in both execution and governance semantics. The transition from output-layer "supervision" to computation-layer "governance" is essential for institutional trust and compliance.

Practically, this architectural pattern enables deployment of AI in regulated, high-stakes domains (healthcare appeals, compliance enforcement, credit adjudication, etc.) while ensuring that automated decisions remain within the bounds of institutional legitimacy, are fully inspectable, and are structurally forced to defer when epistemic or procedural uncertainty is detected.

Conclusion

This work establishes that high-stakes institutional tasks demand architectures which embed structurally governed, type-enforced epistemic decomposition, persistent metacognitive oversight, and endogenous auditability. The empirical findings demonstrate that such architecture not only increases accuracy but, more importantly, strictly limits silent error execution through substrate-level governability.

By formalizing the distinction between accuracy and governability, and by providing concrete architectural mechanisms for the latter, this work offers a foundation upon which future institutional AI deployments—and their regulatory acceptance—can be constructed and evaluated.
