Compliance-by-Construction Argument Graphs: Using Generative AI to Produce Evidence-Linked Formal Arguments for Certification-Grade Accountability

Published 5 Apr 2026 in cs.AI | (2604.04103v1)

Abstract: High-stakes decision systems increasingly require structured justification, traceability, and auditability to ensure accountability and regulatory compliance. Formal arguments commonly used in the certification of safety-critical systems provide a mechanism for structuring claims, reasoning, and evidence in a verifiable manner. At the same time, generative artificial intelligence systems are increasingly integrated into decision-support workflows, assisting with drafting explanations, summarizing evidence, and generating recommendations. However, current deployments often rely on LLMs as loosely constrained assistants, which introduces risks such as hallucinated reasoning, unsupported claims, and weak traceability. This paper proposes a compliance-by-construction architecture that integrates Generative AI (GenAI) with structured formal argument representations. The approach treats each AI-assisted step as a claim that must be supported by verifiable evidence and validated against explicit reasoning constraints before it becomes part of an official decision record. The architecture combines four components: i) a typed Argument Graph representation inspired by assurance-case methods, ii) retrieval-augmented generation (RAG) to draft argument fragments grounded in authoritative evidence, iii) a reasoning and validation kernel enforcing completeness and admissibility constraints, and iv) a provenance ledger aligned with the W3C PROV standard to support auditability. We present a system design and an evaluation strategy based on enforceable invariants and worked examples. The analysis suggests that deterministic validation rules can prevent unsupported claims from entering the decision record while allowing GenAI to accelerate argument construction.

Authors (1)

Summary

  • The paper proposes a compliance-by-construction pipeline that enforces certification-grade accountability by separating generative drafting from deterministic validation.
  • It integrates retrieval-augmented generation with typed formal argument graphs to ensure every claim is supported with admissible evidence and complete procedural coverage.
  • The framework mitigates risks of AI hallucination by logging full provenance and enforcing strict validation rules, thereby enabling audit-ready decision processes.

Compliance-by-Construction Argument Graphs for Generative AI: Architecture, Guarantees, and Implications

Motivation and Context

Generative AI's adoption in regulated, high-stakes contexts introduces significant risks of unsupported, unverifiable, or ambiguous reasoning artifacts. In certification-driven environments, particularly safety-critical or public-sector CPS/IoT contexts, a formal mechanism for ensuring that claims, arguments, and evidence are both rigorous and auditable is essential. The paper "Compliance-by-Construction Argument Graphs: Using Generative AI to Produce Evidence-Linked Formal Arguments for Certification-Grade Accountability" (2604.04103) proposes an architecture that combines retrieval-augmented generation (RAG), formal argument graph representations, and a deterministic validation framework to enforce certification-grade requirements on AI-assisted workflow artifacts.

Architectural Overview

The central construct is the Compliance-by-Construction pipeline, in which GenAI serves exclusively as a drafting assistant. AI contributions are admitted into the certified argument record only if they pass a set of formal, machine-checkable validation constraints. This gating ensures that the inherently probabilistic nature of LLMs is contained and cannot propagate unsupported or weakly grounded claims into critical decision logs.

The architecture consists of four core components:

  1. Typed Argument Graphs: A formal representation inspired by goal structuring notation (GSN) and argumentation models. Nodes are typed as claims, rules, evidence, assumptions, or inference strategies.
  2. Retrieval-Augmented Generation (RAG): Drafting of argument fragments is guided by authoritative, retrieved evidence; each claim must reference actual records or be explicitly marked as unsupported (requiring human review).
  3. Reasoning and Validation Kernel: A deterministic component that enforces validation predicates such as evidence completeness, admissibility, rule/procedural coverage, local non-contradiction, and provenance completeness.
  4. Provenance Ledger (W3C PROV-Aligned): All artifacts, model calls, data flows, and editorial interventions are logged within a provenance model, enabling audit-level reconstruction of the entire reasoning pipeline.

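The typed-graph component above can be sketched as a small data structure. The following is a minimal Python rendering under stated assumptions: the paper does not fix an API, so the node-type names, the `supported_by` relation label, and all class names here are illustrative choices.

```python
from dataclasses import dataclass, field
from enum import Enum

class NodeType(Enum):
    """Node types from the paper's typing scheme (names illustrative)."""
    CLAIM = "claim"
    RULE = "rule"
    EVIDENCE = "evidence"
    ASSUMPTION = "assumption"
    STRATEGY = "strategy"

@dataclass(frozen=True)
class Node:
    node_id: str
    node_type: NodeType
    content: str

@dataclass
class ArgumentGraph:
    """Typed, directed multigraph: edges are (source, relation, target)
    triples, e.g. ("claim-1", "supported_by", "evidence-3")."""
    nodes: dict = field(default_factory=dict)   # node_id -> Node
    edges: list = field(default_factory=list)   # (src_id, relation, dst_id)

    def add_node(self, node: Node) -> None:
        self.nodes[node.node_id] = node

    def add_edge(self, src: str, relation: str, dst: str) -> None:
        # Refuse dangling references so the validator never sees them.
        if src not in self.nodes or dst not in self.nodes:
            raise KeyError("both endpoints must exist before linking")
        self.edges.append((src, relation, dst))
```

A multigraph (a list of edge triples rather than a set) is used because two nodes may be related in more than one way, e.g. an evidence node that both supports a claim and contextualizes an assumption.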
This architecture is layered for modularity and traceability, as shown in Figure 1.

Figure 1: Layered architecture for compliance-by-construction argument generation.

Typed Argument Graph Design and Validation Constraints

Argument Graphs (AGs) are instantiated as typed, directed multigraphs. The node typing system encodes the distinction between claims (requiring verification), policies/rules (formal logical/rule-based constraints), evidence (admissible factual source references), assumptions, inference strategies, and qualifiers for uncertainty.

Key design constraints enforced in the validation kernel include:

  • Evidence Completeness: Every claim must be supported by at least one admissible evidence node or be explicitly flagged as an assumption (which always necessitates human intervention for acceptance).
  • Evidence Admissibility: All evidence must originate from authorized, context-appropriate sources in compliance with data governance policies.
  • Procedural Coverage: For any class of top-level claim, required subclaims and procedural steps (stemming from relevant policies/regulations) must be present, eliminating argument incompleteness.
  • Explicit Contradiction Handling: Any conflict or rebuttal among claims must be modeled explicitly as attack/exception nodes, not hidden or omitted.
  • Provenance Completeness: Every AI-generated node is linked to detailed context: model/version, prompt/template, retrieval set, human edits, and runtime environment.
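Two of these constraints, evidence completeness and evidence admissibility, can be expressed as simple deterministic predicates. The sketch below is self-contained and assumes a dict-based graph encoding; the source whitelist and all field names are hypothetical, since the paper specifies the constraints but not a concrete schema.

```python
# Hypothetical governance whitelist of authorized evidence sources.
AUTHORIZED_SOURCES = {"case_registry", "policy_db"}

def evidence_complete(nodes, edges):
    """Every claim must have at least one supporting evidence node or
    be explicitly flagged as an assumption (which triggers human review).
    Returns the violating claim ids; an empty list means the check passes."""
    supported = {src for src, rel, _ in edges if rel == "supported_by"}
    return [
        nid for nid, attrs in nodes.items()
        if attrs["type"] == "claim"
        and nid not in supported
        and not attrs.get("assumption", False)
    ]

def evidence_admissible(nodes):
    """All evidence nodes must cite an authorized, context-appropriate
    source. Returns the violating evidence ids."""
    return [
        nid for nid, attrs in nodes.items()
        if attrs["type"] == "evidence"
        and attrs.get("source") not in AUTHORIZED_SOURCES
    ]
```

Because both predicates return the concrete violating node ids rather than a bare pass/fail flag, their output can double as the structured feedback that drives the repair/redraft loop described below.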

This design ensures that AI interventions are rendered transparent, reviewable, and defensible, transforming LLM contributions from opaque suggestions into structured, regulated candidate artifacts.

System Workflow

The workflow proceeds as follows: structured case data is used to build a case knowledge graph, which drives retrieval of admissible evidence relevant to the current reasoning steps. GenAI then drafts a candidate argument graph segment conforming to a defined schema. The validation kernel evaluates this candidate against the active constraints; if the artifact fails any check, feedback is generated and a repair/redraft loop is initiated. Upon successful validation, the artifact and its provenance are persisted for eventual auditor consumption.

This pipeline strictly separates probabilistic proposal (GenAI) from deterministic acceptance (validator): AI accelerates the structuring and drafting of artifacts, while all critical admission decisions are delegated to machine-verifiable, policy-aligned criteria.
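The draft-validate-repair loop can be captured in a few lines. In this sketch, `draft_fn` stands in for the paper's RAG-backed GenAI drafter and `validate_fn` for its validation kernel; both are placeholders, as is the `max_rounds` escalation policy.

```python
def compliance_pipeline(case_data, draft_fn, validate_fn, max_rounds=3):
    """Separate probabilistic proposal (draft_fn) from deterministic
    acceptance (validate_fn). Returns the admitted artifact, or None
    when repeated repair rounds fail and human review is required."""
    feedback = None
    for _ in range(max_rounds):
        candidate = draft_fn(case_data, feedback)   # probabilistic step
        violations = validate_fn(candidate)         # deterministic step
        if not violations:
            return candidate                        # admitted to the record
        feedback = violations                       # drives the redraft
    return None  # escalate to human review after repeated failures
```

Note that the loop never mutates the candidate itself: only a fresh draft, produced under the validator's feedback, can be re-evaluated, so every admitted artifact has passed the full constraint set in a single deterministic pass.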

Architectural Guarantees

Deterministic Invariants

The system establishes several invariants not as statistical properties, but as design-level guarantees:

  • Unsupported claims cannot enter certified records. The validation kernel acts as a hard barrier against inclusion of any AI-generated text that is not grounded in retrievable, admissible evidence or explicitly authorized as an assumption.
  • Procedural and evidentiary completeness is enforced. Argument artifacts that omit required claims, subclaims, or sources are systematically rejected and forced through a repair process.
  • Full provenance. Each decision artifact carries a detailed, standards-aligned provenance graph, supporting post hoc reconstruction of all contributing agents, actions, and inputs required for certification or legal audit.
  • Explicit handling of uncertainty and contradiction. The architecture does not suppress or ignore ambiguity; rather, all uncertainty or conflict is modeled explicitly in the argument structure.
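The full-provenance invariant can be illustrated with a PROV-style record for one AI-generated node, using the W3C PROV Entity/Activity/Agent vocabulary. All identifiers and non-`prov:` field names below (model name, node ids, template ids) are hypothetical; the paper prescribes PROV alignment, not this exact layout.

```python
# One provenance record per AI-generated node, capturing the context the
# paper requires: model/version, prompt/template, retrieval set, human
# edits, and the generating activity.
provenance_record = {
    "entity": {"id": "node:claim-42", "prov:type": "claim"},
    "activity": {
        "id": "activity:draft-17",
        "prov:used": ["evidence:doc-301", "prompt:template-v3"],
        "model": "example-llm-1.0",              # model/version behind the call
        "retrieval_set": ["doc-301", "doc-305"], # RAG context at draft time
    },
    "agents": [
        {"id": "agent:genai-drafter", "prov:type": "prov:SoftwareAgent"},
        {"id": "agent:reviewer-7", "prov:type": "prov:Person"},  # human edits
    ],
    "prov:wasGeneratedBy": "activity:draft-17",
    "prov:wasAttributedTo": "agent:genai-drafter",
}
```

Because each record names both the software agent and any human reviewer, an auditor can walk `prov:wasGeneratedBy` and `prov:used` links backwards from any approved claim to every input that produced it.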

Example Argument Graph Instantiation

A typical argument constructed by the system features hierarchical decomposition of top-level compliance claims into finer-grained subclaims, each reference-linked to supporting evidence excerpts, with dashed assumption nodes marking unverifiable dependencies (see Figure 2 in the original manuscript).

Such structured outputs demonstrate that AI-based artifact generation can be rendered certifiable and audit-ready if appropriate gating and provenance are systematically applied. The ability to traverse provenance relations allows external auditors, regulators, and certifiers to reconstruct the derivation of any approved claim.

Practical, Regulatory, and Theoretical Implications

Practical Integration

  • The architecture supports integration in high-risk, regulated workflows (e.g., benefits eligibility, judicial decisions, or medical safety arguments).
  • The system aligns directly with regulatory instruments such as the EU AI Act (2024/1689), which mandates evidence-backed, reviewable, and auditable decision-making in high-risk AI-enabled systems.
  • By operating on formal, typed artifacts rather than unstructured LLM text, human auditors gain both tractability of review and meaningful routes for contestation or override.

Addressing Key Risks in GenAI

The central reliability risk in LLMs—hallucination and unsupported inferences—is mitigated not by expectation management or trust calibration, but by design-time enforcement of argument constraints. Outputs that fail to meet procedural, evidentiary, or provenance requirements are categorically excluded from certified decision packages.

Theoretical Extensions

  • The separation of drafting/generation from acceptance/validation creates a new paradigm for institutional use of GenAI: generation is reconceived as a "first proposal," always subject to rejection, revision, or supplementation per deterministic rules.
  • The provenance model anchors AI auditing in wider data governance and reviewability frameworks, connecting technical and regulatory compliance.
  • Extensions of this architecture could address adversarial scenarios (explicit modeling of attacks/counter-arguments), richer modeling of uncertainty propagation, and complex policy specification languages, enabling greater expressivity in representing contested or value-laden domains.

Limitations and Future Directions

Encoding domain- and regulation-specific constraints as machine-checkable policies can be cognitively and technically intensive. Ensuring that these constraints fully capture both procedural requirements and the broader context or intent of norms (such as justice or equity) remains a research challenge. While the current architecture achieves strong technical guarantees on evidence and process linkage, the meta-level legitimacy and bias of the underlying rules are beyond its direct scope and necessitate complementary social and governance mechanisms.

Anticipated developments include expansion to more expressive argumentation frameworks, advanced uncertainty quantification, and integration with federated learning and privacy-preserving mechanisms for cross-institutional workflows.

Conclusion

The compliance-by-construction argumentation pipeline described in this work (2604.04103) provides a concrete methodology and system for embedding GenAI into certification-grade, audit-ready decision systems. By enforcing strict separation between drafting and acceptance, and by requiring formal argument constraints and complete provenance, the architecture offers strong guarantees of accountability, traceability, and human oversight in high-stakes AI workflows. This design reconfigures GenAI from an unaccountable assistant to a regulated proposal engine, with all final outputs subject to deterministic validation before institutional persistence. The result is a path toward integrating generative models into regulated domains without weakening core safeguards of evidence, due process, and auditability.
