Assurance-Argument Structures
- Assurance-argument structures are formal, graph-based frameworks integrating claims, arguments, and evidence to justify system properties.
- They decompose top-level claims into validated subclaims using quantitative metrics and risk bounds for transparent safety assurance.
- Standard notations (e.g., GSN, CAE) and automated tools support modularization, traceability, and rigorous certification.
Assurance-argument structures are formalized, graph-structured frameworks that provide explicit, auditable justifications for claims regarding system properties—primarily safety, security, dependability, and, in emerging contexts, ethics or fairness—by integrating claims, supporting arguments, and diverse forms of evidence. This structured approach, codified in notations such as CAE (Claims-Argument-Evidence) and GSN (Goal Structuring Notation), supports rigorous certification, stakeholder confidence, and regulatory approval, especially for safety-critical and complex, data-driven or autonomous systems.
1. Core Components and Formalization of Assurance-Argument Structures
Assurance-argument structures organize reasoning as a directed acyclic graph whose nodes represent claims (goals), strategies (argumentation steps), evidence artifacts, and, in modern variants, explicit counterclaims (defeaters). These structures enable systematic demonstration that top-level system claims (e.g., "The probability of a DDC-caused safety violation is below a threshold T at confidence level γ") are substantiated by a cascade of subordinate arguments, statistical estimates, operational assumptions, and empirical evidence (Kläs et al., 2022).
Key Elements:
- Claims (Goals): Formal statements to be justified, e.g., "The system is acceptably safe in context C."
- Arguments/Strategies: Decomposition rules, inference steps, or logic connecting subclaims to parent claims (e.g., partitioning by modes of failure, use of tested functional decomposition, or eliminating alternative explanations).
- Evidence Nodes: Concrete artifacts—statistical test results, formal analysis outputs, simulations, operational logs—linked to claims via argument nodes.
- Contexts/Assumptions/Justifications: Explicit statements of scope, operating conditions, data semantics, assumptions taken as valid, and rationale for inference steps.
- Defeaters: Explicit nodes representing doubts, threat models, or counterarguments; each must ultimately be refuted, mitigated, or accepted as residual risk (see Assurance 2.0 (Chen et al., 30 Sep 2025, Bloomfield et al., 2024)).
Modern languages (e.g., GSN, CAE, Resolute) add nesting, parameterization, and formal semantics; formal claims and evidence relationships may be encoded using propositional or predicate logic, domain-specific rules (as in contract-based arguments), or declarative programming structures for automated reasoning (Gacek et al., 2014, Murugesan et al., 2024).
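The node kinds above can be sketched as a small graph structure. This is a minimal illustration, not the API of any particular tool (GSN editors and Resolute use richer metamodels); the node names and the "open goal" check are assumptions made for the example.

```python
from dataclasses import dataclass, field

# Node kinds found in GSN/CAE-style assurance arguments (illustrative subset).
KINDS = {"claim", "strategy", "evidence", "context", "assumption", "defeater"}

@dataclass
class Node:
    ident: str
    kind: str
    statement: str
    children: list = field(default_factory=list)  # supporting nodes

def unsupported_claims(root):
    """Return claims with no strategy or evidence beneath them (open goals)."""
    open_goals, stack = [], [root]
    while stack:
        n = stack.pop()
        if n.kind == "claim" and not any(
            c.kind in ("strategy", "evidence") for c in n.children
        ):
            open_goals.append(n.ident)
        stack.extend(n.children)
    return open_goals

# Tiny example: a top claim decomposed by one strategy into two subclaims,
# one of which still lacks evidence.
g2 = Node("G2", "claim", "In-scope failure rate bounded",
          [Node("E1", "evidence", "Statistical test report")])
g3 = Node("G3", "claim", "Out-of-scope inputs detected at runtime")
root = Node("G1", "claim", "System is acceptably safe",
            [Node("S1", "strategy", "Argue over scope partition", [g2, g3])])
print(unsupported_claims(root))  # G3 has no supporting strategy or evidence
```

A real tool would additionally enforce acyclicity and attach contexts, assumptions, and defeaters to specific edges; the traversal here only surfaces undeveloped goals.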
2. Decomposition Patterns and Quantitative Argument Integration
The decomposition of top-level assurance claims into subordinate, independently justified subclaims is central to assurance-argument structure construction. For safety of data-driven components, an advanced quantitative pattern is as follows (Kläs et al., 2022):
- Top-Level Target: P(DDC-caused safety violation) ≤ T at confidence γ.
- Four Quantitative Pillars:
  1. Statistical Testing: Estimation of the in-scope failure probability (p_IS) with an upper confidence bound via methods such as the Clopper–Pearson interval.
  2. Scope-Compliance Factor: Probability of encountering out-of-scope inputs (p_OOS); these are conservatively treated as guaranteed failures unless detected.
  3. Runtime Detection (In-scope): Fraction of true in-scope failures caught at runtime (d_IS), reducing the effective contribution of residual failures.
  4. Out-of-Scope Detection: Probability that runtime monitors flag an out-of-scope case (d_OOS), enabling safe handling.
- Test Data Label Quality: Correction term (p_L) for the probability of label faults in the test data, explicitly incorporated into the upper risk bound.
- Working Equation: combining the terms above, the residual risk is bounded by the undetected in-scope failures plus the undetected out-of-scope inputs:
  P(violation) ≤ (1 − d_IS) · (p_IS + p_L) + (1 − d_OOS) · p_OOS
  All parameters are justified at a specified confidence level (γ).
This integrated approach explicitly quantifies residual risk contribution from each operational factor and data-quality source, supporting robust, transparent safety arguments for complex, nondeterministic, or learned components (Kläs et al., 2022).
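The quantitative pattern can be sketched numerically. The sketch below is an illustration under stated assumptions, not the exact formulation of Kläs et al. (2022): the symbol names (p_IS, p_OOS, d_IS, d_OOS, p_L) are placeholders, the Clopper–Pearson upper bound is computed by bisection on the binomial CDF using only the standard library, and the combination rule treats undetected in-scope failures and undetected out-of-scope inputs as the two residual-risk contributions, per the conventions described above.

```python
import math

def clopper_pearson_upper(failures, n, confidence=0.95, tol=1e-9):
    """One-sided Clopper-Pearson upper bound on a binomial proportion,
    found by bisection on the binomial CDF (stdlib only)."""
    if failures >= n:
        return 1.0
    alpha = 1.0 - confidence
    def cdf(p):  # P(X <= failures) for X ~ Binomial(n, p)
        return sum(math.comb(n, i) * p**i * (1 - p)**(n - i)
                   for i in range(failures + 1))
    lo, hi = failures / n, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if cdf(mid) > alpha:  # candidate bound still too small
            lo = mid
        else:
            hi = mid
    return hi

def residual_risk_bound(p_is_upper, p_label, d_is, p_oos, d_oos):
    """Combine the four pillars and the label-quality correction:
    undetected in-scope failures plus undetected out-of-scope inputs."""
    return (1 - d_is) * (p_is_upper + p_label) + (1 - d_oos) * p_oos

# Illustrative numbers: 2 failures in 10,000 in-scope tests.
p_hat = clopper_pearson_upper(failures=2, n=10_000, confidence=0.95)
bound = residual_risk_bound(p_hat, p_label=1e-4, d_is=0.9,
                            p_oos=1e-3, d_oos=0.99)
print(f"p_IS upper bound: {p_hat:.6f}, residual risk bound: {bound:.6f}")
```

Note how the bound is dominated by whichever detection rate is weakest; in practice each parameter would itself need justification at the stated confidence level.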
3. Representation and Tooling: Notation, Modularization, and Semantic Analysis
3.1 Representation Schemes
- GSN (Goal Structuring Notation): Widely adopted for visualizing and editing assurance cases, with nodes for goals (claims), strategies, contexts, evidence, assumptions, and defeating arguments (defeaters in modern extensions).
- CAE (Claims-Argument-Evidence): Explicit triplet with clear separation of claims, arguments (inference logic or decomposition), and evidence.
- Domain-Specific Languages: Languages such as Resolute (Gacek et al., 2014) enable parametric, logic-based assurance case generation tied directly to architectural models, with claims as first-order predicates and rules expressed in a sequent calculus.
- Contract-Based Designs: Modularization using assume-guarantee contracts, supporting composition of assurance modules and management of cross-concern dependencies (safety/security/performance), as in (McGeorge et al., 2024).
3.2 Modularization and Composition
- Assurance-Argument Module: Maps to an assume-guarantee contract (A, G), where all assumptions in A must hold to guarantee all properties in G (McGeorge et al., 2024).
- Proof Obligations: Each guarantee g ∈ G is justified by a proof obligation of the form A ⇒ g, forming subclaims within the module.
- Compositional Patterns: Modules are composed via shared contracts and explicit integration modules, supporting parallel development and scalable verification.
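The compositional check can be illustrated with a deliberately simple model. This is a sketch, not the McGeorge et al. (2024) formalism: contracts are flattened to sets of named propositions, and the check only verifies that each module's assumptions are discharged by the environment or by some other module's guarantees (it does not detect circular support, which a real framework must rule out).

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Contract:
    name: str
    assumptions: frozenset  # propositions the module relies on
    guarantees: frozenset   # propositions it promises if assumptions hold

def unmet_assumptions(modules, environment):
    """For each module, list assumptions discharged neither by the
    environment nor by another module's guarantees."""
    report = {}
    for m in modules:
        provided = set(environment)
        for other in modules:
            if other is not m:
                provided |= other.guarantees
        missing = m.assumptions - provided
        if missing:
            report[m.name] = sorted(missing)
    return report

# Hypothetical two-module composition: a sensor feeds a braking function.
sensor = Contract("sensor", frozenset({"power_ok"}), frozenset({"speed_valid"}))
brake = Contract("brake", frozenset({"speed_valid", "cmd_auth"}),
                 frozenset({"stops_in_time"}))
print(unmet_assumptions([sensor, brake], environment={"power_ok"}))
# brake's "cmd_auth" assumption is discharged by neither party
```

Each entry in the report corresponds to a missing proof obligation, i.e., a subclaim the integration module must still justify.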
3.3 Semantic and Consistency Analysis
- Semantic Properties: Frameworks such as Assurance 2.0 (Murugesan et al., 2024) formalize logical requirements for indefeasibility (i.e., all claims justified, no undefeated defeaters), consistency (no contradictory properties), and adequacy (required evidence for key claims).
- Automated Analysis: Export of argument graphs to Answer Set Programming (ASP) allows rule-based semantic evaluation, missing-evidence detection, and contradiction diagnosis.
- Confidence and Residual Doubt: Modern structures propagate logical uncertainty and may leverage subjective probability or belief-calculus for composite confidence estimates (Chen et al., 30 Sep 2025).
- Tool Support: Implementations such as ASCE, Clarissa, Resolute, and Excel-based worksheet templates facilitate structure management, traceability, and automation (Chen et al., 30 Sep 2025, Bloomfield et al., 2024, Gacek et al., 2014).
4. Extensions: Defeaters, Eliminative Argumentation, and Continuous Maintenance
- Explicit Defeaters: In Assurance 2.0, defeaters are first-class argument nodes that represent explicit doubts, challenges, or negative hypotheses. Each defeater points to a claim or argument node, with the convention that unresolved defeaters block claim acceptance (Bloomfield et al., 2024).
- Eliminative Argumentation: This technique attaches an exact defeater (e.g., Defeater(¬C, C)) to the root claim C and systematically decomposes all conceivable falsifiers of C. Refuting all of them, via explicit subarguments or evidence, closes the case by double negation (¬¬C ⇔ C).
- Three-Valued Assessment: Each node in the argument graph may be assessed as True (T), Unsupported (U), or False (F), propagating semantic closure conditions and enabling rapid identification of open issues or residual risks (Bloomfield et al., 2024).
- Continuous Assurance: Structures are maintained as live artefacts. Evidence updates, system changes, or external review can initiate re-analysis, preserving alignment with evolving system reality (Murugesan et al., 2024).
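The three-valued assessment and the "unresolved defeaters block acceptance" convention can be sketched as a recursive evaluation. The node shape and propagation rules below are illustrative assumptions, not the precise Assurance 2.0 semantics: evidence nodes carry a reviewer-assigned status, an active (True) defeater falsifies its claim, and an Unsupported defeater leaves the claim open.

```python
# Minimal three-valued (T/U/F) evaluation over an argument graph with
# defeaters. Node shape and rules are illustrative, not any tool's API.
def evaluate(node):
    if node["kind"] == "evidence":
        return node.get("status", "U")  # reviewer-assigned: T, U, or F
    supports = [evaluate(c) for c in node.get("supports", [])]
    defeaters = [evaluate(d) for d in node.get("defeaters", [])]
    if "T" in defeaters:
        return "F"  # an active defeater falsifies the claim
    if "U" in defeaters:
        return "U"  # unresolved doubt blocks acceptance
    if supports and all(s == "T" for s in supports):
        return "T"
    return "F" if "F" in supports else "U"

# A claim with solid evidence but one defeater that is itself unresolved.
claim = {
    "kind": "claim",
    "supports": [{"kind": "evidence", "status": "T"}],
    "defeaters": [{"kind": "claim",
                   "supports": [{"kind": "evidence", "status": "U"}]}],
}
print(evaluate(claim))  # unresolved defeater -> "U"
```

Re-running such an evaluation after every evidence update is one concrete mechanism for the continuous-assurance loop described above: a status change anywhere in the graph immediately surfaces newly open claims.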
5. Methodological Patterns, Best Practices, and Empirical Insights
- Argument Patterns: Domain-independent templates for common claims, arguments, or decompositions increase rigor and facilitate reuse (decomposition, modularization, contract-based) (Gleirscher et al., 2019).
- Quantification of Uncertainty: Formalization and quantitative thresholds at all hierarchical levels support precise risk bounding. McDermid's uncertainty gradient (McDermid, 2014) enforces that increasing integrity levels systematically reduce residual epistemic and aleatory uncertainty.
- Transparency and Traceability: Systematic decompositions (e.g., three-level V-Model → PDLC → 5M1E) and explicit mapping of each artifact, process, and stakeholder role assure completeness and expose gaps (Chen et al., 30 Sep 2025).
- Evidence Sufficiency and Data Quality Integration: All subestimates (e.g., test-based failure rates, scope probabilities, monitor effectiveness) must be justified at the desired confidence level, with data-quality terms incorporated to correct for label faults, annotation bias, or covariate shift (Kläs et al., 2022).
- Tool Chain and Automation: Integrated tool support (e.g., exporting cases to ASP for semantic analysis or managing evidence links via database-backed GSN editors) is central to effective assurance-argument management (Murugesan et al., 2024, Gacek et al., 2014).
6. Limitations, Open Challenges, and Research Directions
Despite significant advances, multiple limitations and avenues for further research persist:
- Elicitation and Validation of Quantitative Inputs: Scope-probability terms (e.g., the probability of encountering out-of-scope inputs) often rely on expert elicitation; systematic, auditable procedures and robust bias mitigation remain active research topics (Kläs et al., 2022).
- Tool Consolidation and Industrial Validation: Toolchains that unify argument modeling, evidence integration, consistency checking, and maintenance are still rare; validation on large-scale industrial cases is limited (Mohamad et al., 2020).
- Compositional Uncertainty and Robustness: Composing confidence levels or belief values while avoiding underestimation of aggregate uncertainty is a core concern (see Bonferroni corrections, Bayesian models) (Kläs et al., 2022, Chen et al., 30 Sep 2025).
- Integration of Defeaters and Residual Doubt: Handling active defeaters, partial refutations, and accepted residual risks in transparent, reviewable ways is an emergent best practice but requires further automation and standardization (Bloomfield et al., 2024).
- Automated and Pattern-Based Generation: Use of LLMs for pattern-based, semi-automated argument instantiation is promising but not yet at human-expert fidelity (Odu et al., 2024).
- Interdisciplinary Coverage: Beyond safety, integration of assurance structures with fairness, ethics, and security requires further methodological generalization and formal adaptation.
Comprehensive assurance-argument structures, when rigorously quantified and equipped with traceability, explicit defeater handling, and tool-assisted consistency analysis, provide a scalable and transparent foundation for substantiating high-consequence claims about safety, security, and other system properties in modern AI-enabled and safety-critical domains (Kläs et al., 2022, Chen et al., 30 Sep 2025, Bloomfield et al., 2024, McGeorge et al., 2024, Murugesan et al., 2024, Mohamad et al., 2020).