Formal eXplainable AI (FXAI)

Updated 27 November 2025
  • FXAI is a formal framework that uses precise mathematical definitions to provide sound and minimal explanations for AI decisions.
  • FXAI algorithms leverage techniques such as SAT solving, hitting set enumeration, and inverse optimization to ensure verifiable and objective AI explanations.
  • FXAI supports applications in neuro-symbolic reasoning and medical imaging by offering rigorous, auditable frameworks that meet strict regulatory criteria.

Formal eXplainable AI (FXAI) refers to a class of rigorous methodologies, frameworks, and mathematical formalisms developed to provide sound, succinct, and objective explanations for AI and machine learning models' decisions. FXAI aims to overcome the limitations of traditional heuristic XAI approaches by supplying formally guaranteed explanations often grounded in logic, optimization, symbolic reasoning, and well-defined semantics. These explanations are intended to satisfy technical criteria such as soundness, completeness, minimality, and verifiability, facilitating regulatory compliance and quality control in high-stakes applications.

1. Formal Definitions and Core Principles

FXAI is defined by the use of precise mathematical structures to articulate what constitutes a valid explanation in a machine learning context. Central formal notions include:

  • Abductive Explanation: For a model output or query Q in a context ⟨KB, F⟩, a minimal subset E ⊆ F is an explanation if KB ∪ E is satisfiable, entails Q, and no strict subset E′ ⊂ E does so. This definition ensures consistency, entailment, and minimality, providing a "sufficient reason" for the outcome (Paul et al., 18 Oct 2024).
  • Formal Feature Attribution (FFA): The importance of feature i is measured as the fraction of all minimal abductive explanations (AXp's) containing i, yielding a feature importance grounded in actual reasoning about the decision (Yu et al., 2023, Yu et al., 2023); a brute-force sketch of AXp enumeration and FFA appears after this list.
  • Duality and Hitting Set: FXAI exploits the mathematical duality between minimal sufficient (abductive) and minimal necessary (contrastive) explanations; the set of all AXp’s is the minimal hitting set family over all CXp’s and vice versa (Yu et al., 2023).
  • Hierarchical and Modular Structure: Explanations can be decomposed into layered forms, such as a symbolic reasoning step followed by a neural input explanation, allowing scalable and interpretable explanations for neuro-symbolic systems (Paul et al., 18 Oct 2024).
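
To make these definitions concrete, the following minimal sketch (illustrative only; the three-feature toy classifier and all helper names are assumptions, not taken from the cited papers) brute-forces every abductive explanation of one prediction, checking entailment by exhausting the domain of the free features, and then derives formal feature attribution as the fraction of AXp's containing each feature.

```python
from itertools import product, combinations

# Toy model standing in for an arbitrary classifier:
# predicts 1 iff (x0 AND x1) OR x2.
def model(x):
    return int((x[0] and x[1]) or x[2])

DOMAIN = [0, 1]      # every feature is boolean
N_FEATURES = 3

def is_sufficient(instance, subset):
    """Entailment check: fixing instance's values on `subset` must force the
    same prediction for every completion of the remaining (free) features."""
    target = model(instance)
    free = [i for i in range(N_FEATURES) if i not in subset]
    for values in product(DOMAIN, repeat=len(free)):
        x = list(instance)
        for i, v in zip(free, values):
            x[i] = v
        if model(x) != target:
            return False
    return True

def all_axps(instance):
    """Enumerate all subset-minimal sufficient feature sets (AXp's) by brute force."""
    axps = []
    for size in range(N_FEATURES + 1):
        for subset in combinations(range(N_FEATURES), size):
            if any(set(a) <= set(subset) for a in axps):
                continue                      # a smaller AXp is already contained
            if is_sufficient(instance, subset):
                axps.append(subset)
    return axps

instance = (1, 1, 0)                          # model(instance) == 1
axps = all_axps(instance)                     # [(0, 1)] for this toy instance
ffa = {i: sum(i in a for a in axps) / len(axps) for i in range(N_FEATURES)}
print("AXp's:", axps)                         # minimal sufficient reasons
print("FFA:", ffa)                            # fraction of AXp's containing each feature
```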

FXAI frameworks further specify objective correctness criteria, often requiring that only features with a statistical or causal association to the outcome may be assigned nonzero importance—contrasting with common XAI methods, which systematically violate these conditions (Haufe et al., 22 Sep 2024).

2. Computational Frameworks and Algorithms

A distinguishing feature of FXAI is the direct use of algorithmic and verification-based processes to construct explanations:

  • Symbolic and Neuro-symbolic Explanation Algorithms: Minimal abductive explanations can be computed via hitting set enumeration or iterative SAT solving, ensuring minimality through explicit checks after tentative feature removal (a deletion-style sketch follows this list). Hierarchical approaches enable concise reasoning in neuro-symbolic setups (Paul et al., 18 Oct 2024).
  • Feature Attribution Enumeration: MARCO-style algorithms and their adaptive extensions enumerate minimal explanations (AXp's) and contrastive sets (CXp’s) to approximate FFA, leveraging anytime enumeration and statistical convergence bounds for tractable approximation in #P-hard settings (Yu et al., 2023, Yu et al., 2023).
  • Prototype-based Latent Explanation: In prototype networks, formal abductive latent explanations (ALEs) provide sufficient activation bounds in latent space that guarantee the prediction, verified via solver-free, backward-minimality algorithms and activation-space geometric reasoning (triangular/hypersphere paradigms) (Soria et al., 20 Nov 2025).
  • Inverse Optimization and Learning to Optimize (L2O): Each inference is cast as a data-driven constrained optimization problem, encoding prior knowledge and physical constraints for explainable inference, with certificates assigning trustworthiness labels to inferences based on empirical distributions (Heaton et al., 2022).
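
As a concrete instance of the minimality check mentioned above, the sketch below follows a simplified deletion-style procedure (not the exact algorithm of any cited paper; the majority-vote model and the enumeration-based oracle are assumptions): it starts from the full feature set, tentatively drops one feature at a time, and keeps the drop only if the remaining fixed features still entail the prediction, yielding one subset-minimal AXp in a linear number of oracle calls.

```python
from itertools import product

# Illustrative model over four binary features: a majority vote of x0, x1, x2;
# x3 is deliberately irrelevant.
def model(x):
    return int(x[0] + x[1] + x[2] >= 2)

DOMAIN = [0, 1]
N_FEATURES = 4

def still_entails(instance, kept):
    """Oracle call: does fixing the kept features force the original prediction?
    Here the check enumerates completions; in practice it is a SAT/SMT/MILP query."""
    target = model(instance)
    free = [i for i in range(N_FEATURES) if i not in kept]
    for values in product(DOMAIN, repeat=len(free)):
        x = list(instance)
        for i, v in zip(free, values):
            x[i] = v
        if model(x) != target:
            return False
    return True

def deletion_axp(instance):
    """Linear scan: tentatively drop each feature and keep the drop only if the
    prediction is still entailed by the remaining fixed features."""
    kept = set(range(N_FEATURES))
    for i in range(N_FEATURES):
        kept.discard(i)
        if not still_entails(instance, kept):
            kept.add(i)                       # feature i is necessary; restore it
    return sorted(kept)

instance = (1, 1, 0, 1)                       # model(instance) == 1
print("AXp:", deletion_axp(instance))         # [0, 1] -- the irrelevant x3 is dropped
```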

Typical FXAI algorithms balance worst-case exponential complexity (for exact enumeration) against empirically efficient heuristics, anytime approximations, and formal stopping criteria.

3. Application Modalities and Representative Use Cases

FXAI provides a suite of modalities addressing explanation demands in diverse contexts:

  • Neuro-symbolic FXAI: Two-level explanations combine symbolic abductive reasoning (over facts detected by a neural module) and modular neural input explanations (pixel masks, feature subsets), successfully yielding orders-of-magnitude more succinct and faithful explanations than purely neural methods, with empirical superiority in reasoning tasks (Paul et al., 18 Oct 2024).
  • Latent Space Explanation for Prototypical Networks: Abductive latent explanations ensure formal sufficiency and minimality of prototype contributions. Algorithmic comparison across solution paradigms quantifies trade-offs between explanation compactness and computational tractability in image classification (Soria et al., 20 Nov 2025); a simplified interval-bound check is sketched after this list.
  • Probabilistic Logic-based Explanation: Probabilistic linear programming over clause-based logical knowledge bases enables extraction of decisive feature subsets in conjunction with robust classification, supporting formal reasoning and consistency in explainable prediction systems (Fan et al., 2020).
  • Medical Imaging via Bayesian Teaching: FXAI formulated through Bayesian Teaching selects informative example sets that induce human learners to replicate AI inference, aligning cognitive model update with AI decisions and supporting calibrated trust in expert users (Folke et al., 2021).
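
For the prototype-network case, the rough idea of a latent sufficiency certificate can be illustrated with interval reasoning. The sketch below is a simplified stand-in for abductive latent explanations (the additive score model, the bounds, and the function names are assumptions, not the construction of Soria et al.): it checks whether lower and upper bounds on prototype activations are enough to guarantee that the predicted class wins everywhere inside the bounded box.

```python
# Toy prototype classifier: each class's score is the sum of its prototypes'
# similarity activations, and the prediction is the argmax class.
CLASS_OF_PROTOTYPE = [0, 0, 1, 1]             # prototype j belongs to this class

def bounds_guarantee_prediction(lower, upper, predicted_class):
    """True iff every activation vector a with lower[j] <= a[j] <= upper[j]
    yields a strictly higher score for predicted_class than for any other class."""
    classes = set(CLASS_OF_PROTOTYPE)
    # Worst case for the predicted class: its prototypes sit at their lower bounds.
    worst_pred = sum(lower[j] for j, c in enumerate(CLASS_OF_PROTOTYPE)
                     if c == predicted_class)
    for other in classes - {predicted_class}:
        # Best case for a competitor: its prototypes sit at their upper bounds.
        best_other = sum(upper[j] for j, c in enumerate(CLASS_OF_PROTOTYPE)
                         if c == other)
        if worst_pred <= best_other:
            return False                      # some point in the box flips the label
    return True

# Activation bounds around an observed latent point: tight where the activation
# matters for the decision, loose where it does not.
lower = [0.8, 0.7, 0.0, 0.0]
upper = [1.0, 1.0, 0.4, 0.5]
print(bounds_guarantee_prediction(lower, upper, predicted_class=0))   # True
```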

4. Theoretical Guarantees and Criteria for Explanation Correctness

Leading FXAI methods explicitly state and often prove semantic guarantees:

  • Soundness: Every (formal) explanation or attribution is constructed so that, whenever the conditions it states are held fixed, the model output is guaranteed to remain unchanged. This is enforced through logical entailment, constrained minimization, or activation bounding (Paul et al., 18 Oct 2024, Soria et al., 20 Nov 2025).
  • Minimality: Enumerated explanations or subsets are subset-minimal, precluding redundant or unnecessary elements and making them human-auditable (Soria et al., 20 Nov 2025, Yu et al., 2023).
  • Succinctness: Hierarchical or bundled explanations reduce explanation complexity by abstraction (e.g., grouping pixels into superpixels), increasing interpretability (Bassan et al., 2022); a bundle-level variant of the deletion procedure is sketched after this list.
  • Correctness and Association: Recent FXAI critiques insist on statistical correctness criteria, forbidding attribution to features not demonstrably associated with the generative target. Explicit metrics and counterexamples demonstrate the failures of popular methods to meet these standards (Haufe et al., 22 Sep 2024).
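
Succinctness by abstraction can be mimicked at feature-group granularity. The sketch below (illustrative only; the pixel grouping and toy model are assumptions rather than the method of Bassan et al.) runs the same deletion-style check as before, but over bundles of pixels, so both the number of oracle calls and the size of the reported explanation shrink.

```python
from itertools import product

# Toy "image" of six binary pixels; the model fires iff the left half is all on.
def model(x):
    return int(x[0] and x[1] and x[2])

DOMAIN = [0, 1]
N_FEATURES = 6
BUNDLES = [(0, 1, 2), (3, 4, 5)]              # two "superpixels"

def entails_with_fixed(instance, fixed):
    """Does fixing the pixels in `fixed` force the original prediction?"""
    target = model(instance)
    free = [i for i in range(N_FEATURES) if i not in fixed]
    for values in product(DOMAIN, repeat=len(free)):
        x = list(instance)
        for i, v in zip(free, values):
            x[i] = v
        if model(x) != target:
            return False
    return True

def bundle_axp(instance):
    """Deletion-style minimal explanation computed over bundles, not single pixels."""
    kept = list(range(len(BUNDLES)))
    for b in range(len(BUNDLES)):
        trial = [k for k in kept if k != b]
        fixed = {i for k in trial for i in BUNDLES[k]}
        if entails_with_fixed(instance, fixed):
            kept = trial                      # the whole bundle is redundant; drop it
    return [BUNDLES[k] for k in kept]

instance = (1, 1, 1, 0, 1, 0)                 # model(instance) == 1
print(bundle_axp(instance))                   # [(0, 1, 2)] -- one superpixel suffices
```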

Certificates, trust labels, and convergence bounds are also found in optimization-based FXAI frameworks, further supporting robust quality assurance (Heaton et al., 2022).

5. Limitations, Open Problems, and Future Directions

Despite its advantages, FXAI remains subject to significant computational and conceptual challenges:

  • Complexity: Exact enumeration of explanations and attributions is often #P-hard, requiring exponential time even with duality exploitation. Approximation strategies and heuristic search are essential for scalability (Yu et al., 2023).
  • Interpretability and Succinctness: Explanations may remain large and unwieldy, especially in high-dimensional or prototype-based models, challenging the goal of human comprehensibility (Soria et al., 20 Nov 2025, Bassan et al., 2022).
  • Faithfulness and Symbolic/Neural Interplay: FXAI depends on the fidelity of the underlying model (e.g., N(x)→F in neuro-symbolic XAI); symbolic reasoning cannot compensate for upstream misclassifications. Neural sub-explanations may inherit instability from black-box methods (Paul et al., 18 Oct 2024).
  • Formal Benchmarking: The lack of well-defined, ground-truth datasets for explanation evaluation impedes theoretical and empirical benchmarking of explanation correctness. Calls for new processes and metrics are ongoing (Haufe et al., 22 Sep 2024).
  • Extension to Modern Architectures: Current techniques are being extended to deal with CNNs, transformers, and richer abstraction-refinement processes in verification systems (Bassan et al., 2022).
  • Ontological and Categorical Foundations: Recent work applies category theory and institution theory to articulate foundational aspects of FXAI, modeling explanations as functors or semantic objects, promoting compositionality and secure deployment (Barbiero et al., 2023).

6. Taxonomy and Unifying Theories

Categorical foundations formalize the relationships between various paradigms of explainable AI, offering:

  • Institutional Semantics: Explanations are conceptualized as institutional objects—syntactic (sentences) or semantic (models)—with morphisms and natural transformations enforcing correct passage across abstraction boundaries (Barbiero et al., 2023).
  • Pipeline Compositionality: FXAI pipelines can be described as compositions of functors in feedback Cartesian categories, enabling verifiable, auditable, and traceable explanation flows (a loose functional analogue is sketched after this list).
  • Taxonomy Construction: Slicing the functor category along axes such as intrinsic vs post-hoc explanation, model-agnostic vs model-specific design, or forward vs backward explanation computation yields a natural and rigorous taxonomy encompassing all major XAI methods (Barbiero et al., 2023).
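
At the implementation level, this compositional view can be approximated very loosely as a functional analogue (all stage names below are hypothetical, and the sketch makes no claim to capture the categorical construction): each explanation stage is a pure function, and intermediate objects are recorded so the whole flow remains auditable end to end.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, List, Tuple

@dataclass
class TracedPipeline:
    """Compose explanation stages as pure functions and keep an audit trail of
    every intermediate object (a loose functional analogue of composing functors,
    not the categorical formalism itself)."""
    stages: List[Tuple[str, Callable[[Any], Any]]]
    trace: List[Tuple[str, Any]] = field(default_factory=list)

    def run(self, x: Any) -> Any:
        out = x
        for name, stage in self.stages:
            out = stage(out)
            self.trace.append((name, out))    # auditable record of the flow
        return out

# Hypothetical stages: a "neural" module maps raw input to symbolic facts,
# and a "symbolic" explainer maps facts to a (stand-in) minimal explanation.
neural_to_facts = lambda pixels: {"digit_is_even", "digit_gt_5"}
facts_to_axp = lambda facts: sorted(facts)[:1]

pipeline = TracedPipeline(stages=[("perception", neural_to_facts),
                                  ("symbolic_explanation", facts_to_axp)])
print(pipeline.run("raw-pixels"))             # ['digit_gt_5']
print(pipeline.trace)                         # every stage's output is retained
```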

This unifying view facilitates theoretical equivalence proofs, compositional security properties, and principled ethical deployment of explainable AI systems.


FXAI represents a shift from ad hoc, heuristic explanation techniques toward mathematically grounded, algorithmically tractable, and formally verifiable approaches in explainable machine learning. It delivers modular, minimal, and, where possible, human-auditable explanations, supporting regulatory needs and the development of trusted AI applications across domains.
