Explainable AI (xAI) Models
- Explainable AI models are frameworks that transform opaque AI decisions into human-understandable insights by decomposing evidence and interpretation.
- They are applied in fields like biomedicine, finance, and aerospace to enhance transparency, facilitate regulation, and build user trust.
- xAI approaches balance dual criteria—faithfulness and plausibility—by mapping raw model artifacts to contextualized explanations validated through testing.
Explainable Artificial Intelligence (xAI) models are systems and methodologies designed to render the decision processes of complex AI models intelligible to human stakeholders. These approaches separate the technical inner logic of model computation from the human-facing act of rendering explanations, operationalize dual criteria—faithfulness to the model’s operations and plausibility to end users—and ground the comparison and evaluation of explanation methods in formal properties. While originally motivated by safety and regulatory demands in domains such as biomedicine, finance, and aerospace, contemporary xAI also faces calls for democratized, human-centered explanation that serves both technical experts and lay audiences (Rizzo et al., 2022, Mumuni et al., 17 Jan 2025, Zorita et al., 23 Dec 2024).
1. Formal Definition and Structural Framework
A principled xAI model decomposes every explanation into two core components (Rizzo et al., 2022):
- Evidence ($e$): Objective information extracted from the trained model $f$, the query input $x$, and/or the model's output $f(x)$, via an extractor $g$, i.e., $e = g(f, x, f(x))$.
- Interpretation ($t$): A function that maps evidence (and possibly $f$, $x$, $f(x)$) to a human-understandable explanation, $t(e, f, x, f(x))$.
This separation exposes the dual “atomic” building blocks of any explanation, distinguishing raw model artifacts from the mapping that contextualizes them for human understanding.
Central to this account is explanatory potential ($\mathcal{E}$), which quantifies the proportion of the model's decision chain or internal transformations that the evidence can, in principle, illuminate. Breadth (number of covered steps) and depth (extent to which each step is constrained) jointly compose $\mathcal{E}$.
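For concreteness, the minimal Python sketch below instantiates the decomposition for a toy linear model; the names `extract_evidence` and `interpret` are illustrative stand-ins for $g$ and $t$, not an API defined by the cited framework.

```python
import numpy as np

def extract_evidence(weights, bias, x):
    """Evidence extractor g: collect objective artifacts from the model f and input x.
    Here the evidence is each feature's additive contribution w_j * x_j."""
    return {"contributions": weights * x, "bias": bias}

def interpret(evidence, feature_names, top_k=3):
    """Interpretation t: map raw evidence to a human-understandable explanation."""
    contrib = evidence["contributions"]
    order = np.argsort(-np.abs(contrib))[:top_k]
    parts = [f"{feature_names[i]} contributed {contrib[i]:+.2f}" for i in order]
    return "Prediction driven by: " + "; ".join(parts)

# Toy linear model f(x) = w.x + b, with hypothetical feature names
w, b = np.array([0.8, -1.5, 0.1]), 0.2
x = np.array([1.0, 2.0, 3.0])
e = extract_evidence(w, b, x)                     # e = g(f, x, f(x))
print(interpret(e, ["age", "dose", "weight"]))    # human-facing explanation t(e)
```

Because the evidence covers every additive term of this model, its explanatory potential is maximal; for a deep network the same skeleton applies, but a given extractor typically illuminates only part of the decision chain.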
2. Faithfulness and Plausibility: Dual Quality Criteria
The two classical desiderata—faithfulness and plausibility—are operationalized in the evidence–interpretation architecture (Rizzo et al., 2022):
- Faithfulness ($\mathcal{F}$): For each internal transformation step $i$ that the evidence claims to explain, faithfulness $\mathcal{F}_i$ is a scalar measuring how accurately the interpretation reflects the true function learned at step $i$; it is nonzero only if the evidence has nonzero explanatory potential for that step. Overall faithfulness aggregates over the explained steps, for instance as the average
$$\mathcal{F} = \frac{1}{|S|} \sum_{i \in S} \mathcal{F}_i, \qquad S = \{\, i : \text{the evidence has nonzero } \mathcal{E} \text{ for step } i \,\},$$
and is measured via requirement-based testing (e.g., feature occlusion, sensitivity analysis; see the sketch at the end of this section).
- Plausibility ($\mathcal{P}$): A user-dependent assessment of whether the explanation is convincing/intelligible. It comprises:
- Human-understandability: Can the user parse the explanation?
- Informativeness (depth): Does the explanation explore the most relevant steps?
- Completeness (breadth): Does the explanation cover sufficient aspects of the model to be useful for generalization or anticipation?
Plausibility is typically evaluated by user studies or expert surveys, especially in high-stakes domains.
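As a concrete illustration of requirement-based faithfulness testing, the sketch below scores an attribution-style explanation by occluding features one at a time and rank-correlating the claimed importances with the observed output changes. The correlation-based score is one plausible metric assumed here for illustration, not a canonical measure from the cited works.

```python
import numpy as np

def occlusion_faithfulness(predict, x, attributions, baseline=0.0):
    """Rank-correlate claimed feature importance with the output change
    observed when each feature is individually occluded (set to baseline)."""
    base_pred = predict(x)
    drops = []
    for j in range(len(x)):
        x_occ = x.copy()
        x_occ[j] = baseline
        drops.append(abs(base_pred - predict(x_occ)))
    drops = np.array(drops)
    importance = np.abs(np.asarray(attributions))
    # Spearman-style rank correlation between |attribution| and occlusion drop
    rank = lambda v: np.argsort(np.argsort(v))
    return np.corrcoef(rank(importance), rank(drops))[0, 1]

# Toy model and a deliberately faithful attribution (additive contributions w_j * x_j)
w = np.array([2.0, -0.5, 0.0])
predict = lambda z: float(w @ z)
x = np.array([1.0, 1.0, 1.0])
print(occlusion_faithfulness(predict, x, attributions=w * x))  # close to 1.0
```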
3. Principal XAI Model Classes and Case Study Mapping
xAI methods can be categorized by properties including ante-hoc versus post-hoc operation, model-specificity, and the granularity (global vs local) of explanations (Kok et al., 2022, Zhou et al., 2023, Mumuni et al., 17 Jan 2025).
Intrinsic (ante-hoc) models are interpretable by design:
- Linear Regression: $f(x) = w^\top x + b$, with evidence $e = (w, b)$ (the learned coefficients), full explanatory potential $\mathcal{E} = 1$, and faithfulness $\mathcal{F} = 1$ by construction.
- Rule-Based/Fuzzy Systems: Human-readable rules with full explanatory potential ($\mathcal{E} = 1$) and faithfulness ($\mathcal{F} = 1$); typically high plausibility among practitioners.
Post-hoc XAI methods provide explanations for black-box models. Notable techniques include:
- Attention Mechanisms: Expose learned token or region importances. Empirical studies have found low faithfulness to real causal pathways despite high plausibility (Rizzo et al., 2022).
- Grad-CAM and Saliency Maps: Visual highlight maps from gradients or convolution outputs; faithfulness is measured by occlusion tests and correlation with output changes.
- SHAP (Shapley Additive Explanations): Computes feature attributions via the Shapley value formula,
$$\phi_j(x) = \sum_{S \subseteq F \setminus \{j\}} \frac{|S|!\,(|F| - |S| - 1)!}{|F|!} \left[ f_{S \cup \{j\}}(x) - f_S(x) \right],$$
where $F$ is the feature set and $f_S$ denotes the model evaluated on coalition $S$, providing high faithfulness by satisfying local accuracy, consistency, and other axioms, though sometimes low plausibility if too many features receive nonzero attributions (a brute-force sketch of this computation follows this list).
- LIME (Local Interpretable Model-Agnostic Explanations): Fits a local surrogate (typically sparse linear) around an instance; faithfulness is limited to the locality of the fit (a local-surrogate sketch follows Table 1).
- Layer-wise Relevance Propagation (LRP): Backpropagates relevance through each layer, yielding fine-grained attribution heatmaps; faithfulness and potential depend on propagation rules.
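To make the Shapley formula above concrete, the brute-force sketch below enumerates every coalition for a small feature set, approximating "feature absent" by substituting a background value. The interventional value function and the helper names are illustrative assumptions, and the enumeration is exponential in the number of features, so practical SHAP implementations rely on sampling or model-specific shortcuts.

```python
import math
from itertools import combinations
import numpy as np

def shapley_values(predict, x, background):
    """Exact Shapley attributions phi_j for a single instance x.
    Features outside a coalition S are replaced by background values."""
    n = len(x)

    def value(S):
        z = background.copy()
        z[list(S)] = x[list(S)]
        return predict(z)

    phi = np.zeros(n)
    for j in range(n):
        others = [k for k in range(n) if k != j]
        for size in range(n):
            for S in combinations(others, size):
                weight = math.factorial(size) * math.factorial(n - size - 1) / math.factorial(n)
                phi[j] += weight * (value(S + (j,)) - value(S))
    return phi

# Toy nonlinear model; attributions should sum to f(x) - f(background) (local accuracy)
predict = lambda z: float(3.0 * z[0] + 2.0 * z[1] * z[2])
x, bg = np.array([1.0, 2.0, 1.0]), np.zeros(3)
phi = shapley_values(predict, x, bg)
print(phi, phi.sum(), predict(x) - predict(bg))
```

The printed check illustrates the local-accuracy axiom: the attributions sum to $f(x) - f(\text{background})$.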
Table 1: Mapping of Representative XAI Methods to the Evidence–Interpretation Framework
| Method | Evidence ($e$) | Explanatory Potential ($\mathcal{E}$) | Faithfulness ($\mathcal{F}$) |
|---|---|---|---|
| Linear regression | Regression coefficients | $1$ (full model) | $1$ (algebraic identity) |
| SHAP | Marginal outputs over feature coalitions | Fraction of coalitions sampled | High (axiomatic guarantee) |
| LIME | Local surrogate weights | Fraction of neighborhood covered | Moderate; local only |
| Grad-CAM | Feature map gradients | Fraction of conv. layer params | Varies; tested by occlusion |
| Attention | Attention weights | Single layer; limited global coverage | Often low by intervention tests |
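Complementing the LIME row above, the following sketch fits a locally weighted linear surrogate on Gaussian perturbations around an instance and reports a weighted $R^2$ as a rough local-fidelity proxy. The sampling scheme, proximity kernel, and use of ridge regression are simplifying assumptions; the reference LIME implementation makes different design choices (e.g., discretized or binary interpretable representations and feature selection).

```python
import numpy as np
from sklearn.linear_model import Ridge

def lime_like_explanation(predict, x, n_samples=2000, sigma=0.5, kernel_width=0.75, seed=0):
    """Fit a locally weighted linear surrogate around instance x.
    Returns surrogate coefficients (local feature importances) and the
    weighted R^2 of the fit, a rough proxy for local fidelity."""
    rng = np.random.default_rng(seed)
    X = x + sigma * rng.standard_normal((n_samples, len(x)))   # perturb around x
    y = np.array([predict(z) for z in X])                      # query the black box
    d = np.linalg.norm(X - x, axis=1)
    w = np.exp(-(d ** 2) / (kernel_width ** 2))                # proximity kernel
    surrogate = Ridge(alpha=1.0).fit(X, y, sample_weight=w)
    return surrogate.coef_, surrogate.score(X, y, sample_weight=w)

# Black-box example: nonlinear globally, but well approximated near x
predict = lambda z: float(np.sin(z[0]) + z[1] ** 2)
coef, local_r2 = lime_like_explanation(predict, x=np.array([0.1, 1.0]))
print(coef, local_r2)   # coef roughly [cos(0.1), 2.0]; R^2 indicates local fidelity
```

The weighted $R^2$ operationalizes the caveat that faithfulness is limited to the locality of the fit, and it is one way to report the explanation fidelity called for in Section 6.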
4. Design, Evaluation, and Selection in Practice
The evidence–interpretation formalism allows systematic design, quantitative evaluation, and rational selection of xAI approaches (Rizzo et al., 2022). Specifically:
- Design: Select evidence with high explanatory potential relevant to decision points of interest. Architect models to facilitate tractable evidence extraction and high-fidelity explanatory mapping.
- Evaluation: Faithfulness can be quantitatively tested (e.g., feature occlusion drops), while plausibility requires user studies. Trade-offs are observed: maximizing explanatory coverage ($\mathcal{E}$) often increases cognitive load, possibly lowering plausibility ($\mathcal{P}$).
- Comparison: Methods are compared as tuples $(\mathcal{E}, \mathcal{F}, \mathcal{P})$. For instance, SHAP may offer high $\mathcal{F}$ but lower $\mathcal{P}$ in data-rich domains; Grad-CAM may be moderate in $\mathcal{F}$ and high in $\mathcal{P}$ for image specialists.
In domain-specific deployments such as biomedicine, explainability should be defined and enforced as a non-functional system requirement, with explicit tests for explanatory potential and faithfulness, piloted in situ with real end users.
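The snippet below sketches how such a non-functional explainability requirement might be encoded as an automated acceptance test; the threshold value, the occlusion-drop proxy, and the toy attributions are all hypothetical placeholders to be replaced by domain-specific choices negotiated with end users.

```python
import numpy as np

FAITHFULNESS_THRESHOLD = 0.8   # hypothetical, domain-negotiated acceptance level

def occlusion_drop(predict, x, attributions, top_k=2, baseline=0.0):
    """Fraction of the prediction removed when the top-k attributed features
    are occluded -- one concrete, testable faithfulness proxy."""
    top = np.argsort(-np.abs(np.asarray(attributions)))[:top_k]
    x_occ = np.asarray(x, dtype=float).copy()
    x_occ[top] = baseline
    base = predict(x)
    return abs(base - predict(x_occ)) / (abs(base) + 1e-12)

def test_explanation_meets_faithfulness_requirement():
    # Toy stand-ins: a linear model and its exact additive attributions.
    w = np.array([1.5, -2.0, 0.05])
    predict = lambda z: float(w @ z)
    x = np.array([1.0, 1.0, 1.0])
    attributions = w * x          # in practice: output of SHAP, LRP, etc.
    assert occlusion_drop(predict, x, attributions) >= FAITHFULNESS_THRESHOLD

test_explanation_meets_faithfulness_requirement()
```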
5. Audience Adaptation and Human-Centered Explanations
Recent work highlights the necessity of tailoring explanations for experts, generalists, and non-expert stakeholders. Approaches such as x-[plAIn] use LLMs conditioned on audience profiles to adapt the style, detail, and mathematical depth of explanations (Mavrepis et al., 23 Jan 2024). These models introduce "audience embeddings" that select the degree of formulaic transparency and automate the translation of raw XAI outputs (SHAP, LIME, Grad-CAM) into audience-specific, consumable rationales. Empirical studies demonstrate improved comprehension accuracy and reduced interpretation time, especially for non-expert users.
Parallel developments in “human-centered” xAI frameworks leverage structured dual-output templates: a machine-readable (“expert”) explanation (feature importances, rules) and a narrative (“non-expert”) explanation. This stratification, validated by correlation with ground-truth XAI methods and user studies, bridges the gap between algorithmic transparency and practical interpretability (Paraschou et al., 13 Jun 2025).
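A minimal sketch of this dual-output idea is shown below: the same attribution evidence is rendered once in a machine-readable style and once as a short narrative. This is an illustrative template, not the x-[plAIn] or dual-output pipeline of the cited papers; in an LLM-based system the narrative branch would instead be generated by prompting a model with the structured output and an audience profile.

```python
from typing import Dict

def render_explanation(attributions: Dict[str, float], prediction: str, audience: str) -> str:
    """Render the same evidence (feature attributions) for different audiences."""
    ranked = sorted(attributions.items(), key=lambda kv: -abs(kv[1]))
    if audience == "expert":
        # Machine-readable style: full signed attribution vector, sorted by magnitude.
        rows = [f"{name}: {value:+.3f}" for name, value in ranked]
        return f"prediction={prediction}\n" + "\n".join(rows)
    # Narrative style for generalists / non-experts: top factors only, no numbers.
    top = [name for name, _ in ranked[:2]]
    return (f"The model predicted '{prediction}' mainly because of "
            f"{top[0]} and {top[1]}; other factors played a smaller role.")

evidence = {"blood pressure": 0.42, "age": 0.31, "BMI": -0.05}
print(render_explanation(evidence, "elevated risk", audience="expert"))
print(render_explanation(evidence, "elevated risk", audience="non-expert"))
```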
6. Limitations, Best Practices, and Open Research Challenges
Despite substantial progress, critical challenges remain:
- Evaluation Deficit: A scoping review found that 81% of published applications that label themselves “explainable” do not evaluate the quality or suitability of their chosen XAI method (Mainali et al., 2023). Explanation fidelity—how well the surrogate or attribution matches the underlying model—should be explicitly measured.
- Faithfulness vs. Plausibility Tension: High plausibility (user satisfaction, surface intuitiveness) can occur without faithfulness; empirical failures of attention as an attribution method exemplify this pitfall (Rizzo et al., 2022).
- Contextualization and Human-in-the-Loop: Explanation needs and appropriate metrics differ by user, domain, and task. Best practices demand contextual validation, mixed quantitative–qualitative evaluation, and rigorous, reproducible reporting aligned with community standards (Mainali et al., 2023).
- Hybrid and Adaptive Explanations: Combining multiple methodologies—model-specific and model-agnostic, local and global—can yield more complete, trust-calibrated explanations (Kok et al., 2022, Zhou et al., 2023).
- Open Problems: There is no consensus on universally applicable faithfulness metrics or semantic ground truths for explanation, and further research is needed on adversarial robustness and user-centered explanation interfaces (Kok et al., 2022, Mainali et al., 2023).
7. Application Domains and Impact
In safety-critical and regulated sectors, xAI frameworks have been applied to:
- Biomedicine: For diagnosis support, risk scoring, and biomarker discovery, with explicit balancing of faithfulness, explanatory potential, and user plausibility (Rizzo et al., 2022).
- IoT and Smart Infrastructure: To pinpoint influential sensors or control signals, using SHAP/LIME for feature attribution and Grad-CAM for spatial localization (Kok et al., 2022).
- Bioinformatics: Combining model-agnostic and introspective methods to interpret deep sequence, structure, and image models (Zhou et al., 2023).
- Aerospace: Facilitating error analysis, trust, and control in air traffic management and predictive maintenance via hybrid transparent and post-hoc explanation methods (Zorita et al., 23 Dec 2024).
Successful deployment requires domain-specific benchmarking, continuous evaluation against evolving user needs, and a layered portfolio of interpretable model forms and post-hoc explainers.
References:
- (Rizzo et al., 2022)
- (Mavrepis et al., 23 Jan 2024)
- (Mainali et al., 2023)
- (Kok et al., 2022)
- (Zhou et al., 2023)
- (Zorita et al., 23 Dec 2024)
- (Paraschou et al., 13 Jun 2025)
- (Mumuni et al., 17 Jan 2025)