
Explainable AI Methods Overview

Updated 25 December 2025
  • Explainable AI methods are a suite of techniques that render machine learning models transparent and accountable by converting black-box predictions into understandable explanations using approaches like LIME and SHAP.
  • These methods span model-agnostic, gradient-based, and counterfactual frameworks, offering both local and global interpretability to assess fairness, trustworthiness, and compliance.
  • Emerging research focuses on scalable, causal, and user-adaptive techniques to address challenges like feature dependency and counterfactual feasibility in high-stakes settings.

Explainable Artificial Intelligence (XAI) constitutes a suite of conceptual foundations, algorithmic methodologies, and evaluation strategies intended to render AI and machine learning models transparent, interpretable, and accountable to human stakeholders. XAI methods serve diverse roles, from auditing black-box predictions for trustworthiness, fairness, and accountability, to facilitating regulatory compliance and actionable recourse in high-stakes decision settings. This overview delineates the principal XAI frameworks, core mathematical formulations, and current research directions with emphasis on technical rigor and cross-domain applicability.

1. Taxonomies and Conceptual Foundations

Multiple orthogonal axes structure the contemporary XAI landscape:

  • Model-agnostic vs. Model-specific: Model-agnostic methods interface with predictive functions only via input-output queries, making them broadly applicable (e.g., LIME, SHAP); model-specific methods exploit architectural details (e.g., gradients in CNNs, attention maps in transformers) (Gohel et al., 2021).
  • Post-hoc vs. Ante-hoc: Post-hoc methods explain black-boxes after training, while ante-hoc (inherently interpretable) models (e.g., decision trees, linear/logistic regression, GAMs) are designed for end-to-end transparency (Mumuni et al., 17 Jan 2025, Islam et al., 2021).
  • Local vs. Global: Local methods explain single-instance decisions; global methods elucidate the model’s aggregate behavior over an input domain (Islam et al., 2021).
  • Descriptive vs. Predictive vs. Causal/Explanatory: The “Describe–Predict–Explain” framework rigorously distinguishes between (i) pattern-recovery (descriptive, e.g. SHAP/LIME), (ii) risk-forecasting (predictive), and (iii) counterfactual/causal explanations that answer “what if” or “why” questions (interventional, e.g. Average Causal Effect estimation) (Carriero et al., 7 Aug 2025).

2. Major XAI Methodologies

2.1 Feature Attribution and Local Surrogate Approaches

  • SHAP (SHapley Additive exPlanations): Grounded in cooperative game theory, SHAP attributes the prediction at $x$ to feature $j$ via the Shapley value over the feature set $N$:

$$\phi_j(x) = \sum_{S\subseteq N\setminus\{j\}} \frac{|S|!\,(|N|-|S|-1)!}{|N|!}\,\bigl[f_{S\cup\{j\}}(x) - f_S(x)\bigr]$$

Exact computation is $O(2^n)$; Kernel SHAP provides a practical approximation, and for trees TreeSHAP achieves polynomial time. Local explanations are instance-specific; global importances are obtained by aggregating $|\phi_j|$ across the data (Salih et al., 2023, Hsieh et al., 1 Dec 2024).
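
As a concrete illustration of the formula above, the following minimal sketch enumerates all feature subsets to compute exact Shapley values for a toy model, imputing "missing" features with a background mean (one common convention, and an assumption of this example); practical tools replace this $O(2^n)$ enumeration with Kernel SHAP or TreeSHAP.

```python
import itertools
import math
import numpy as np

def exact_shapley(f, x, background, n_features):
    """Brute-force Shapley values: phi_j = sum over subsets S not containing j
    of |S|!(n-|S|-1)!/n! * [f(S u {j}) - f(S)].
    'Missing' features are imputed with the background mean (a modelling choice)."""
    baseline = background.mean(axis=0)

    def f_subset(S):
        # Evaluate f with features in S taken from x and the rest from the baseline.
        z = baseline.copy()
        z[list(S)] = x[list(S)]
        return f(z.reshape(1, -1))[0]

    phi = np.zeros(n_features)
    for j in range(n_features):
        others = [i for i in range(n_features) if i != j]
        for size in range(len(others) + 1):
            for S in itertools.combinations(others, size):
                weight = (math.factorial(len(S)) * math.factorial(n_features - len(S) - 1)
                          / math.factorial(n_features))
                phi[j] += weight * (f_subset(S + (j,)) - f_subset(S))
    return phi

# Toy usage: a linear "model" with 4 features.
rng = np.random.default_rng(0)
w = np.array([1.0, -2.0, 0.5, 0.0])
f = lambda X: X @ w
background = rng.normal(size=(100, 4))
x = np.ones(4)
print(exact_shapley(f, x, background, n_features=4))  # approx w * (x - background mean)
```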

  • LIME (Local Interpretable Model-agnostic Explanations): Fits an interpretable surrogate $g$ (usually a sparse linear model) to locally approximate $f$ near a sample $x$ via weighted least squares:

$$\hat{g} = \arg\min_{g\in G}\sum_{z\in Z}\pi_x(z)\,[f(z) - g(z)]^2 + \Omega(g)$$

Here $\pi_x(z)$ is a kernel measuring proximity to the instance $x$, and $Z$ is a set of stochastically perturbed samples (Speckmann et al., 26 Jun 2025); a minimal implementation sketch follows this list.

  • Surrogate Models: Globally or locally fitted interpretable functions $\tilde{f}$ (e.g., small trees, rule lists) that replicate the outputs of a complex predictor. Fidelity to $f$ must be evaluated; surrogacy may obscure the true model logic (Islam et al., 2021).
  • Anchors: High-precision, rule-based local explanations. Anchors are minimal feature sets that, when held fixed, keep the prediction invariant with high probability (Speckmann et al., 26 Jun 2025).
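
As referenced above, here is a minimal sketch of the LIME-style objective: Gaussian perturbations $Z$ around $x$, an exponential proximity kernel $\pi_x$, and kernel-weighted ridge regression as the surrogate $g$. The function name `explain_instance_lime_style` and all parameter defaults are illustrative assumptions, not the API of the `lime` package.

```python
import numpy as np

def explain_instance_lime_style(f, x, n_samples=5000, sigma=0.1, kernel_width=0.25,
                                alpha=1e-3, seed=0):
    """Fit a local linear surrogate g(z) ~ f(z) around x via kernel-weighted ridge regression.
    Returns the surrogate coefficients, interpretable as local feature effects."""
    rng = np.random.default_rng(seed)
    d = x.shape[0]

    # 1. Perturb: draw samples Z around x.
    Z = x + sigma * rng.normal(size=(n_samples, d))

    # 2. Query the black box.
    y = f(Z)

    # 3. Proximity kernel pi_x(z) = exp(-||z - x||^2 / width^2).
    dist2 = ((Z - x) ** 2).sum(axis=1)
    weights = np.exp(-dist2 / kernel_width ** 2)

    # 4. Weighted ridge regression (closed form) on centered perturbations + intercept.
    Zc = np.hstack([Z - x, np.ones((n_samples, 1))])
    W = weights[:, None]
    A = Zc.T @ (W * Zc) + alpha * np.eye(d + 1)
    b = Zc.T @ (weights * y)
    coef = np.linalg.solve(A, b)
    return coef[:d]  # local effect of each feature; coef[d] is the intercept

# Toy usage: explain a nonlinear model at a point.
f = lambda X: np.sin(X[:, 0]) + X[:, 1] ** 2
x = np.array([0.5, 1.0, -2.0])
print(explain_instance_lime_style(f, x))  # roughly [cos(0.5), 2.0, 0.0], the local gradient
```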

2.2 Gradient-Based and Model-Specific Attributions

  • Saliency Maps and Grad-CAM: Saliency methods compute $S(x) = |\nabla_x f(x)|$; Grad-CAM combines gradients with convolutional feature maps to produce class-specific visual heatmaps:

$$L^c = \mathrm{ReLU}\!\left(\sum_k \alpha_k^c A^k\right), \qquad \alpha_k^c = \frac{1}{Z} \sum_{i,j}\frac{\partial y^c}{\partial A^k_{ij}}$$

These are suited to vision domains but have limited causal fidelity (Brandt et al., 2023, Hsieh et al., 1 Dec 2024).

  • Integrated Gradients (IG): Attributes output differences to features by integrating gradients along a straight-line path from a baseline $x'$:

$$\mathrm{IG}_i(x) = (x_i - x_i') \int_{0}^{1} \frac{\partial f\bigl(x' + \alpha\,(x - x')\bigr)}{\partial x_i}\, d\alpha$$

IG satisfies the completeness axiom (attributions sum to $f(x) - f(x')$) but is computationally expensive for high-dimensional and structured data (Salih et al., 2023, Hsieh et al., 1 Dec 2024); a PyTorch sketch of saliency and integrated gradients follows this list.
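
The following minimal PyTorch sketch computes vanilla gradient saliency and a Riemann-sum approximation of Integrated Gradients for a small stand-in classifier; the model, the step count, and the all-zeros baseline are assumptions of this example, and libraries such as Captum provide production implementations.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# A small stand-in classifier; any differentiable model works.
model = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 3))
x = torch.randn(1, 8)
target_class = 2

# --- Vanilla saliency: S(x) = |d f_c(x) / d x| ---
x_sal = x.clone().requires_grad_(True)
model(x_sal)[0, target_class].backward()
saliency = x_sal.grad.abs()

# --- Integrated Gradients: (x - x') * integral_0^1 grad f(x' + a(x - x')) da ---
def integrated_gradients(model, x, target, baseline=None, steps=64):
    if baseline is None:
        baseline = torch.zeros_like(x)  # baseline choice is a modelling assumption
    # Midpoint Riemann sum over the straight-line path from baseline to x.
    alphas = (torch.arange(steps, dtype=torch.float32) + 0.5) / steps
    grads = []
    for a in alphas:
        z = (baseline + a * (x - baseline)).detach().requires_grad_(True)
        out = model(z)[0, target]
        grad, = torch.autograd.grad(out, z)
        grads.append(grad)
    avg_grad = torch.stack(grads).mean(dim=0)
    return (x - baseline) * avg_grad

ig = integrated_gradients(model, x, target_class)
# Completeness check: attributions should approximately sum to f(x) - f(baseline).
with torch.no_grad():
    delta = model(x)[0, target_class] - model(torch.zeros_like(x))[0, target_class]
print(ig.sum().item(), delta.item())
```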

2.3 Counterfactual and Contrastive Explanations

  • Counterfactual Explanation: Finds $x'$ near $x$ such that $f(x') = y'$ for some target class $y'$:

$$\min_{x'}\; d(x, x') + \lambda\, \mathcal{L}\bigl(f(x'), y'\bigr)$$

Solutions illuminate actionable paths to desired outcomes, but they may be non-unique, and the optimization generally requires explicit domain constraints to stay feasible (Speckmann et al., 26 Jun 2025, Gohel et al., 2021, Carriero et al., 7 Aug 2025); a gradient-based sketch is given after this list.

  • Formal "Why/Why Not" Explanations: Minimal sufficient (AXp) and minimal contrastive (CXp) feature sets defined by logical constraints, with support for background domain knowledge $\varphi$ to make explanations succinct and domain-relevant (Yu et al., 2022).
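
As a sketch of the counterfactual objective above, the code below runs plain gradient descent on $d(x, x') + \lambda\,\mathcal{L}(f(x'), y')$ for a differentiable PyTorch classifier; the L1 distance, the fixed $\lambda$, and the absence of feasibility constraints are simplifying assumptions, which is exactly where the cautions in Section 4 apply.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
model = nn.Sequential(nn.Linear(5, 16), nn.ReLU(), nn.Linear(16, 2))  # stand-in classifier

def counterfactual(model, x, target_class, lam=1.0, steps=500, lr=0.05):
    """Search for x' close to x (L1 distance) that the model assigns to target_class."""
    x_cf = x.clone().detach().requires_grad_(True)
    optimizer = torch.optim.Adam([x_cf], lr=lr)
    target = torch.tensor([target_class])
    for _ in range(steps):
        optimizer.zero_grad()
        dist = (x_cf - x).abs().sum()                     # d(x, x'): sparsity-favouring L1
        pred_loss = F.cross_entropy(model(x_cf), target)  # L(f(x'), y')
        loss = dist + lam * pred_loss
        loss.backward()
        optimizer.step()
    return x_cf.detach()

x = torch.randn(1, 5)
x_cf = counterfactual(model, x, target_class=1)
print("original class:", model(x).argmax(dim=1).item(),
      "counterfactual class:", model(x_cf).argmax(dim=1).item(),
      "L1 change:", (x_cf - x).abs().sum().item())
```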

2.4 Probabilistic Logic and Certificate-driven Explanations

  • Probabilistic Logic Inference: Symbolic knowledge bases extracted from data are used with linear programming to yield not only probabilistic predictions but also minimal decisive-feature explanations. This approach often aligns with SHAP on real datasets but can outperform it in ground-truth-controlled synthetic regimes (Fan et al., 2020).
  • Learn-to-Optimize (L2O): Each inference is the solution of a transparent, data-driven optimization problem encoding priors and constraints. Outputs are annotated with "certificates" (e.g., for sparsity, fidelity, convergence) to verify trustworthiness (Heaton et al., 2022).
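
The sketch below is a generic illustration of the learn-to-optimize, certificate-annotated pattern just described, not the specific method of Heaton et al.: inference is posed as a transparent sparse-coding problem solved by ISTA, and the output is returned together with simple sparsity, fidelity, and convergence certificates. The objective, solver, and certificate definitions are all assumptions of this example.

```python
import numpy as np

def ista_inference(D, y, lam=0.1, steps=500):
    """Inference as transparent optimization: min_x 0.5*||Dx - y||^2 + lam*||x||_1,
    solved with ISTA. Returns the solution plus simple certificates."""
    L = np.linalg.norm(D, 2) ** 2          # Lipschitz constant of the smooth part's gradient
    x = np.zeros(D.shape[1])
    x_prev = x
    for _ in range(steps):
        x_prev = x
        grad = D.T @ (D @ x - y)
        step = x - grad / L
        x = np.sign(step) * np.maximum(np.abs(step) - lam / L, 0.0)  # soft-thresholding
    certificates = {
        "sparsity": int(np.count_nonzero(x)),              # how many features are used
        "fidelity": float(np.linalg.norm(D @ x - y)),      # reconstruction residual
        "convergence": float(np.linalg.norm(x - x_prev)),  # last-iterate change
    }
    return x, certificates

# Toy usage: recover a sparse code and inspect its certificates.
rng = np.random.default_rng(0)
D = rng.normal(size=(30, 60))
x_true = np.zeros(60); x_true[[3, 17, 42]] = [1.0, -2.0, 0.5]
y = D @ x_true + 0.01 * rng.normal(size=30)
x_hat, certs = ista_inference(D, y)
print(certs)
```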

3. Evaluation Metrics and Benchmarks

Rigorous assessment protocols seek to quantify fidelity, completeness, stability, and human comprehensibility:

  • Precision/Recall of Attribution: Especially in synthetic ground-truth settings, precision and recall are computed over positively and negatively contributing inputs (Brandt et al., 2023). E.g., for ground-truth attributions $GT(p)$ and candidate attributions $E(p)$ (a computational sketch follows this list):

$$\mathrm{Precision}^+ = \frac{\sum_{p \in P^+}\bigl[1 - |E(p) - GT(p)|\bigr]}{\sum_{p \in P^+}\bigl[1 - |E(p) - GT(p)|\bigr] + \sum_{p \notin P^+,\, E(p) > 0}|E(p) - GT(p)|}$$

  • Compactness, Completeness, Correctness: Respectively, these equate to precision, recall, and their average.
  • Model-agnostic interpretability proxies: E.g., the Molnar–Islam score $\Psi = 1 - [\,w_1 C_{\text{chunks}}(x) + w_2 C_{\text{chunks}}(\hat{y}) + w_3\,\text{Interaction}\,]$, with higher values indicating greater explainability (Islam et al., 2019, Islam et al., 2021).
  • Simulatability and human-study metrics: Evaluate whether users can predict outcomes or recourse given an explanation, and measure subjective comprehensibility (Islam et al., 2019).
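
Below is a minimal sketch of the attribution-precision metric referenced above; it assumes attributions and ground truth are normalised to $[0, 1]$ and that $P^+$ is the set of inputs with positive ground-truth contribution, which matches the formula but may differ in detail from the benchmark protocol of Brandt et al.

```python
import numpy as np

def attribution_precision_pos(E, GT):
    """Precision over positively contributing inputs.
    E, GT: per-input attribution arrays, assumed normalised to [0, 1]."""
    E, GT = np.asarray(E, dtype=float), np.asarray(GT, dtype=float)
    pos = GT > 0                       # P+: inputs with positive ground-truth contribution
    true_credit = np.sum(1.0 - np.abs(E[pos] - GT[pos]))
    # Penalty for attributing importance to inputs outside P+.
    spurious = (~pos) & (E > 0)
    false_credit = np.sum(np.abs(E[spurious] - GT[spurious]))
    return true_credit / (true_credit + false_credit)

# Toy usage: the explainer slightly over-attributes one irrelevant input.
GT = np.array([0.8, 0.6, 0.0, 0.0, 0.4, 0.0])
E  = np.array([0.7, 0.5, 0.3, 0.0, 0.4, 0.0])
print(attribution_precision_pos(E, GT))  # ~0.90
```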

4. Limits and Cautions in Application

  • Descriptive vs. Causal Confounds: Most XAI methods (SHAP, LIME, feature-importance, counterfactuals) are descriptive; their attributions reflect association on the training distribution, not underlying causal mechanisms. Misinterpreting these for actionable interventions can be misleading, particularly in healthcare and high-stakes decision environments (Carriero et al., 7 Aug 2025).
  • Feature Dependencies and Instabilities: SHAP and LIME are sensitive to feature collinearity; their additive and independence assumptions can induce both instability and unintuitive attributions when features are redundant or strongly correlated (Salih et al., 2023).
  • Global Surrogacy Illusions: Global surrogate models may yield compact summaries with misleading fidelity; local surrogates avoid this at the cost of generality (Islam et al., 2021).
  • Unrealistic Counterfactuals: Counterfactual optimization may suggest infeasible or semantically meaningless perturbations unless domain constraints are explicitly encoded (Speckmann et al., 26 Jun 2025, Carriero et al., 7 Aug 2025).

5. Integration, Tooling, and Stakeholder Adaptation

Comprehensive XAI interfaces combine multiple explanation modes and are increasingly tailored by user type:

  • Interactive Platforms: Modern systems (e.g., IXAII, OmniXAI) unify LIME, SHAP, counterfactuals, rule-based Anchors, and certificate-driven methods, with tunable parameters and rich visualizations for data scientists, managers, auditors, lay users, and affected third parties (Speckmann et al., 26 Jun 2025, Yang et al., 2022).
  • Holistic Workflow Integration: The HXAI and H-XAI frameworks extend explainability beyond model output to encompass the entire ML pipeline: raw data diagnostics, analysis setup rationales, learning process transparency, model quality and error slicing, and communication channels orchestrated by AI agents (including LLM-based aggregation and explanation) (Paterakis et al., 15 Aug 2025, Lakkaraju et al., 7 Aug 2025).
  • Stakeholder-Specific Mapping: Algorithmic and narrative explanations must adjust for domain experts, analysts, clinicians, and the public—ranging from technical attributions and fairness audits to plain-language recourse or “why-not” narratives (Paterakis et al., 15 Aug 2025, Lakkaraju et al., 7 Aug 2025, Speckmann et al., 26 Jun 2025).

6. Challenges and Ongoing Research Frontiers

  • Causal XAI: Shifting from associational to interventional/structural explanations—integrating SCMs, do-calculus, and causal inference for actionable recourse (Carriero et al., 7 Aug 2025).
  • Faithfulness and Benchmarks: Inconsistency among faithfulness metrics (e.g., MOAR, ROAR, insertion/deletion, compactness) calls for standardized evaluation protocols and robust ground-truth datasets, especially as LLMs and VLMs enter the XAI pipeline (Brandt et al., 2023, Mumuni et al., 17 Jan 2025).
  • Scaling and Automation: Efficient approximations such as TreeSHAP, learning-based approximators, automated extraction of domain rules (MaxSAT induction), and LLM/VLM-guided concept bottleneck construction are advancing the scalability and semantic richness of explanations (Yu et al., 2022, Mumuni et al., 17 Jan 2025).
  • User-Centered, Adaptive XAI: Orchestration of pipeline-wide explanations by LLM-powered agents and user-adaptive interfaces seeks to bridge the communicative barriers between technical and non-technical stakeholders, ensuring cognitive manageability and actionable insight (Paterakis et al., 15 Aug 2025, Lakkaraju et al., 7 Aug 2025).

Explainable AI methods comprise a spectrum from inherently interpretable models, through post-hoc attribution and surrogate techniques, to interactive, certificate-driven, and causally grounded systems, each characterized by domain-specific strengths, limitations, and evaluation criteria grounded in both formal rigor and practitioner need (Mumuni et al., 17 Jan 2025, Islam et al., 2021, Paterakis et al., 15 Aug 2025, Lakkaraju et al., 7 Aug 2025, Speckmann et al., 26 Jun 2025, Carriero et al., 7 Aug 2025, Hsieh et al., 1 Dec 2024, Salih et al., 2023, Brandt et al., 2023, Heaton et al., 2022, Fan et al., 2020, Yu et al., 2022, Gohel et al., 2021, Hanif, 2021, Islam et al., 2019).
