Local Explanation Techniques
- Local explanation techniques are a set of XAI methods that locally approximate complex models by fitting simple, interpretable surrogate models around a target instance.
- They employ robust perturbation strategies, such as KDE sampling and N-ball uniform sampling, to generate realistic data points and enhance explanation fidelity.
- Recent advances address causal reasoning, multiclass extensions, and global reduction strategies to improve robustness and interpretability of local explanations.
Local explanation techniques are a central class of methods in explainable artificial intelligence (XAI) designed to attribute predictions of complex models to individual input features or interpretable properties in the vicinity of a specific instance. Unlike global explanation methods, which seek to summarize model behavior in aggregate, local techniques provide a granular view of how a black-box model arrives at a particular output for a particular input. This article reviews the formal foundations, principal methodologies, advances in domain adaptation and evaluation, multiclass and high-level extensions, faithfulness analytics, and systematization trends in local explanation research, emphasizing precise technical definitions and the latest methodological innovations.
1. Mathematical Foundations and Surrogate Modeling
At the core of local explanation techniques is the idea of fitting a simple, interpretable surrogate model to match the behavior of a black-box predictor $f$ in a neighborhood of a target point $x$. The canonical objective, instantiated by LIME and its derivatives, is:

$$\xi(x) = \operatorname*{arg\,min}_{g \in G} \; \sum_{i} \pi_x(z_i)\,\big(f(z_i) - g(\phi(z_i))\big)^2 + \Omega(g),$$

where:
- $G$ is an interpretable model family (e.g., sparse linear models, trees),
- $z_i$ are perturbations of $x$,
- $\phi$ may represent an interpretable feature mapping,
- $\pi_x$ is a proximity kernel (e.g., RBF),
- $\Omega(g)$ penalizes model complexity (often enforcing sparsity or limiting the number of active features) (Botari et al., 2020, Zhang et al., 2019).
Surrogates are most frequently optimized via weighted least squares (LIME, MeLIME), Shapley value regression (SHAP), or combinatorial search for rule conjunctions (Anchors), with regularization mechanisms to favor interpretability; a minimal sketch of the weighted least-squares case follows.
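As a concrete illustration, the following is a sketch of the weighted least-squares surrogate fit; the function name, the Gaussian perturbation scheme, and the default parameters are illustrative assumptions, not the API of any particular LIME implementation:

```python
import numpy as np

def lime_style_surrogate(f, x, n_samples=1000, sigma=1.0, scale=0.5, l2=1e-2, rng=None):
    """Minimal LIME-style local surrogate: perturb x with Gaussian noise,
    weight samples by an RBF proximity kernel, and fit a weighted ridge
    regression whose coefficients serve as local feature attributions."""
    rng = np.random.default_rng(rng)
    d = x.shape[0]
    # Perturbations z_i around the target instance x.
    Z = x + scale * rng.standard_normal((n_samples, d))
    y = f(Z)                                   # black-box predictions f(z_i)
    # RBF proximity kernel pi_x(z_i).
    w = np.exp(-np.sum((Z - x) ** 2, axis=1) / (2 * sigma ** 2))
    # Weighted ridge regression (closed form) on centered perturbations.
    Zc = np.hstack([Z - x, np.ones((n_samples, 1))])   # intercept column
    WZ = Zc * w[:, None]
    coef = np.linalg.solve(Zc.T @ WZ + l2 * np.eye(d + 1), Zc.T @ (w * y))
    return coef[:d]                            # per-feature local attributions

# Example: explain a toy black-box at a point.
f = lambda Z: np.sin(Z[:, 0]) + 0.5 * Z[:, 1] ** 2
attributions = lime_style_surrogate(f, np.array([0.3, -1.0]))
```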
In multiclass problems, methods such as LIPEx generalize the objective to fit the entire $C$-dimensional probability vector using a single surrogate matrix $W$, optimizing a loss based on the squared Hellinger distance between predicted and surrogate probabilities:

$$\mathcal{L}(W) = \sum_{i} \pi_x(z_i)\, H^2\!\big(f(z_i),\, g_W(z_i)\big),$$

where $g_W(z) = \operatorname{softmax}(W\,\phi(z))$, and $H(p, q) = \tfrac{1}{\sqrt{2}}\,\lVert \sqrt{p} - \sqrt{q} \rVert_2$ denotes the Hellinger distance, ensuring symmetric and bounded treatment of class probabilities (Zhu et al., 2023).
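Schematically, the kernel-weighted squared-Hellinger objective can be evaluated as below; this is a sketch under assumed array shapes (with $W$ stored as features × classes), not the reference implementation of Zhu et al. (2023):

```python
import numpy as np

def hellinger_sq(p, q):
    """Squared Hellinger distance between rows of two probability matrices."""
    return 0.5 * np.sum((np.sqrt(p) - np.sqrt(q)) ** 2, axis=-1)

def lipex_style_loss(W, Phi, P, weights):
    """Kernel-weighted squared-Hellinger loss of a softmax-linked linear
    surrogate against black-box class probabilities.
    Phi: (n_samples, n_features) interpretable representations of perturbations.
    P:   (n_samples, n_classes) black-box probability vectors f(z_i).
    W:   (n_features, n_classes) surrogate matrix; optimisation over W
    (e.g., by gradient descent) is omitted in this sketch."""
    logits = Phi @ W
    logits -= logits.max(axis=1, keepdims=True)        # numerical stability
    Q = np.exp(logits)
    Q /= Q.sum(axis=1, keepdims=True)                  # surrogate simplex predictions
    return np.sum(weights * hellinger_sq(P, Q))
```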
2. Perturbation Strategies and Data Manifold Awareness
Perturbation is critical for local explainer fidelity. Classic techniques sample in the original input space via masking or additive noise, but this can result in surrogate models trained on out-of-distribution or implausible points. To address this, several methods estimate or restrict sampling to the empirical data manifold:
- MeLIME and LINEX recommend KDE, PCA+KDE, VAE-based, or Word2Vec-based samplers to produce realistic perturbations (Botari et al., 2020, Dhurandhar et al., 2022).
- The α-shape approach estimates the support of the data distribution, constraining neighborhoods to meaningful regions and improving stability and faithfulness (reducing mean squared error by factors of $3$–$5$ over naive sampling) (Botari et al., 2019).
- N-ball uniform sampling (mLIME) samples directly within a fixed-radius ball, obviating the need for post-hoc kernel weighting and reducing RMSE relative to LIME in benchmark studies (Shrestha et al., 5 Dec 2025); a sampling sketch follows this list.
- Modified sampling strategies, such as clique-based enumeration in MPS-LIME, generate perturbations that respect spatial superpixel adjacency in images, enhancing both explanation fidelity and interpretability (Shi et al., 2020).
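For illustration, the sketch below shows two samplers in the spirit of the methods above: an N-ball uniform sampler (mLIME-style) and a simple nearest-neighbor, KDE-style sampler (MeLIME-style). Function names, bandwidths, and neighborhood sizes are illustrative assumptions rather than the published procedures:

```python
import numpy as np

def nball_uniform(x, n_samples, radius, rng=None):
    """Uniform samples inside an n-ball of given radius centred at x
    (illustrative of mLIME-style N-ball sampling)."""
    rng = np.random.default_rng(rng)
    d = x.shape[0]
    directions = rng.standard_normal((n_samples, d))
    directions /= np.linalg.norm(directions, axis=1, keepdims=True)
    radii = radius * rng.random(n_samples) ** (1.0 / d)   # uniform in volume
    return x + directions * radii[:, None]

def kde_sampler(X_train, x, n_samples, bandwidth=0.3, k=50, rng=None):
    """KDE-style sampler (illustrative of MeLIME-type manifold-aware sampling):
    resample from the k training points nearest to x, then add Gaussian kernel
    noise so perturbations stay close to the empirical data manifold."""
    rng = np.random.default_rng(rng)
    nearest = np.argsort(np.linalg.norm(X_train - x, axis=1))[:k]
    centers = X_train[nearest][rng.integers(0, k, n_samples)]
    return centers + bandwidth * rng.standard_normal(centers.shape)
```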
The selection of perturbation strategy is crucial to explanation stability and is directly linked to the faithfulness and robustness observed in downstream evaluation.
3. Model Classes and Output Spaces: Beyond Scalar Attribution
Traditional local explainers (e.g., LIME, Kernel SHAP) target scalar outputs—either the predicted class probability or logit—which limits their applicability in multiclass and sequential or structured prediction settings. Recent advances address these limitations:
- LIPEx fits a single surrogate matrix explaining the redistribution of probabilistic mass among all classes, with each column corresponding to a class-specific feature attribution. The softmax link ensures surrogate predictions respect the simplex geometry and capture inter-class trade-offs. Experimental results demonstrate that LIPEx is substantially faster and more sample-efficient than fitting separate LIME surrogates; feature ablation tests show more rapid confidence decay than LIME/SHAP, evidencing stronger faithfulness (Zhu et al., 2023).
- For time-series and sequence data, extensions such as ReX augment perturbation and surrogate vocabularies to include positional and temporal predicates, sampling over variable-length and reordered input neighborhoods. Empirical results show coverage increases of $150\%$ or more and improved user alignment (Liu et al., 2022).
- LXDR adapts the local surrogate paradigm to the output of arbitrary dimensionality reduction operators, providing per-reduced-dimension feature attributions and outperforming global surrogates in both surrogate fidelity and outlier identification (Bardos et al., 2022).
- In context-aware settings, CLE introduces interpretable combinations of features (e.g., low-order feature conjunctions) into the surrogate feature space, reducing approximation error on ImageNet and improving recall and trustworthiness relative to LIME (Zhang et al., 2019); see the conjunction-feature sketch after this list.
- Concept-based local explanations (ConLUX) systematically project explanations from low-level features to high-level, human-interpretable concepts using large external models for concept extraction, yielding demonstrably more faithful and user-preferred explanations across standard XAI benchmarks (Liu et al., 16 Oct 2024).
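As a concrete example of enriching the surrogate feature space with interpretable combinations (as in CLE), the sketch below adds pairwise conjunctions to a binary interpretable representation. The helper name and the restriction to pairwise terms are illustrative assumptions, not the exact CLE construction:

```python
import numpy as np
from itertools import combinations

def add_pairwise_conjunctions(Z_binary):
    """Augment a binary interpretable representation (e.g., LIME-style
    superpixel on/off masks) with pairwise conjunction features, so a local
    surrogate can weight feature combinations as well as single features."""
    n, d = Z_binary.shape
    pairs = list(combinations(range(d), 2))
    conj = np.empty((n, len(pairs)), dtype=Z_binary.dtype)
    for j, (a, b) in enumerate(pairs):
        conj[:, j] = Z_binary[:, a] * Z_binary[:, b]   # AND of the two features
    return np.hstack([Z_binary, conj]), pairs

# Usage: feed the augmented matrix to any weighted linear surrogate fit.
Z = (np.random.default_rng(0).random((100, 6)) > 0.5).astype(float)
Z_aug, pair_index = add_pairwise_conjunctions(Z)
```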
4. Causal, Robustness, and Faithfulness Analytical Techniques
Recent research prioritizes causal interpretability and robust faithfulness assessment:
- LaPLACE utilizes Markov blanket induction to find the minimal, causally sufficient explanatory feature set for a given instance. Local accuracy on held-out test data surpasses LIME and SHAP across all tested models (e.g., ALARM dataset: LIME $0.945$, SHAP $0.959$, LaPLACE $0.981$), while selection entropy is reduced, boosting explanation consistency (Minn, 2023).
- LINEX, inspired by invariant risk minimization (IRM), solves a multi-environment risk minimization game that empirically eliminates features with inconsistent local gradient signs and yields explanations that are sparser and more stable, outperforming LIME, MeLIME, and SHAP on coefficient inconsistency, unidirectionality, and class attribution consistency metrics (Dhurandhar et al., 2022).
- Trend-based faithfulness testing evaluates explainers on their ability to track known monotonic trends (e.g., backdoor trigger learning, data interpolation regimes) using real sequences or distributional shifts, overcoming the "random dominance" issue inherent in perturbation-based evaluation. Integrated Gradients and SmoothGrad variants consistently top these metrics, whereas standard black-box perturbation methods can underperform or even be outperformed by random baselines on complex data (CIFAR-10, Tiny ImageNet) (He et al., 2023).
- Objective benchmarks built on additive decompositions (e.g., log-odds for logistic regression or Naive Bayes) facilitate ranking explainers by Spearman's correlation or cosine similarity to true per-feature contributions; LIME and SHAP vary in their rank depending on the metric and preprocessing regime (Rahnama et al., 2021). A minimal white-box sketch follows this list.
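The sketch below illustrates such a white-box benchmark, assuming a fitted logistic regression with coefficient vector `coef` and a chosen baseline point; for logistic regression the log-odds difference to the baseline decomposes exactly into additive per-feature terms:

```python
import numpy as np
from scipy.stats import spearmanr

def logodds_contributions(coef, x, baseline):
    """Ground-truth additive per-feature contributions to the log-odds of a
    logistic regression model relative to a baseline point:
    logit(x) - logit(baseline) = sum_j coef_j * (x_j - baseline_j)."""
    return coef * (x - baseline)

def rank_agreement(true_contrib, explainer_attrib):
    """Spearman rank correlation between ground-truth contributions and an
    explainer's attributions (higher means closer to the white-box ranking)."""
    rho, _ = spearmanr(true_contrib, explainer_attrib)
    return rho
```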
5. Systematization, Summarization, and Globalization of Local Explanations
A key challenge is reconciling the proliferation and instability of per-instance explanations into coherent global narratives:
- ExplainReduce frames the reduction of large sets of local explanations to a compact "proxy set" as a submodular optimization, using greedy algorithms to attain a large fraction of the maximal achievable coverage with only a small number of proxies. These proxies serve both as stable surrogates for classifying and explaining new instances and as a basis for human-level global understanding (Seppäläinen et al., 14 Feb 2025); a greedy selection sketch follows this list.
- Topological skeletons (GALE) characterizing the function from explanation space to model predictions via persistence diagrams provide a robust, dimensionality-invariant signature for comparing explanation methods and tuning their hyperparameters (Xenopoulos et al., 2022).
- Interactive visual frameworks—such as animated linear projections ("radial tours" via cheem)—enable analysts to navigate local explanation directions in high-dimensional feature spaces and diagnose model failures, variable substitution effects, or outliers (Spyrison et al., 2022).
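The greedy reduction step can be illustrated as follows. This is a sketch under the assumption that per-instance explanation fidelity has already been thresholded into a boolean matrix; it is not the exact ExplainReduce procedure:

```python
import numpy as np

def greedy_proxy_selection(fidelity, k):
    """Greedy selection of up to k proxy explanations (illustrative of an
    ExplainReduce-style reduction). fidelity[i, j] is True if local
    explanation i explains instance j acceptably well; greedy max-coverage
    enjoys the classic (1 - 1/e) approximation guarantee for this
    submodular objective."""
    n_explanations, n_instances = fidelity.shape
    covered = np.zeros(n_instances, dtype=bool)
    chosen = []
    for _ in range(k):
        gains = (fidelity & ~covered).sum(axis=1)   # marginal coverage gain
        best = int(np.argmax(gains))
        if gains[best] == 0:
            break                                   # nothing left to cover
        chosen.append(best)
        covered |= fidelity[best]
    return chosen, covered.mean()                   # selected proxies, coverage ratio
```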
6. Adversarial Challenges and Limitations
Local explanation methods based on input perturbations can be actively deceived by adversarial scaffolding. By constructing classifiers that behave normally on in-distribution data but output arbitrarily benign responses on synthetic points, an attacker can preserve a model's bias while ensuring that LIME and SHAP explainers yield innocuous attributions (e.g., sensitive features eliminated from the top-3 contributions on $80\%$ or more of test points) (Slack et al., 2019). This exposes a fundamental vulnerability of local explainers that rely on out-of-distribution sampling, motivating research on distribution-aware and causally justified techniques. Experimental and theoretical analyses consistently emphasize the necessity of in-distribution sampling, stability, and careful faithfulness testing for any trustworthy deployment of local explainer methods.
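The scaffolding mechanism can be sketched as follows; the class and the three callables are hypothetical components illustrating the routing idea, not the construction of Slack et al. (2019) verbatim:

```python
import numpy as np

class ScaffoldedClassifier:
    """Sketch of the adversarial scaffolding idea: an out-of-distribution
    detector selects the biased model's output for real inputs and an
    innocuous model's output for off-manifold, perturbation-generated inputs,
    so perturbation-based explainers only observe the innocuous behaviour.
    ood_detector, biased_model, and innocuous_model are hypothetical
    callables supplied by the attacker."""

    def __init__(self, ood_detector, biased_model, innocuous_model):
        self.ood_detector = ood_detector
        self.biased_model = biased_model
        self.innocuous_model = innocuous_model

    def predict(self, X):
        is_ood = self.ood_detector(X)               # True for off-manifold points
        return np.where(is_ood,
                        self.innocuous_model(X),    # shown to the explainer
                        self.biased_model(X))       # applied to real data
```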
References
- LIPEx: (Zhu et al., 2023)
- CLE: (Zhang et al., 2019)
- Domain-aware sampling: (Botari et al., 2019)
- Temporal/sequence extensions (ReX): (Liu et al., 2022)
- MPS-LIME: (Shi et al., 2020)
- Local nonlinearity (mLIME): (Shrestha et al., 5 Dec 2025)
- MeLIME: (Botari et al., 2020)
- Causal local explanation (LaPLACE): (Minn, 2023)
- LINEX: (Dhurandhar et al., 2022)
- Concept-based local explanations (ConLUX): (Liu et al., 16 Oct 2024)
- Faithfulness/trend-based tests: (He et al., 2023)
- Global reduction (ExplainReduce): (Seppäläinen et al., 14 Feb 2025)
- Topological/comparative evaluation (GALE): (Xenopoulos et al., 2022)
- Aggregation and visualization: (Spyrison et al., 2022)
- Adversarial attacks: (Slack et al., 2019)
- White-box evaluation with aLOR: (Rahnama et al., 2021)
- LXDR for dimensionality reduction: (Bardos et al., 2022)