Local Interpretability in Machine Learning
- Local interpretability is the practice of providing instance-specific, human-understandable explanations that highlight key features driving individual model predictions.
- Techniques such as LIME and SHAP approximate local model behavior with simple surrogate or additive models fitted in a defined neighborhood, while model-specific methods derive attributions directly from the model's internals.
- This approach is crucial for transparency, regulatory compliance, and domain validation in high-stakes applications such as healthcare, finance, and medical imaging.
Local interpretability refers to the ability to provide precise, human-understandable explanations for an individual prediction or a tightly circumscribed neighborhood of predictions made by a machine learning model. In contrast to global interpretability—which seeks to characterize a model's behavior over the entire input space—local interpretability focuses on identifying which features or combinations of features are responsible for a particular output, how small input changes would affect the prediction, and which minimal changes would lead to a different model outcome. This notion is critical for model transparency, regulatory compliance, domain validation, and individual-level trust, especially in high-stakes domains such as healthcare, finance, and medical imaging.
1. Formal Frameworks and Definitions
A local explanation typically takes the form of a simple, interpretable surrogate function $g$ that approximates the model $f$ in the neighborhood of a specific input $x$. The canonical local interpretability formalism is:
$$\xi(x) = \arg\min_{g \in \mathcal{G}} \sum_{z \in \mathcal{N}(x)} \pi_x(z)\,\big(f(z) - g(z)\big)^2 + \Omega(g)$$
where:
- $\mathcal{G}$ denotes a class of interpretable functions (e.g., linear models, shallow trees, decision rules),
- $\mathcal{N}(x)$ is a local neighborhood around $x$,
- $\pi_x(z)$ is a proximity kernel weighting each sample $z$ by its distance to $x$,
- $\Omega(g)$ penalizes explanation complexity.
The explanation is "local" both in the sense of its validity radius and its instance specificity. Explanations may be feature attributions, rules, prototypes, or minimal sufficient sets, depending on the context and modality (Stiglic et al., 2020, Hu et al., 2020, Zhu et al., 2024, Kocbek et al., 2020).
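The surrogate formalism above can be sketched as a kernel-weighted linear regression. This is a minimal LIME-style illustration, not the reference LIME implementation. It makes three assumptions:
- Gaussian perturbations define the neighborhood.
- An exponential kernel supplies the proximity weights.
- A ridge penalty on the slopes stands in for the complexity term. LIME itself typically enforces sparsity by limiting the number of features instead.

The function `local_surrogate` and its parameters are hypothetical.

```python
import numpy as np

def local_surrogate(f, x, n_samples=2000, sigma=0.5, kernel_width=0.75,
                    alpha=1e-3, seed=0):
    """LIME-style sketch: fit a kernel-weighted ridge regression to f around x.

    Assumptions (not LIME's exact defaults): Gaussian perturbations define the
    neighborhood, an exponential kernel gives the proximity weights, and a
    ridge penalty on the slopes stands in for the complexity term.
    Returns (intercept, slopes); the intercept approximates f(x).
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    # Sample the neighborhood around x and query the black box.
    Z = x + sigma * rng.standard_normal((n_samples, x.size))
    y = np.array([f(z) for z in Z])
    # Proximity kernel: weight exp(-||z - x||^2 / width^2).
    w = np.exp(-np.sum((Z - x) ** 2, axis=1) / kernel_width ** 2)
    # Design matrix centered on x, with an intercept column.
    A = np.hstack([np.ones((n_samples, 1)), Z - x])
    # Ridge penalty on the slopes only (intercept left unpenalized).
    reg = alpha * np.eye(A.shape[1])
    reg[0, 0] = 0.0
    # Weighted normal equations: (A^T W A + reg) coef = A^T W y.
    coef = np.linalg.solve(A.T @ (w[:, None] * A) + reg, A.T @ (w * y))
    return coef[0], coef[1:]
```

For a linear black box, the recovered slopes match the true coefficients, which makes a quick sanity check. For a nonlinear model they describe its behavior only within roughly `kernel_width` of `x`, which corresponds to the "validity radius" of the explanation.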
Two formal notions ground local interpretability in the literature (Slack et al., 2019):
- Simulatability: The ability of a human to compute the model output for a given input manually.
- “What if” Local Explainability: The capacity to predict the model's response to small, local perturbations (counterfactual reasoning) given the model’s representation and a base prediction.
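A "what if" query can be answered approximately from a local linear explanation. The predicted output after a perturbation δ is f(x) + ∇f(x)·δ. The sketch below makes two assumptions:
- It estimates the local slopes by central finite differences.
- It queries a hypothetical black-box function (`black_box`) and compares the prediction with the model's actual output.

The first-order answer is accurate for small perturbations and degrades for large ones.

```python
import numpy as np

def black_box(x):
    # Hypothetical nonlinear model standing in for an opaque predictor.
    return np.tanh(1.5 * x[0]) + 0.5 * x[1] ** 2 - x[0] * x[1]

def local_slopes(f, x, h=1e-5):
    """Central finite-difference estimate of the gradient of f at x."""
    x = np.asarray(x, dtype=float)
    g = np.zeros_like(x)
    for i in range(x.size):
        e = np.zeros_like(x)
        e[i] = h
        g[i] = (f(x + e) - f(x - e)) / (2 * h)
    return g

def what_if(f, x, delta):
    """Return (predicted, actual) model output after perturbing x by delta."""
    x = np.asarray(x, dtype=float)
    delta = np.asarray(delta, dtype=float)
    predicted = f(x) + local_slopes(f, x) @ delta  # first-order local answer
    actual = f(x + delta)                           # ground truth from the model
    return predicted, actual
```

With a perturbation of ±0.01 the predicted and actual outputs agree closely. A perturbation of ±1 exposes the curvature that the local linear explanation ignores.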
2. Representative Methods for Local Interpretability
A wide array of methodologies has been developed, which generally fall into model-agnostic or model-specific classes.
Model-Agnostic Surrogate Methods
- LIME (Local Interpretable Model-agnostic Explanations): Constructs a sparse linear surrogate in the vicinity of $x$ by perturbing input features and weighting the perturbed samples by their proximity to $x$. The explanation consists of the surrogate's most influential coefficients (Stiglic et al., 2020, Aditya et al., 2022, Lopardo et al., 2023, Shankaranarayana et al., 2019).
- SHAP (Shapley Additive Explanations): Computes feature attributions as Shapley values. Each feature's attribution is its marginal contribution to the prediction, averaged with Shapley weights over all subsets of the remaining features (Stiglic et al., 2020, Aditya et al., 2022, Pelegrina et al., 2022, Alam et al., 2023).
- Anchors and rule-based local models (e.g., LoRMIkA): Produce if-then rules with high local fidelity in a small neighborhood. LoRMIkA uses the OPUS search algorithm to find optimal or high-interest rules (Rajapaksha et al., 2019).
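The Shapley definition behind SHAP can be made concrete by brute force. The sketch below computes exact Shapley values by enumerating every feature subset. It assumes an interventional value function, in which features outside a coalition are replaced by a fixed baseline. This is one of several possible choices, and not necessarily what a given SHAP implementation uses. The cost is exponential in the number of features, which is why practical SHAP estimators approximate this sum.

```python
from itertools import combinations
from math import factorial

def shapley_values(f, x, baseline):
    """Exact Shapley values of f at x by subset enumeration (exponential cost).

    Assumption: v(S) = f(z), where z takes x's values on S and the
    baseline's values elsewhere (an interventional value function).
    """
    n = len(x)

    def v(S):
        return f([x[i] if i in S else baseline[i] for i in range(n)])

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for k in range(n):
            for subset in combinations(others, k):
                S = set(subset)
                # Shapley weight |S|! (n - |S| - 1)! / n!
                weight = factorial(k) * factorial(n - k - 1) / factorial(n)
                phi[i] += weight * (v(S | {i}) - v(S))
    return phi
```

For f(z) = 2·z₀ + z₁·z₂ at x = (1, 2, 3) with a zero baseline, the attributions are (2, 3, 3): the interaction term is split equally between z₁ and z₂. They sum to f(x) − f(baseline) = 8, which is the efficiency property discussed in Section 3.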
Model-Specific Attribution Methods
- Integrated Gradients, DeepLIFT, Saliency Maps: Compute gradient-based or layer-wise attributions for differentiable models such as neural networks, propagating the output back to the input features via backpropagation (Stiglic et al., 2020, Alam et al., 2023, Lopardo et al., 2023).
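- Decision Path Extraction in Trees: For tree models, the conjunction of split conditions along the path traversed by $x$ is a sufficient explanation, since any input satisfying them reaches the same leaf. It is not necessarily minimal, because some conditions on the path may be redundant (Slack et al., 2019, Bardos et al., 2023).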
Manifold- and Data-aware Perturbation
- VAE-LIME, ALIME: Replace naïve random sampling in LIME with realistic perturbations generated by (variational) autoencoders, improving explanation fidelity and stability (Schockaert et al., 2020, Shankaranarayana et al., 2019).
- TSInsight: Combines autoencoder bottlenecks with classifier gradients to generate sparse, instance-based attributions for time-series (Siddiqui et al., 2020).
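The idea behind autoencoder-based perturbation can be illustrated with a deliberately simpler stand-in. The sketch below substitutes PCA, which can be viewed as a linear autoencoder, for the (variational) autoencoders used by VAE-LIME and ALIME. It works in three steps:
1. Encode the instance into the latent space.
2. Perturb it in that latent space.
3. Decode the perturbed codes back to the input space.

The resulting samples stay on the subspace fitted to the data instead of scattering off-distribution. The helper names are hypothetical.

```python
import numpy as np

def fit_linear_autoencoder(X, k):
    """PCA as a linear autoencoder: return the data mean and top-k directions."""
    mu = X.mean(axis=0)
    _, _, Vt = np.linalg.svd(X - mu, full_matrices=False)
    return mu, Vt[:k]  # rows of Vt[:k] are orthonormal principal directions

def manifold_perturbations(x, mu, V, n, scale, rng):
    """Perturb x in the k-dimensional latent space, then decode.

    Stand-in assumption: PCA replaces the VAE used by VAE-LIME/ALIME.
    """
    code = (x - mu) @ V.T                                   # encode
    codes = code + scale * rng.standard_normal((n, V.shape[0]))
    return mu + codes @ V                                   # decode onto the subspace
```

When the data lie on a line, every perturbed sample also lies on that line. Isotropic Gaussian noise, as in naïve LIME sampling, would instead leave the data manifold.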
Instance-based, Prototype, and Mixture Models
- Proto-BagNet: Yields local, part-based explanations by matching local patches to class prototypes, resulting in directly interpretable, spatially localized support for decisions (Djoumessi et al., 2024).
- Implicit Mixture of Interpretable Experts (IMoIE): Routes each input to a linear “expert,” such that the expert’s coefficients provide the local explanation; scalability is achieved through implicit parameterization (Elazar et al., 2022).
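A hard-routed mixture of linear experts shows why an expert's coefficients can serve directly as a local explanation. The toy class below routes each input to the expert with the nearest centroid. This routing rule, the class name and its parameters are assumptions for illustration. The sketch does not reproduce IMoIE's implicit parameterization.

```python
import numpy as np

class LinearExpertMixture:
    """Toy hard-routed mixture of linear experts (hypothetical, not IMoIE).

    Each input goes to the expert whose centroid is nearest; that expert's
    per-feature contributions w_k * x are the local explanation.
    """

    def __init__(self, centroids, weights, biases):
        self.centroids = np.asarray(centroids, dtype=float)  # shape (K, d)
        self.weights = np.asarray(weights, dtype=float)      # shape (K, d)
        self.biases = np.asarray(biases, dtype=float)        # shape (K,)

    def route(self, x):
        # Assumed routing rule: nearest centroid in Euclidean distance.
        d2 = np.sum((self.centroids - x) ** 2, axis=1)
        return int(np.argmin(d2))

    def predict_and_explain(self, x):
        x = np.asarray(x, dtype=float)
        k = self.route(x)
        contributions = self.weights[k] * x  # exact additive decomposition
        return self.biases[k] + contributions.sum(), k, contributions
```

Because each expert is linear, the prediction equals the expert's bias plus the sum of the returned contributions. The explanation is therefore exact for the routed expert rather than approximate.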
Complexity-theoretic Approaches
- The complexity of finding minimal sufficient local explanations depends on the model class. It ranges from PTIME for linear models to NP-complete for decision trees and Σ₂P-complete for neural networks (Bassan et al., 2024). Because an input can admit exponentially many minimal sufficient reasons, exact enumeration is intractable in general.
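The tractable end of this spectrum can be made concrete. Consider a linear classifier over binary features. Fixing feature i to its observed value raises the worst-case (guaranteed) score by a fixed, non-negative gain. Any k fixed features can therefore raise the guaranteed score by at most the sum of the k largest gains. As a result, fixing features in order of decreasing gain, until the worst case stays on the positive side, yields a minimum-cardinality sufficient set. The sketch makes three assumptions:
- The inputs are binary.
- A positive prediction is defined as w·z + b ≥ 0.
- The function name is hypothetical.

It illustrates the polynomial-time case and is not the algorithm of Bassan et al.

```python
def minimum_sufficient_reason(w, b, x):
    """Smallest set of feature indices that, fixed to their values in the
    binary input x, guarantees w.z + b >= 0 for every completion z in {0,1}^n.

    Assumes binary features and that x itself is classified positive.
    """
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    if score < 0:
        raise ValueError("x must be classified positive (w.x + b >= 0)")
    # Worst case for a free feature: the value in {0, 1} minimizing w_i * z_i.
    worst = [min(0.0, wi) for wi in w]
    # Guaranteed score increase from fixing feature i to x_i (always >= 0).
    gain = [wi * xi - wc for wi, xi, wc in zip(w, x, worst)]
    guaranteed = sum(worst) + b  # worst-case score with nothing fixed
    chosen = []
    # Fix features in order of decreasing gain until the worst case is safe.
    for i in sorted(range(len(w)), key=lambda j: gain[j], reverse=True):
        if guaranteed >= 0:
            break
        chosen.append(i)
        guaranteed += gain[i]
    return sorted(chosen)
```

The procedure needs only a sort, so it runs in O(n log n) time. Verifying or minimizing sufficient sets for trees and neural networks admits no such shortcut.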
3. Algorithmic and Theoretical Properties
Local interpretability methods are grounded in various axioms and mathematical guarantees:
- Efficiency (attributions sum to the difference between the model output for the instance and the baseline or expected output): SHAP-based methods (Pelegrina et al., 2022, Aditya et al., 2022).
- Sensitivity, Implementation Invariance: Integrated Gradients, Local Attribution (LA) (Stiglic et al., 2020, Zhu et al., 2024).
- In-distribution Sampling: Methods like LA and VAE-LIME enforce that perturbed samples stay within the input distribution, avoiding OOD artifacts (Zhu et al., 2024, Schockaert et al., 2020).
- Faithfulness: Measured through metrics such as insertion/deletion AUC, comprehensiveness (output drop upon feature removal), or robustness (stability under repeated runs) (Lopardo et al., 2023, Zhu et al., 2024).
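Integrated Gradients illustrates how such axioms become checkable properties. The sketch below approximates IG with a midpoint Riemann sum along the straight-line path from a baseline to the input. The model and its analytic gradient are hypothetical. By IG's completeness property, the attributions should sum, up to discretization error, to the difference between the model output at the input and at the baseline. This is the gradient-based analogue of the efficiency property above.

```python
import numpy as np

def integrated_gradients(grad_f, x, baseline, steps=200):
    """Midpoint Riemann-sum approximation of Integrated Gradients."""
    x = np.asarray(x, dtype=float)
    baseline = np.asarray(baseline, dtype=float)
    # Midpoints of `steps` equal sub-intervals of [0, 1].
    alphas = (np.arange(steps) + 0.5) / steps
    grads = np.array([grad_f(baseline + a * (x - baseline)) for a in alphas])
    # IG_i = (x_i - baseline_i) * average of the i-th gradient along the path.
    return (x - baseline) * grads.mean(axis=0)

# Hypothetical differentiable model f(z) = z0^2 + 3 z0 z1 and its gradient.
def f(z):
    return z[0] ** 2 + 3 * z[0] * z[1]

def grad_f(z):
    return np.array([2 * z[0] + 3 * z[1], 3 * z[0]])
```

For x = (1, 2) and a zero baseline the attributions are (4, 3), and they sum to f(x) − f(0) = 7.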
Key complexity results show that for linear models, checking and finding minimal local sufficient subsets is polynomially tractable, whereas for decision trees it is NP-complete, and for neural networks it is coNP- or Σ₂P-complete (Bassan et al., 2024).
4. Evaluation Metrics and Empirical Findings
Quantitative evaluation of local interpretability exploits several metrics:
- Insertion/Deletion AUC: Measures how the model's confidence responds to progressive addition or removal of important features (Zhu et al., 2024).
- Comprehensiveness and Sufficiency: Comprehensiveness quantifies how much prediction confidence drops when the explanation features are removed. Sufficiency quantifies how well the output is preserved when only the explanation features are kept (Lopardo et al., 2023, Luo et al., 2021).
- Faithfulness to the black box: local $R^2$, RMSE, and hit rate in ablation studies (Aditya et al., 2022, Hu et al., 2020, Kocbek et al., 2020).
- Robustness/Stability: Consistency of explanations across runs or under small input perturbations (Lopardo et al., 2023, Shankaranarayana et al., 2019).
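The deletion variant of the insertion/deletion metric, together with comprehensiveness and sufficiency, can be sketched as follows. Two assumptions apply:
- "Removing" a feature means replacing it with a baseline value, a common but not universal choice.
- The function names are hypothetical.

A faithful attribution ranking makes the output drop quickly, so a lower deletion AUC, higher comprehensiveness and lower sufficiency indicate better explanations.

```python
import numpy as np

def _top_k(attributions, k):
    # Indices of the k features with the largest attribution.
    return np.argsort(-np.asarray(attributions, dtype=float))[:k]

def deletion_auc(f, x, attributions, baseline):
    """Area under the output curve as features are removed by rank."""
    x = np.asarray(x, dtype=float)
    baseline = np.asarray(baseline, dtype=float)
    z = x.copy()
    outputs = [f(z)]
    for i in _top_k(attributions, len(x)):
        z[i] = baseline[i]  # assumption: removal = replace with baseline
        outputs.append(f(z))
    outputs = np.array(outputs)
    # Trapezoidal rule with the fraction of removed features on [0, 1].
    dt = 1.0 / (len(outputs) - 1)
    return float(dt * np.sum((outputs[:-1] + outputs[1:]) / 2))

def comprehensiveness(f, x, attributions, baseline, k):
    """Output drop when the top-k features are removed (higher is better)."""
    x = np.asarray(x, dtype=float)
    baseline = np.asarray(baseline, dtype=float)
    top = _top_k(attributions, k)
    z = x.copy()
    z[top] = baseline[top]
    return f(x) - f(z)

def sufficiency(f, x, attributions, baseline, k):
    """Output drop when only the top-k features are kept (lower is better)."""
    x = np.asarray(x, dtype=float)
    baseline = np.asarray(baseline, dtype=float)
    top = _top_k(attributions, k)
    z = baseline.copy()
    z[top] = x[top]
    return f(x) - f(z)
```

For a linear model, ranking features by their true contributions gives a smaller deletion AUC than the reversed ranking. This is the behavior the metric is designed to reward.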
Empirical studies on image (ImageNet), text, medical, tabular, and time-series datasets show that methods such as LA outperform prior approaches by up to 38% on insertion metrics (Zhu et al., 2024). Generative perturbation methods (VAE-LIME, ALIME) improve fidelity and stability over the original LIME (Schockaert et al., 2020, Shankaranarayana et al., 2019). In clinical domains, local explanations provide actionable insight at the individual-patient level. However, model calibration can alter the resulting feature rankings, so the effect of calibration on explanations should be inspected explicitly (Stiglic et al., 2020, Kocbek et al., 2020).
5. Limitations, Open Challenges, and Future Directions
Several limitations and challenges are common to local interpretability:
- Approximation vs. Faithfulness: Surrogates may not capture the true model decision process if sample perturbations leave the data manifold or if the locality kernel is poorly chosen (Schockaert et al., 2020).
- Computational Cost: Naïve implementations of SHAP and LIME are expensive in high dimension; advances such as Choquet-integral reduction and tree-based surrogates offer scalable alternatives (Pelegrina et al., 2022, Aditya et al., 2022).
- Stability and Robustness: Explanations can be sensitive to sampling or to minor input variations, especially for sampling-based surrogates (Shankaranarayana et al., 2019, Lopardo et al., 2023).
- Human Simulatability: Usability studies indicate that interpretability (measured as human accuracy and task time) sharply declines with model complexity, especially for neural networks with more than ∼100 operations (Slack et al., 2019).
- Model Calibration: Calibration can shift local feature importances or push key features out of the top local contributors. In domains demanding high interpretability, explanations should be audited after calibration (Kocbek et al., 2020).
- Modality Specificity: The performance and faithfulness of local explanation techniques differ across vision, text, tabular, and time-series data. Newer combinations (e.g., autoencoder-based or prototype-based models) continue to narrow these gaps.
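One simple way to quantify the stability concern is to compare the top-k features of explanations produced across repeated runs, for example with different random seeds of a sampling-based surrogate. The sketch below uses the Jaccard overlap of top-k feature sets (ranked by absolute attribution), averaged over all pairs of runs. This particular score is one of several reasonable choices, and the function names are hypothetical.

```python
import numpy as np

def topk_jaccard(a, b, k):
    """Jaccard overlap of the top-k features (by |attribution|) of a and b."""
    ta = set(np.argsort(-np.abs(a))[:k].tolist())
    tb = set(np.argsort(-np.abs(b))[:k].tolist())
    return len(ta & tb) / len(ta | tb)

def mean_pairwise_stability(explanations, k):
    """Average top-k Jaccard over all pairs of runs (1.0 = perfectly stable)."""
    n = len(explanations)
    scores = [topk_jaccard(explanations[i], explanations[j], k)
              for i in range(n) for j in range(i + 1, n)]
    return float(np.mean(scores))
```

A score of 1.0 means every run selects the same top-k features. Values well below 1.0 for a sampling-based surrogate suggest increasing the number of samples or constraining the perturbation distribution.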
Open directions include integrating causal reasoning for actionable explanations, scaling local interpretability to complex models via efficient data-driven manifold representations, rigorously defining and benchmarking faithfulness, and developing models inherently designed for local explanation (e.g., CALMs, Proto-BagNets) that do not require post-hoc surrogates (Djoumessi et al., 2024, Gkolemis et al., 2026).
6. Applications and Domain Adaptations
Local interpretability techniques have been successfully adapted and extended across modalities and sectors:
| Domain | Representative Approaches | Comment |
|---|---|---|
| Tabular | LIME, SHAP, LoRMIkA, SLIM | Rule-based, tree and additive surrogates |
| Imaging | Local Attribution, Grad-CAM, Proto-BagNet | Patch/prototype-based, attribution maps |
| Text | FRED, LIME, SHAP, Feature Attribution | Minimal word-sets, counterfactuals |
| Healthcare | Path-based, local explanations | Patient-specific actionable attributions |
| Time-series | TSInsight, VAE-LIME | Instance-based autoencoder attributions |
These methods are increasingly evaluated not only for faithfulness or technical fidelity but also in terms of domain significance—such as clinical plausibility of highlighted regions (e.g., in medical imaging (Alam et al., 2023)), or alignment with expert intuition (e.g., variable ranking in blast-furnace operation (Schockaert et al., 2020)).
7. Interpretability-by-Design and the Local–Global Spectrum
Recent research emphasizes moving beyond post-hoc explanations towards models designed for interpretability. These include:
- Conditionally Additive Local Models (CALMs): Partitioning the input space and fitting region-specific shape functions for each feature, yielding locally additive yet interaction-aware explanations (Gkolemis et al., 2026).
- Implicit Mixtures of Interpretable Experts (IMoIE): Routing each input to a scalable collection of local linear (transparent) experts (Elazar et al., 2022).
- Proto-BagNet: Using architectural constraints to guarantee that each decision can be decomposed into a bag of prototype-based, spatially localized evidence (Djoumessi et al., 2024).
Furthermore, interpretability exists on a spectrum: as the number of experts or prototypes increases, global interpretability (the capacity to inspect all possibilities) gives way to local interpretability, with the latter remaining robust even for models with combinatorially many local rules (Colin et al., 2024, Elazar et al., 2022).
Local interpretability is thus a foundational, mathematically grounded, and empirically validated concept in explainable AI. It is instantiated via diverse methodologies—perturbation-based surrogates, gradient attributions, rule induction, prototype matching, and inherently interpretable model architectures—each with trade-offs in fidelity, computational burden, stability, and human comprehensibility. The ongoing evolution of the field is toward models and algorithms that provide faithful, instance-specific insight at scale, satisfying rigorous technical, practical, and domain-specific criteria.