
SMILE: Statistical Model-Agnostic Local Explanations

Updated 8 February 2026
  • The paper introduces a novel framework, SMILE, that leverages statistical distances like Wasserstein and ECDF-based measures to provide robust local explanations.
  • It quantifies the influence of input components using controlled perturbations and weighted linear surrogates, enhancing fidelity and stability compared to traditional methods.
  • The framework extends to multiple domains including tabular, image, text, and 3D data (gSMILE for generative AI), demonstrating superior performance under adversarial or distributional shifts.

Statistical Model-agnostic Interpretability with Local Explanations (SMILE) is a post-hoc, model-agnostic interpretability framework designed to provide rigorous, stable, and faithful local explanations for black-box machine learning models across diverse domains. By leveraging statistical distance metrics—principally optimal transport (Wasserstein) measures and empirical cumulative distribution function (ECDF)-based distances—SMILE systematically quantifies the influence of input components on a model’s output. The framework is generalizable to tabular, image, text, point cloud, and generative models, and has been extended as gSMILE for modern generative AI. Its crucial innovation is the replacement of ad-hoc similarity kernels used in LIME with statistical distances, yielding greater alignment with human intuition, high stability, and empirical robustness, especially under adversarial or distributional shift scenarios (Aslansefat et al., 2023, Dehghani et al., 27 May 2025, Ahmadi et al., 2024, Dehghani et al., 2024, Dehghani, 1 Feb 2026, Moghaddam et al., 3 Sep 2025).

1. Core Principles and Mathematical Formulation

SMILE operates in the model-agnostic, local explanation paradigm: given a black-box model $f(\cdot)$ and an input $x$, the aim is to estimate the contribution of each component (feature, pixel, token, etc.) to the output $f(x)$ by constructing a local surrogate model. The methodology consists of several steps:

  1. Controlled Perturbation: Generate $J$ perturbed versions $\{\hat{x}_j\}_{j=1}^{J}$ of $x$ by masking or removing input components (e.g., features for tabular data, super-pixels for images, tokens for text).
  2. Output and Input Distance Calculation: Compute, for each perturbation $j$:
    • Output distance $\Delta(x, \hat{x}_j) = W(\pi(y|x), \pi(y|\hat{x}_j))$, with $W$ typically the Wasserstein distance over output distributions.
    • Input distance $\delta_j = \mathrm{IWMD}(x, \hat{x}_j)$, e.g., Word Mover's Distance for text, Anderson–Darling/ECDF-based for structured or geometric data.
  3. Sample Weighting: Assign a kernel weight $w_j = \exp[-(\delta_j/\sigma)^2]$ that prioritizes perturbations semantically/geometrically close to $x$.
  4. Surrogate Fitting: Encode each perturbation as a binary vector $z_j \in \{0,1\}^d$ and fit a weighted linear surrogate $h_\theta(z_j) = \theta_0 + \theta^\top z_j$ by minimizing:

$$L(\theta) = \frac{1}{J} \sum_{j=1}^{J} w_j \left[ h_\theta(z_j) - \Delta(x, \hat{x}_j) \right]^2$$

The coefficients $\theta$ yield feature attributions local to $x$.
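The four steps above can be sketched end to end in Python. Everything here is illustrative: the stochastic black box stands in for $\pi(y|x)$, masking is simplified to zeroing, and the input distance is a Euclidean stand-in, with `scipy.stats.wasserstein_distance` playing the role of $W$:

```python
import numpy as np
from scipy.stats import wasserstein_distance

rng = np.random.default_rng(0)

# Hypothetical stochastic black box: returns a sample of outputs per input,
# standing in for the output distribution pi(y|x).
def black_box(x, n=200):
    return x.sum() + 0.1 * rng.standard_normal(n)

def smile_attributions(x, J=300, sigma=1.0):
    d = len(x)
    y_ref = black_box(x)
    Z = rng.integers(0, 2, size=(J, d))           # binary masks: 1 = keep feature
    deltas, dists = np.empty(J), np.empty(J)
    for j in range(J):
        x_hat = x * Z[j]                          # masking-as-zeroing (a simplification)
        deltas[j] = wasserstein_distance(y_ref, black_box(x_hat))  # output distance
        dists[j] = np.linalg.norm(x - x_hat)      # input distance (Euclidean stand-in)
    w = np.exp(-(dists / sigma) ** 2)             # exponential kernel weights
    A = np.hstack([np.ones((J, 1)), Z])           # intercept + binary mask features
    theta = np.linalg.solve(A.T @ (w[:, None] * A), A.T @ (w * deltas))
    return theta[1:]                              # drop the intercept

x = np.array([2.0, 0.0, 1.0])
theta = smile_attributions(x)                     # roughly [-2, 0, -1]
```

Keeping a feature with a large value lowers the output shift $\Delta$, so its coefficient is large in magnitude and negative; the zero-valued feature gets a near-zero attribution.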

Extensions such as gSMILE for generative models redefine output distances using semantic metrics (e.g., OWMD for text, Wasserstein distance on DINOv2 embeddings for images) and retain the same surrogate structure (Dehghani, 1 Feb 2026).

2. Application Domains and Extensions

2.1 Tabular and Image Data

Originally, SMILE was introduced to address limitations in LIME for tabular and image models by replacing pointwise distance metrics with ECDF-based or Wasserstein distances (e.g., 1-Wasserstein, Kolmogorov–Smirnov, Anderson–Darling) for improved stability and fidelity (Aslansefat et al., 2023). For images, perturbations typically take the form of super-pixel masking, with statistical distances computed over the resulting pixel/region distributions.

2.2 LLMs

For LLMs, SMILE formalizes token-level attributions. Inputs are tokenized, perturbations generated by masking tokens, and output changes quantified using Wasserstein distances between output probability distributions or semantic embedding distributions. Inputs are weighted by WMD, and the weighted surrogate regression yields interpretable token attributions, visualized via heatmaps (Dehghani et al., 27 May 2025, Dehghani, 1 Feb 2026).
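As a minimal sketch of the token-level perturbation and weighting, the code below drops tokens from a prompt and weights each perturbation with a crude mean-embedding stand-in for WMD; the random embedding table is purely hypothetical (a real pipeline would use trained word vectors and a proper WMD solver):

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy prompt and a random embedding table standing in for real word vectors.
tokens = "the cat sat on the mat".split()
vocab = {t: rng.standard_normal(8) for t in set(tokens)}

def perturb(tokens, mask):
    """Keep only the tokens whose mask bit is 1."""
    return [t for t, keep in zip(tokens, mask) if keep]

def wmd_proxy(a, b):
    """Crude stand-in for Word Mover's Distance: L2 between mean embeddings."""
    ea = np.mean([vocab[t] for t in a], axis=0)
    eb = np.mean([vocab[t] for t in b], axis=0) if b else np.zeros(8)
    return float(np.linalg.norm(ea - eb))

sigma = 1.0
masks = [rng.integers(0, 2, size=len(tokens)) for _ in range(5)]
weights = [np.exp(-(wmd_proxy(tokens, perturb(tokens, m)) / sigma) ** 2)
           for m in masks]
```

Perturbations that barely move the prompt's embedding get weights near 1; semantically distant ones are downweighted toward 0, exactly the role the kernel plays in the surrogate regression.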

2.3 Generative Models (gSMILE)

gSMILE generalizes SMILE for generative AI:

  • Text generation: Input prompt is perturbed at the token level; the model's sequence outputs are compared using OWMD.
  • Instruction-based image editing: Textual instructions are perturbed; output images are embedded using DINOv2; Wasserstein distances measure the semantic shift in outputs (Dehghani et al., 2024, Dehghani, 1 Feb 2026).
  • Knowledge Graph RAG: For knowledge graph–augmented generative models, controlled graph perturbations (removal of triples) allow attribution of output changes to specific graph components using the same statistical and surrogate principles (Moghaddam et al., 3 Sep 2025).

2.4 Point Cloud Data

For 3D point-cloud neural nets, SMILE clusters point clouds into super-points, generates binary perturbations over clusters, and employs ECDF distances (notably Anderson–Darling) to compare local geometric distributions. This approach captures geometric salience and yields explanations robust to kernel and perturbation choices (Ahmadi et al., 2024).
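A minimal sketch of the distance computation, using scipy's k-sample Anderson–Darling test on per-axis coordinate marginals; the random cloud and the octant-removal perturbation are illustrative stand-ins (a real pipeline would first cluster the cloud into super-points):

```python
import warnings
import numpy as np
from scipy.stats import anderson_ksamp

warnings.filterwarnings("ignore")  # anderson_ksamp warns when p-values are clipped
rng = np.random.default_rng(0)

# Toy 3-D point cloud of 500 points.
cloud = rng.standard_normal((500, 3))

def ad_distance(p, q):
    """Sum of k-sample Anderson-Darling statistics over the x/y/z marginals."""
    return sum(anderson_ksamp([p[:, k], q[:, k]]).statistic for k in range(3))

# Perturbation: drop every point in the (+,+,+) octant.
keep = ~((cloud[:, 0] > 0) & (cloud[:, 1] > 0) & (cloud[:, 2] > 0))
d_removed = ad_distance(cloud, cloud[keep])   # distributions clearly differ
d_self = ad_distance(cloud, cloud.copy())     # identical distributions
```

Removing a geometrically coherent region shifts the coordinate marginals and produces a large statistic, while an identical cloud scores at the statistic's minimum, which is the contrast the surrogate regression then attributes to clusters.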

3. Statistical Distance Measures and Surrogate Modeling

SMILE advances the surrogate modeling literature by using statistical distances between empirical distributions constructed via secondary perturbation clouds. Supported metrics include:

  • Wasserstein (Earth Mover’s) Distance: Captures transportation cost between empirical CDFs.
  • Kolmogorov–Smirnov, Anderson–Darling, Cramér–von Mises: ECDF-based tail- and variance-sensitive measures.
  • Maximum Mean Discrepancy, Kullback–Leibler Divergence: kernel-based and information-theoretic assessment.

The choice of metric is domain-dependent: Anderson–Darling is noted for robustness in point cloud applications (Ahmadi et al., 2024), and OWMD for generative text (Dehghani, 1 Feb 2026). For sample weighting, SMILE consistently applies an exponential decay kernel over the chosen input distance.
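Several of these distances are available directly in `scipy.stats`. The sketch below compares a reference sample against a mean-shifted one; the toy Gaussians stand in for SMILE's empirical output distributions:

```python
import numpy as np
from scipy.stats import (wasserstein_distance, ks_2samp,
                         cramervonmises_2samp, energy_distance)

rng = np.random.default_rng(0)
base = rng.normal(0.0, 1.0, 1000)      # reference output sample, pi(y|x)
shifted = rng.normal(0.5, 1.0, 1000)   # mean-shifted perturbation output

metrics = {
    "wasserstein": wasserstein_distance(base, shifted),   # ~ the 0.5 mean shift
    "kolmogorov_smirnov": ks_2samp(base, shifted).statistic,
    "cramer_von_mises": cramervonmises_2samp(base, shifted).statistic,
    "energy": energy_distance(base, shifted),
}
```

Note the different scales: the 1-Wasserstein distance recovers the mean shift in the output's own units, while the ECDF-based statistics are bounded or sample-size dependent, which is one reason metric choice is domain-dependent.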

Weighted linear regression remains the primary surrogate in all studied domains; non-linear surrogates (e.g. kernel or Bayesian surrogates) are highlighted as emerging directions.

4. Evaluation Metrics and Empirical Analysis

SMILE/gSMILE employs rigorous quantitative and qualitative evaluation metrics to assess explanation quality, all aligned with the formal metrics below (Dehghani, 1 Feb 2026):

| Metric | Formal Definition | Purpose |
|---|---|---|
| Accuracy (AttAUC, AttF1) | AUC($\theta$, ground-truth labels) | Alignment with annotated true attributions |
| Stability | Jaccard$(S, S') = \lvert S \cap S' \rvert / \lvert S \cup S' \rvert$ | Robustness to minor input perturbations |
| Consistency | $\operatorname{Var}(\theta_i)$ | Variation across runs (random sampling) |
| Fidelity | $R^2_w$, $\mathrm{WMSE}$ | Surrogate's predictive fit to $\Delta_j$ |
| Faithfulness | Correlation$(\Delta(G(x), G(x_{-i})), \theta_i)$ | Causal monotonicity between attribution and effect |

Empirical results demonstrate superior performance over LIME, SHAP, and BayLIME across vision, language, and generative settings, with AttAUROC of 0.84–0.88 for LLMs (vs. 0.65 for LIME) and up to 1.0 for generative image editors (Dehghani, 1 Feb 2026, Dehghani et al., 2024). Stability (Jaccard index ≈ 0.85) and fidelity ($R^2_w > 0.70$) are consistently high.
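Two of the metrics above, top-k Jaccard stability and weighted $R^2$ fidelity, are straightforward to compute. The sketch below uses hypothetical attribution vectors, and the top-k reading of the set-based Jaccard definition is one common convention, not necessarily the papers' exact protocol:

```python
import numpy as np

def jaccard_stability(theta_a, theta_b, k=3):
    """Jaccard overlap of the top-k features (by |theta|) across two runs."""
    top = lambda t: set(np.argsort(-np.abs(t))[:k])
    sa, sb = top(theta_a), top(theta_b)
    return len(sa & sb) / len(sa | sb)

def weighted_r2(delta, pred, w):
    """Weighted R^2 of the surrogate's predictions against the Delta_j targets."""
    mu = np.average(delta, weights=w)
    ss_res = np.sum(w * (delta - pred) ** 2)
    ss_tot = np.sum(w * (delta - mu) ** 2)
    return 1.0 - ss_res / ss_tot

# Hypothetical attribution vectors from two runs of the same explainer.
theta1 = np.array([0.9, -0.10, 0.5, 0.02, -0.7])
theta2 = np.array([0.8, 0.65, 0.4, -0.10, -0.6])
stability = jaccard_stability(theta1, theta2)   # top-3 sets overlap in 2 of 4
```

A stability of 1.0 means both runs select the same top features; fidelity reaches 1.0 only when the surrogate reproduces every $\Delta_j$ exactly under the kernel weights.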

SMILE’s explanations are visualized as token-level or pixel-level heatmaps, providing interpretable, instance-local attributions:

  • For text: each token $i$ is colored with an RGBA value whose opacity is proportional to $|\theta_i|$.
  • For images: instructions are colored as bars or highlighted overlays, with $\alpha$-blending proportional to coefficient magnitude.
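A minimal rendering sketch, emitting HTML spans whose background opacity scales with $|\theta_i|$; the color choice and HTML output format are illustrative, not the papers' exact visualization:

```python
import numpy as np

def token_heatmap_html(tokens, theta):
    """Emit one <span> per token with background opacity proportional to
    |theta_i| (red for positive attributions, blue for negative)."""
    scale = float(np.max(np.abs(theta))) or 1.0   # avoid division by zero
    spans = []
    for tok, t in zip(tokens, theta):
        alpha = abs(t) / scale
        r, g, b = (220, 40, 40) if t >= 0 else (40, 40, 220)
        spans.append(
            f'<span style="background: rgba({r},{g},{b},{alpha:.2f})">{tok}</span>'
        )
    return " ".join(spans)

html = token_heatmap_html(["the", "cat", "sat"], np.array([0.1, 0.9, -0.4]))
```

The most influential token ends up fully opaque, so the heatmap reads off the attribution ranking at a glance.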

5. Scenario-based and ODD Evaluation

gSMILE incorporates scenario-driven testing using the Operational Design Domain (ODD) framework. This approach stratifies evaluation by environmental (e.g., lighting, weather) and semantic (e.g., object presence) axes, systematically sampling input-perturbation scenarios and aggregating attribution metrics over ODD cells (Dehghani, 1 Feb 2026). Heatmaps and tabular summaries provide a comprehensive view of model behavior under real-world variation.

6. Strengths, Limitations, and Future Directions

Strengths:

  • Statistically principled, model-agnostic framework applicable across domains and modalities.
  • Surrogates are robust under different kernel and perturbation choices, as empirically demonstrated for point clouds and generative models (Ahmadi et al., 2024, Dehghani, 1 Feb 2026).
  • High explainability fidelity and stability, resulting in visualizations closely aligned with ground-truth semantics.
  • Faithful local explanations aid high-stakes deployment (e.g., healthcare, autonomous driving, biomedical KG-QA).

Limitations:

  • Linear surrogate assumption may not capture strong nonlinearity near certain inputs or in high-dimensional spaces.
  • Perturbation strategies (particularly in text and generative domains) may produce large semantic shifts, challenging local linearity assumptions.
  • High computational cost associated with Wasserstein/ECDF metrics; mitigation via Sinkhorn regularization and sampling efficiency is a direction for optimization (Aslansefat et al., 2023).
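The Sinkhorn mitigation mentioned above replaces the exact optimal-transport solve with entropy-regularized fixed-point iterations. A minimal numpy sketch on two discrete 1-D distributions (grid size, regularization strength, and iteration count are illustrative choices):

```python
import numpy as np

def sinkhorn_cost(a, b, C, eps=0.05, iters=300):
    """Entropy-regularized optimal transport via Sinkhorn iterations:
    a cheaper approximation to the exact Wasserstein distance."""
    K = np.exp(-C / eps)                  # Gibbs kernel
    u = np.ones_like(a)
    for _ in range(iters):                # alternate marginal rescalings
        v = b / (K.T @ u)
        u = a / (K @ v)
    P = u[:, None] * K * v[None, :]       # approximate transport plan
    return float(np.sum(P * C))           # transport cost of that plan

# Two discrete distributions on [0, 1]: mass on the left half vs. the right half.
x = np.linspace(0.0, 1.0, 20)
C = np.abs(x[:, None] - x[None, :])       # ground cost |x_i - x_j|
a = np.where(x < 0.5, 1.0, 0.0); a /= a.sum()
b = np.where(x >= 0.5, 1.0, 0.0); b /= b.sum()
cost = sinkhorn_cost(a, b, C)             # exact 1-Wasserstein here is ~0.526
```

Each iteration is a pair of matrix-vector products, so the per-perturbation cost scales far better than an exact linear-program solve, at the price of a small entropy-induced bias above the true transport cost.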

Prospective Directions:

  • Nonlinear and kernel-based surrogate models.
  • Extension to global explainability (e.g., via activation maximization, network-level surrogates).
  • Enhanced metrics for faithfulness and causal verification.
  • Data-driven scenario construction for OOD robustness.
  • Integration of graph-based selective inference (cf. GLIME) to certify conditional relationships in attributions (Dikopoulou et al., 2021).

7. Representative Algorithms and Quantitative Findings

Below is a pseudocode summary of gSMILE, the generative SMILE extension (Dehghani, 1 Feb 2026):

Input: original prompt x, black-box generator G, J perturbations, kernel width σ
Output: attribution scores θ for each token in x

1. Tokenize x into d tokens
2. For j in 1..J:
     a. Randomly zero out a subset of tokens in x to form x̂_j
     b. Call G(x̂_j) → output y_j (text sequence or image)
     c. Compute embeddings e_j of y_j (WMD or DINOv2)
     d. Compute Δ_j = Wasserstein( e , e_j ), where e = embed(G(x))
     e. Compute δ_j = IWMD(x, x̂_j)
     f. Compute w_j = exp[ −(δ_j/σ)² ]
     g. Let z_j ∈ {0,1}^d be the mask indicating which tokens remain in x̂_j
3. Fit θ by weighted least squares: minimize_θ Σ_{j=1}^J w_j · [ θ₀ + θᵀ z_j − Δ_j ]²
4. Return θ_i (attribution of token i)

Quantitative findings:

| Model | ATT-ACC | ATT-F1 | ATT-AUROC | Stability | R²_w | WMSE |
|---|---|---|---|---|---|---|
| GPT-3.5-turbo | 0.70 | 0.59 | 0.84 | 0.62 | 0.71 | 0.039 |
| LLaMA-3.1 | 0.76 | 0.40 | 0.84 | 0.45 | 0.71 | 0.037 |
| Claude-3.5 | 0.82 | 0.67 | 0.88 | 0.44 | 0.70 | 0.027 |
| Instruct-Pix2Pix | 0.96 | 0.92 | 1.00 | 0.85 | 0.72 | 0.012 |
| Img2Img-Turbo | 0.83 | 0.63 | 1.00 | 0.85 | 0.62 | 0.019 |

These results confirm SMILE and gSMILE's advantages in both attribution accuracy and robustness over previous LIME-based methods (Dehghani et al., 27 May 2025, Dehghani et al., 2024, Dehghani, 1 Feb 2026).


SMILE and its generative extension gSMILE represent a statistically grounded, theoretically motivated, and empirically validated approach to black-box model explanation, integrating optimal-transport metrics and local surrogate modeling for high-fidelity, robust attribution across a range of modern AI architectures.
