Unified Formulation of Interpretable Inference

Updated 26 January 2026
  • The paper synthesizes multiple interpretability frameworks by formalizing diverse methods into a unified mathematical structure.
  • It details canonical approaches, including SIPA pipelines, additive feature attribution, Bayesian decision theory, and latent-variable models, and how each supports explanation stability and fidelity.
  • It enables principled comparisons across methods while addressing computational challenges, extrapolation risks, and the integration of diverse interpretability techniques.

A unified formulation of interpretable inference seeks to subsume diverse interpretability methodologies within a precise, formal framework—allowing systematic comparison, principled design, and guarantees on stability, fidelity, and validity. Recent advances provide concrete mathematical structures for achieving this unification, spanning model-agnostic explanation pipelines, decision-theoretic Bayesian surrogates, additive feature attribution, probabilistic latent-variable approaches, inferential models with non-additive beliefs, symmetry-driven axioms, and frameworks for property-based scientific enquiry. This article surveys and synthesizes these main lines of research.

1. Model-Agnostic Unified Pipelines: The SIPA Framework

The SIPA (Sampling, Intervention, Prediction, Aggregation) framework provides a universal abstraction for model-agnostic interpretability techniques. In this paradigm, any post-hoc interpretation procedure is decomposed into four canonical stages (Scholbeck et al., 2019):

  1. Sampling: Draw $K$ “perturbation indicators” $z^{(k)}$ from a distribution $\mathcal{S}(z \mid x, S)$, each encoding which features to intervene upon and how.
  2. Intervention: Apply an operator $I(x, z, S)$ that perturbs the features in $S$ according to $z$.
  3. Prediction: Evaluate the black-box model at each intervened input, yielding $\hat{y}^{(k)} = f(\tilde{x}^{(k)})$.
  4. Aggregation: Use an operator $A$ to combine the tuples $(z^{(k)}, \tilde{x}^{(k)}, \hat{y}^{(k)}, w^{(k)})$ into a final interpretation, such as attributions or importance scores.

Many canonical methods fall under SIPA by instantiating each stage differently:

| Method | Sampling | Intervention | Aggregation |
|---|---|---|---|
| LIME | Binary local perturbation | Reference-value imputation | Weighted least-squares surrogate |
| Shapley (SHAP) | Feature orderings | Set-to-reference | Combinatorial weighted averaging |
| Permutation FI | Permuted feature values | Shuffle column | Loss difference |
| Variance-based FI | Grid over feature | Set single feature value | Empirical variance |

Extrapolation risks arise from marginal-based sampling; combinatorial cost (notably in Shapley-type methods, $\mathcal{O}(p\,2^p)$) is a computational limitation (Scholbeck et al., 2019).
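
As an illustration, permutation feature importance can be written directly in the four SIPA stages. The sketch below is a minimal instantiation under assumed inputs: the toy model `f` and the small dataset are hypothetical, not taken from the paper.

```python
import random
from statistics import mean

def f(x):
    """Toy black-box model: depends only on feature 0."""
    return 3.0 * x[0]

def sipa_permutation_importance(f, X, y, feature, K=50, seed=0):
    """Permutation feature importance, written as the four SIPA stages."""
    rng = random.Random(seed)
    baseline = mean((f(x) - t) ** 2 for x, t in zip(X, y))
    losses = []
    for _ in range(K):
        # Sampling: each indicator z is a permutation of the observation indices.
        z = list(range(len(X)))
        rng.shuffle(z)
        # Intervention: replace the chosen feature column by its permuted values.
        X_tilde = [list(x) for x in X]
        for i, j in enumerate(z):
            X_tilde[i][feature] = X[j][feature]
        # Prediction: evaluate the black box on the intervened inputs.
        preds = [f(x_t) for x_t in X_tilde]
        losses.append(mean((p - t) ** 2 for p, t in zip(preds, y)))
    # Aggregation: mean loss difference is the importance score.
    return mean(losses) - baseline

X = [[i / 10, (9 - i) / 10] for i in range(10)]
y = [f(x) for x in X]
print(sipa_permutation_importance(f, X, y, feature=0))  # positive: x0 matters
print(sipa_permutation_importance(f, X, y, feature=1))  # 0.0: f ignores x1
```

Swapping the sampling or aggregation stage (e.g., a grid plus empirical variance) yields the other rows of the table without touching the rest of the pipeline.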

2. Additive Feature Attribution and Axiomatic Unification

A major alternative unification is the additive feature attribution framework, which postulates that faithful explanations must be locally additive in a binary interpretable space (Lundberg et al., 2016). For an interpretable vector $x' \in \{0,1\}^M$, any explanation is modeled as

$$g(x') = \phi_0 + \sum_{i=1}^M \phi_i x'_i$$

The unique attributions $\phi$ satisfying local accuracy, missingness, and consistency are the Shapley values:

$$\phi_i = \sum_{S \subseteq N \setminus \{i\}} \frac{|S|!\,(M - |S| - 1)!}{M!} \left[f_x(S \cup \{i\}) - f_x(S)\right]$$

Here $f_x(S)$ represents the conditional expectation of $f$ given the presence of the features in $S$.
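
A minimal sketch of this formula, computing exact Shapley values for a hypothetical two-feature value function $f_x(S)$ given as a lookup table (the numbers are illustrative, not from any cited paper):

```python
from itertools import combinations
from math import factorial

def shapley_values(value, M):
    """Exact Shapley attributions for a set function `value` over M features."""
    phi = []
    for i in range(M):
        rest = [j for j in range(M) if j != i]
        total = 0.0
        for r in range(len(rest) + 1):
            for S in combinations(rest, r):
                # Shapley weight |S|! (M - |S| - 1)! / M!
                w = factorial(len(S)) * factorial(M - len(S) - 1) / factorial(M)
                total += w * (value(frozenset(S) | {i}) - value(frozenset(S)))
        phi.append(total)
    return phi

# Hypothetical value function f_x(S): expected model output with features S present.
v = {frozenset(): 0.0, frozenset({0}): 2.0,
     frozenset({1}): 1.0, frozenset({0, 1}): 4.0}
phi = shapley_values(lambda S: v[S], M=2)
print(phi)  # [2.5, 1.5]
# Local accuracy: f_x(empty set) + sum(phi) recovers f_x(full coalition).
print(v[frozenset()] + sum(phi) == v[frozenset({0, 1})])  # True
```

The double loop over subsets makes the $\mathcal{O}(p\,2^p)$ cost noted earlier concrete: each attribution enumerates every coalition excluding its feature.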

This result connects LIME, DeepLIFT, LRP, and other explanation methods as either exact or approximate Shapley value estimators under particular kernels and reference choices. KernelSHAP’s regression formulation, for example, is provably equivalent when using the Shapley kernel (Lundberg et al., 2016).

3. Decision-Theoretic and Bayesian Unification

A unified decision-theoretic Bayesian approach defines interpretability as a utility function balancing fidelity and simplicity (Afrabandpey et al., 2019). Given a reference predictive distribution $p_\text{ref}(y \mid x, \mathcal{D})$, an interpretable model $\psi$ is chosen to maximize:

$$U(\psi) = \mathbb{E}_{x \sim \pi}\left[\mathbb{E}_{y \sim p_\text{ref}(\cdot \mid x, \mathcal{D})} [\log p_\psi(y \mid x)]\right] - \lambda\, I(\psi)$$

where $I(\psi)$ quantifies interpretability (tree size, sparsity, etc.), and $\pi$ assigns importance to regions of input space (global or local). The method is model-agnostic and admits arbitrary reference and proxy families. Maximizing $U(\psi)$ is equivalent to minimizing expected KL divergence plus an interpretability penalty.
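
The utility can be evaluated directly in a small discrete setting. The sketch below assumes a hypothetical reference distribution on a five-point grid, a uniform $\pi$, and step-function proxies whose complexity $I(\psi)$ is the number of splits; none of these choices come from the paper.

```python
from math import log

# Hypothetical reference predictive distribution p_ref(y=1 | x) on a grid.
xs = [0.0, 0.25, 0.5, 0.75, 1.0]
p_ref = {0.0: 0.05, 0.25: 0.1, 0.5: 0.5, 0.75: 0.9, 1.0: 0.95}

def utility(psi, lam=0.1):
    """U(psi): expected log-likelihood under p_ref minus lam * I(psi).
    psi = (thresholds, probs) is a step-function proxy for p(y=1 | x);
    I(psi) is taken to be the number of splits."""
    thresholds, probs = psi
    def p_psi(x):
        return probs[sum(x >= t for t in thresholds)]  # which segment x is in
    fidelity = 0.0
    for x in xs:  # pi is uniform over the grid
        p1, q1 = p_ref[x], p_psi(x)
        fidelity += (p1 * log(q1) + (1 - p1) * log(1 - q1)) / len(xs)
    return fidelity - lam * len(thresholds)

psi_simple = ([0.5], [0.075, 0.78])               # one split
psi_rich = ([0.4, 0.6], [0.075, 0.5, 0.925])      # two splits
# At lam=0.1 the simpler proxy wins; with no penalty the richer one does.
print(utility(psi_simple) > utility(psi_rich))           # True
print(utility(psi_rich, lam=0.0) > utility(psi_simple, lam=0.0))  # True
```

This makes the tradeoff explicit: $\lambda$ sets the exchange rate between fidelity to $p_\text{ref}$ and proxy complexity.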

Stability is quantified by measuring the mean dissimilarity between interpretable proxies across bootstrapped datasets (Briand–Emonet distance). Experimental results confirm that utility-based surrogates dominate prior-restricted proxies in the fidelity-for-complexity tradeoff and in stability (Afrabandpey et al., 2019).

4. Unified Probabilistic and Latent-Variable Models

The LEX (Latent EXplanation) framework casts interpretable inference as a latent variable model: $z \sim p_\theta(z \mid x)$, $\tilde{x} \sim p(\tilde{x} \mid x, z)$, $y \sim p_\phi(y \mid \tilde{x})$, where $z$ is a mask encoding feature relevance (Senetaire et al., 2022). The joint and marginal likelihoods allow learning interpretable selectors and imputers by maximum likelihood (or regularized variants), unifying L2X, INVASE, REAL-X, rationale selection, LIME, SHAP, and occlusion within a single probabilistic structure. Inference and explanation are amortized; multiple imputation strategies yield robust, less artifact-prone masks.

Popular instance-wise feature selection methods emerge as specific choices of regularization and imputation. Multiple imputation (e.g., via VAEAC or mixtures) is essential for high true positive rate (TPR) and low false discovery rate (FDR) when ground-truth masks exist (Senetaire et al., 2022).
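
The generative chain $z \to \tilde{x} \to y$ can be simulated with Monte Carlo masks and multiple imputation. The sketch below assumes a toy black box and hand-chosen mask probabilities in place of a learned selector; it illustrates the pipeline shape, not the training procedure.

```python
import random
from statistics import mean

def f(x):
    """Toy black box: a score that depends only on x0."""
    return 2.0 * x[0] - 1.0

def lex_explain(f, x, keep_prob, imputations, K=200, seed=0):
    """LEX-style pipeline: sample masks z, impute masked features, average f."""
    rng = random.Random(seed)
    vals = []
    for _ in range(K):
        z = [rng.random() < p for p in keep_prob]      # z ~ p_theta(z | x)
        ref = rng.choice(imputations)                  # draw one imputation
        x_tilde = [xi if zi else ri
                   for xi, zi, ri in zip(x, z, ref)]   # x~ given x and z
        vals.append(f(x_tilde))                        # y ~ p_phi(y | x~)
    return mean(vals)

x = [1.0, 1.0]
refs = [[0.0, 0.0], [0.2, 0.4]]                        # multiple-imputation pool
keep_x0 = lex_explain(f, x, keep_prob=[1.0, 0.0], imputations=refs)
drop_x0 = lex_explain(f, x, keep_prob=[0.0, 1.0], imputations=refs)
print(keep_x0)  # 1.0: keeping x0 reproduces f(x)
print(drop_x0)  # negative: masking x0 pulls f toward the imputation values
```

Choosing a single fixed reference instead of a pool recovers occlusion as a special case; richer imputers (e.g., conditional generative models) are what the paper credits with artifact reduction.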

5. Symmetry-Based and Bayesian-Inversion Principles

An emerging perspective defines interpretable inference in terms of four symmetries: inference-equivariance, information-invariance, concept-closure invariance, and structural invariance (Barbiero et al., 19 Jan 2026). These symmetries, formalized as group actions or functorial constraints, enforce that explanations are simulatable, compressive, semantically aligned, and user-structural.

Every interpretable model is forced to factor as $X \to Z \to C \to Y$, where $Z$ compresses the input to task-relevant information, $C$ expresses human concepts, and all Bayes-inversion queries (alignment, intervention, counterfactual) are special cases of

$$p(z \mid x) \propto p(x \mid z)\,p(z)$$

subject to the imposed symmetries.
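
A minimal discrete instance of this Bayes inversion, with hypothetical concept priors and likelihoods (the concept names and probabilities are illustrative only):

```python
# Hypothetical prior over latent concepts z and likelihood p(x | z).
p_z = {"concept_A": 0.5, "concept_B": 0.5}
p_x_given_z = {
    "concept_A": {"x1": 0.8, "x2": 0.2},
    "concept_B": {"x1": 0.3, "x2": 0.7},
}

def posterior(x):
    """Bayes inversion: p(z | x) proportional to p(x | z) * p(z)."""
    unnorm = {z: p_x_given_z[z][x] * p_z[z] for z in p_z}
    total = sum(unnorm.values())
    return {z: u / total for z, u in unnorm.items()}

post = posterior("x1")
print(post["concept_A"])  # 0.8*0.5 / (0.8*0.5 + 0.3*0.5) = 8/11 ≈ 0.727
```

Alignment, intervention, and counterfactual queries then differ only in which variables are conditioned on or clamped before this inversion is applied.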

This framework provides a categorical characterization of interpretability, subsuming concept-bottleneck models and making actionable interpretability a property of the model's symmetry structure rather than a set of informal desiderata (Barbiero et al., 19 Jan 2026).

6. Scientific Inference With Interpretable Machine Learning

A property-descriptor framework formalizes the process of interpreting machine learning models to draw conclusions about the scientific data-generating process (Freiesleben et al., 2022). The property descriptor $g_K : M \to \mathcal{Q}$ maps a model $m$ to a quantity $Q$ (e.g., partial dependence curve, feature importance) such that:

  • Identification: $g_K(m^*) = Q$, recovering the true scientific property from the optimal predictor.
  • Continuity: if $m \approx m^*$ in risk, then $g_K(m) \approx Q$.
  • Estimability: there exists an unbiased estimator $\hat{g}_{\mathcal{D}}$ of $g_K(m)$.
  • Uncertainty Quantification: both model error and estimation error are quantifiable in a bias–variance decomposition.

PDPs, PFI, Shapley values for risk, and local ICE/SHAP explanations are all expressible as property descriptors within this formalism, ensuring that post-hoc model interpretation is grounded in principles of statistical validity (Freiesleben et al., 2022).
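
A partial-dependence curve is one such descriptor. The sketch below assumes a toy data-generating process where the optimal predictor is known in closed form, so the identification property can be checked directly; the model and data are hypothetical.

```python
from statistics import mean

def m_star(x):
    """Optimal predictor for the toy DGP y = 2*x0 + x1."""
    return 2.0 * x[0] + x[1]

def pdp_descriptor(m, X, feature, grid):
    """Property descriptor g_K: maps a model to its partial-dependence curve."""
    curve = []
    for v in grid:
        # Average the model over the data with `feature` clamped to v.
        curve.append(mean(
            m([v if j == feature else xj for j, xj in enumerate(xrow)])
            for xrow in X))
    return curve

X = [[i / 4, i % 2] for i in range(8)]     # x1 alternates 0, 1 (mean 0.5)
curve = pdp_descriptor(m_star, X, feature=0, grid=[0.0, 0.5, 1.0])
print(curve)  # [0.5, 1.5, 2.5]: slope 2 in x0, matching the true DGP
```

Applying the same descriptor to an imperfect model $m \approx m^*$ and comparing curves is exactly the continuity property; bootstrapping over $X$ gives the estimation-error half of the bias–variance decomposition.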

7. Broader Implications and Limitations

By reducing diverse interpretability techniques to unified mathematical forms—be it SIPA pipelines, additive-attribution axioms, decision-theoretic Bayesian projections, probabilistic latent-variable models, or symmetry-based categories—the field achieves the following:

  • Method integration: Methods such as LIME, SHAP, LEX, and post-hoc surrogates become specific instantiations within general frameworks.
  • Comparison and extension: One can analyze fidelity, stability, error control, and computational cost in a principled way.
  • Limitations: Extrapolation arises in marginal-based samplings; combinatorial methods may be computationally intractable without approximations; additive models may not encompass every class of explanations; and not all desiderata are always mutually realizable (per symmetry constraints).

At its core, a unified formulation enables technically rigorous cross-method comparison and principled design for interpretable inference in complex models (Scholbeck et al., 2019, Lundberg et al., 2016, Afrabandpey et al., 2019, Senetaire et al., 2022, Freiesleben et al., 2022, Barbiero et al., 19 Jan 2026).
