Unified Formulation of Interpretable Inference
- The paper synthesizes multiple interpretability frameworks by formalizing diverse methods into a unified mathematical structure.
- It details canonical approaches including SIPA pipelines, additive feature attribution, Bayesian decision theory, and latent-variable models to ensure explanation stability and fidelity.
- It enables principled comparisons across methods while addressing computational challenges, extrapolation risks, and the integration of diverse interpretability techniques.
A unified formulation of interpretable inference seeks to subsume diverse interpretability methodologies within a precise, formal framework—allowing systematic comparison, principled design, and guarantees on stability, fidelity, and validity. Recent advances provide concrete mathematical structures for achieving this unification, spanning model-agnostic explanation pipelines, decision-theoretic Bayesian surrogates, additive feature attribution, probabilistic latent-variable approaches, inferential models with non-additive beliefs, symmetry-driven axioms, and frameworks for property-based scientific enquiry. This article surveys and synthesizes these main lines of research.
1. Model-Agnostic Unified Pipelines: The SIPA Framework
The SIPA (Sampling, Intervention, Prediction, Aggregation) framework provides a universal abstraction for model-agnostic interpretability techniques. In this paradigm, any post-hoc interpretation procedure is decomposed into four canonical stages (Scholbeck et al., 2019):
- Sampling: Draw $K$ “perturbation indicators” $\delta_1, \dots, \delta_K$ from a distribution $P_\delta$, each encoding which features to intervene upon and how.
- Intervention: Apply an operator $\iota(x, \delta_k)$ that perturbs features in $x$ according to $\delta_k$.
- Prediction: Evaluate the black-box model $\hat{f}$ at each intervened input, yielding $\hat{f}(\iota(x, \delta_k))$.
- Aggregation: Use an operator $\mathcal{A}$ to combine the tuples $\big(\delta_k, \hat{f}(\iota(x, \delta_k))\big)$ into a final interpretation, such as feature attributions or importance scores.
Many canonical methods fall under SIPA by instantiating each stage differently:
| Method | Sampling | Intervention | Aggregation |
|---|---|---|---|
| LIME | Binary local perturbation | Reference-value imputation | Weighted least squares surrogate |
| Shapley (SHAP) | Feature orderings | Set-to-reference | Combinatorial weighted averaging |
| Permutation FI | Permuted feature values | Shuffle column | Loss difference |
| Variance-based FI | Grid over feature | Set single feature value | Empirical variance |
Extrapolation risks arise from marginal-based sampling, which can place perturbed points outside the data distribution; combinatorial cost (notably in Shapley-type methods, which scale as $O(2^p)$ in the number of features $p$) is a computational limitation (Scholbeck et al., 2019).
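As an illustration, permutation feature importance can be written directly as a SIPA instantiation. The sketch below is a minimal, assumed implementation (the helper names `sipa` and `perm_importance` are invented for this example), using mean squared error as the loss:

```python
import numpy as np

def sipa(model, X, sample, intervene, aggregate, K=20, rng=None):
    """Generic SIPA pipeline: Sampling -> Intervention -> Prediction -> Aggregation."""
    rng = np.random.default_rng(rng)
    results = []
    for _ in range(K):
        delta = sample(rng)                 # Sampling: draw a perturbation indicator
        X_pert = intervene(X, delta, rng)   # Intervention: perturb features per delta
        preds = model(X_pert)               # Prediction: query the black box
        results.append((delta, preds))
    return aggregate(results)               # Aggregation: combine into an interpretation

def perm_importance(model, X, y, j, K=20, rng=0):
    """Permutation feature importance expressed as a SIPA instantiation."""
    base = np.mean((y - model(X)) ** 2)     # baseline MSE of the unperturbed model
    sample = lambda rng: j                  # the indicator is just the column index
    def intervene(X, delta, rng):
        Xp = X.copy()
        Xp[:, delta] = rng.permutation(Xp[:, delta])  # shuffle column j
        return Xp
    aggregate = lambda results: np.mean(
        [np.mean((y - preds) ** 2) for _, preds in results]) - base  # loss difference
    return sipa(model, X, sample, intervene, aggregate, K=K, rng=rng)
```

Swapping the three callables recovers the other rows of the table; for example, grid sampling plus empirical-variance aggregation yields variance-based importance.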
2. Additive Feature Attribution and Axiomatic Unification
A major alternative unification is the additive feature attribution framework, which postulates that faithful explanations must be locally additive in a binary interpretable space (Lundberg et al., 2016). For an interpretable vector $z' \in \{0,1\}^M$, any explanation is modeled as

$$g(z') = \phi_0 + \sum_{i=1}^{M} \phi_i z'_i.$$

The unique set of attributions satisfying local accuracy, missingness, and consistency is given by the Shapley values:

$$\phi_i = \sum_{S \subseteq F \setminus \{i\}} \frac{|S|!\,(|F|-|S|-1)!}{|F|!}\,\big[f_x(S \cup \{i\}) - f_x(S)\big].$$

Here $f_x(S) = \mathbb{E}[f(x) \mid x_S]$ represents the conditional expectation of $f(x)$ given the presence of the features in $S$.
This result connects LIME, DeepLIFT, LRP, and other explanation methods as either exact or approximate Shapley value estimators under particular kernels and reference choices. KernelSHAP’s regression formulation, for example, is provably equivalent when using the Shapley kernel (Lundberg et al., 2016).
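Exact Shapley values can be computed by brute-force enumeration of coalitions, which also makes the combinatorial cost concrete. A minimal sketch (the helper `shapley_values` is illustrative, not a library API), assuming the caller supplies a set function $v(S)$ such as a conditional-expectation estimate:

```python
from itertools import combinations
from math import factorial

def shapley_values(value_fn, n_features):
    """Exact Shapley attributions for a set function v(S).

    value_fn maps a tuple of feature indices S to v(S), e.g. an estimate of the
    conditional expectation E[f(x) | x_S]. Enumerating all coalitions costs
    O(2^n), which is why approximations such as KernelSHAP exist.
    """
    phi = [0.0] * n_features
    for i in range(n_features):
        others = [j for j in range(n_features) if j != i]
        for r in range(len(others) + 1):
            for S in combinations(others, r):
                # Shapley kernel weight |S|! (|F|-|S|-1)! / |F|!
                w = (factorial(len(S)) * factorial(n_features - len(S) - 1)
                     / factorial(n_features))
                phi[i] += w * (value_fn(tuple(sorted(S + (i,)))) - value_fn(S))
    return phi
```

For a linear model with a zero baseline, the attributions reduce to the per-feature contributions, and their sum equals $v(F)$ (local accuracy).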
3. Decision-Theoretic and Bayesian Unification
A unified decision-theoretic Bayesian approach defines interpretability via a utility function balancing fidelity and simplicity (Afrabandpey et al., 2019). Given a reference predictive distribution $p(\tilde{y} \mid x, D)$, an interpretable model $q$ is chosen to maximize

$$u(q) = \mathbb{E}_{w(x)}\!\Big[\,\mathbb{E}_{p(\tilde{y} \mid x, D)}\big[\log q(\tilde{y} \mid x)\big]\Big] - \lambda\,\Omega(q),$$

where $\Omega(q)$ quantifies interpretability cost (tree size, sparsity, etc.), and $w(x)$ assigns importance to regions of input space (global or local). The method is model-agnostic and admits arbitrary reference and proxy families. Maximizing $u(q)$ is equivalent to minimizing the expected KL divergence from the reference to the proxy plus an interpretability penalty.
Stability is quantified by measuring the mean dissimilarity between interpretable proxies fitted across bootstrapped datasets (Briand–Emonet distance). Experimental results confirm that utility-based surrogates dominate prior-restricted proxies in both the fidelity-for-complexity tradeoff and stability (Afrabandpey et al., 2019).
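A toy version of this surrogate selection can be sketched by scoring candidate proxies on fidelity to the reference model's predictions minus a complexity penalty. Here fidelity is plain mean squared error rather than the expected log-score above, and the polynomial proxy family and all helper names are illustrative assumptions:

```python
import numpy as np

def select_surrogate(reference, X, candidates, lam=0.01):
    """Pick the interpretable proxy maximizing utility = fidelity - lam * Omega(q).

    Fidelity is negative MSE to the reference predictions (a stand-in for the
    expected log-score / KL term); each candidate carries its own complexity.
    Returns (fitted proxy, its complexity Omega, its utility).
    """
    y_ref = reference(X)
    best = None
    for make_fit, omega in candidates:
        fit = make_fit(X, y_ref)            # fit the proxy to the reference output
        utility = -np.mean((fit(X) - y_ref) ** 2) - lam * omega
        if best is None or utility > best[2]:
            best = (fit, omega, utility)
    return best

def poly_candidate(d):
    """Polynomial proxy of degree d; Omega(q) = number of coefficients."""
    def make_fit(X, y):
        coef = np.polyfit(X, y, d)
        return lambda X: np.polyval(coef, X)
    return (make_fit, d + 1)
```

With a quadratic reference, the degree-2 proxy wins: a degree-5 proxy fits equally well but pays a larger $\Omega$ penalty, while a linear proxy is too unfaithful.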
4. Unified Probabilistic and Latent-Variable Models
The LEX (Latent EXplanation) framework casts interpretable inference as a latent-variable model: a selector $p(z \mid x)$ over binary masks, a predictor $p(y \mid x, z)$ acting on the masked (and imputed) input, and the marginal $p(y \mid x) = \sum_z p(z \mid x)\, p(y \mid x, z)$, where $z$ is a mask encoding feature relevance (Senetaire et al., 2022). The joint and marginal likelihoods allow learning interpretable selectors and imputers by maximum likelihood (or regularized variants), unifying L2X, INVASE, REAL-X, rationale selection, LIME, SHAP, and occlusion within a single probabilistic structure. Inference and explanation are amortized; multiple imputation strategies yield robust, less artifact-prone masks.
Popular instance-wise feature selection methods emerge as specific choices of regularization and imputation. Multiple imputation (e.g., via VAEAC or mixtures) is essential for high true positive rate (TPR) and low false discovery rate (FDR) when ground-truth masks exist (Senetaire et al., 2022).
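For small feature dimension, the LEX-style marginal likelihood can be computed by exact enumeration over masks. The sketch below assumes a factorized Bernoulli selector, a user-supplied predictor log-density, and a simple imputation rule; all names are illustrative:

```python
import numpy as np
from itertools import product

def lex_marginal_loglik(x, y, select_prob, predict_logpdf, impute):
    """Marginal log-likelihood: log p(y|x) = log sum_z p(z|x) p(y | impute(x, z)).

    select_prob:    per-feature Bernoulli relevance probabilities for p(z|x)
    predict_logpdf: (y, x_tilde) -> log p(y | masked/imputed input)
    impute:         (x, z) -> input with the features z switches off filled in
    Exact enumeration over all 2^d masks -- tractable only for small d.
    """
    d = len(x)
    total = -np.inf
    for z in product([0, 1], repeat=d):
        z = np.array(z)
        log_pz = np.sum(z * np.log(select_prob)
                        + (1 - z) * np.log1p(-select_prob))   # log p(z|x)
        log_py = predict_logpdf(y, impute(x, z))              # log p(y|x,z)
        total = np.logaddexp(total, log_pz + log_py)          # stable log-sum-exp
    return total
```

Training a selector amounts to maximizing this quantity (plus sparsity regularization) over `select_prob`; replacing the single `impute` with draws from an imputation model gives the multiple-imputation variant.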
5. Symmetry-Based and Bayesian-Inversion Principles
An emerging perspective defines interpretable inference in terms of four symmetries: inference-equivariance, information-invariance, concept-closure invariance, and structural invariance (Barbiero et al., 19 Jan 2026). These symmetries, formalized as group actions or functorial constraints, enforce that explanations are simulatable, compressive, semantically aligned, and user-structural.
Every interpretable model is forced to factor as $f = g \circ \phi$, where $\phi$ compresses the input to task-relevant information, $g$ expresses human concepts, and all Bayes-inversion queries (alignment, intervention, counterfactual) are special cases of posterior inference over the concept representation,

$$p(x \mid c) \propto p(c \mid x)\,p(x),$$

subject to the imposed symmetries.
This framework provides a categorical characterization of interpretability, subsuming concept-bottleneck models and making actionable interpretability a property of the model's symmetry structure rather than a set of informal desiderata (Barbiero et al., 19 Jan 2026).
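A concept-bottleneck model makes the factorization concrete: predictions flow only through named concepts, so concept-level interventions are well defined. The concepts and coefficients below are invented purely for illustration:

```python
import numpy as np

def phi(x):
    """Compress raw features to human-nameable concepts (illustrative choice)."""
    return {"size": x[:, 0] + x[:, 1], "brightness": x[:, 2]}

def g(c):
    """Predict from concepts only -- the bottleneck enforces f = g . phi."""
    return 2.0 * c["size"] - 0.5 * c["brightness"]

def predict(x):
    """The full model factors as f = g(phi(x))."""
    return g(phi(x))

def intervene(x, concept, value):
    """Concept-level intervention: overwrite one concept, keep the rest, re-run g."""
    c = phi(x)
    c[concept] = np.full_like(c[concept], value)
    return g(c)
```

Because the raw input never reaches `g` directly, the intervention query has a single unambiguous meaning, which is the structural property the symmetry axioms formalize.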
6. Scientific Inference With Interpretable Machine Learning
A property-descriptor framework formalizes the process of interpreting machine learning models to draw conclusions about the scientific data-generating process (Freiesleben et al., 2022). The property descriptor $\phi$ maps a model $f$ to a quantity $\phi(f)$ (e.g., a partial dependence curve, a feature importance score) such that:
- Identification: $\phi(f^*) = g^*$, recovering the true scientific property $g^*$ from the risk-optimal predictor $f^*$.
- Continuity: If $f_n \to f^*$ in risk, then $\phi(f_n) \to \phi(f^*)$.
- Estimability: There exists an unbiased estimator for $\phi(f)$.
- Uncertainty Quantification: Both model error and estimation error are quantifiable in a bias–variance decomposition.
PDPs, PFI, Shapley values for risk, and local ICE/SHAP explanations are all expressible as property descriptors within this formalism, ensuring that post-hoc model interpretation is grounded in principles of statistical validity (Freiesleben et al., 2022).
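A partial dependence curve illustrates the property-descriptor idea: the descriptor is a functional of the model alone, estimated from data. A minimal sketch (function and parameter names assumed):

```python
import numpy as np

def pdp_descriptor(model, X, j, grid):
    """Partial dependence as a property descriptor phi(f).

    phi(f)(v) = mean over the data of f evaluated with feature j fixed to v;
    applied to the risk-optimal predictor, it identifies the true PD of the
    data-generating process.
    """
    curve = []
    for v in grid:
        Xv = X.copy()
        Xv[:, j] = v                      # intervene on feature j only
        curve.append(model(Xv).mean())    # average out the remaining features
    return np.array(curve)
```

For an additive model the curve recovers the feature's own effect up to a constant, which is the identification property in this simple case.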
7. Broader Implications and Limitations
By reducing diverse interpretability techniques to unified mathematical forms—be it SIPA pipelines, additive-attribution axioms, decision-theoretic Bayesian projections, probabilistic latent-variable models, or symmetry-based categories—the field achieves the following:
- Method integration: Methods such as LIME, SHAP, LEX, and post-hoc surrogates become specific instantiations within general frameworks.
- Comparison and extension: One can analyze fidelity, stability, error control, and computational cost in a principled way.
- Limitations: Extrapolation arises in marginal-based sampling; combinatorial methods may be computationally intractable without approximations; additive models may not encompass every class of explanations; and not all desiderata are always mutually realizable (per symmetry constraints).
At its core, a unified formulation enables technically rigorous cross-method comparison and principled design for interpretable inference in complex models (Scholbeck et al., 2019; Lundberg et al., 2016; Afrabandpey et al., 2019; Senetaire et al., 2022; Freiesleben et al., 2022; Barbiero et al., 19 Jan 2026).