SPINRec: Stochastic Path Integration Explanations
- The paper introduces SPINRec, a model-agnostic framework that uses stochastic baseline sampling and path integration to generate more faithful explanations than fixed baseline methods.
- The approach is rigorously evaluated on MF, VAE, and NCF models across datasets like MovieLens-1M, Yahoo! Music, and Pinterest, showing significant improvements on counterfactual metrics such as DEL@K and POS@K.
- SPINRec offers scalable computation through parallel stochastic paths, paving the way for enhanced interpretability of recommender systems and potential extensions to multi-modal and sequential models.
SPINRec (Stochastic Path Integration for Neural Recommender Explanations) is a model-agnostic framework designed to generate fidelity-aware explanations for neural recommender systems operating on sparse, implicit feedback data. Unlike classical attribution methods, which often rely on fixed or unrealistic baselines, SPINRec utilizes stochastic baseline sampling and path integration to maximize the faithfulness of feature relevance scores with respect to actual model reasoning, as assessed by counterfactual metrics. The approach is evaluated extensively across matrix factorization (MF), variational autoencoder (VAE), and neural collaborative filtering (NCF) models using MovieLens-1M, Yahoo! Music, and Pinterest datasets, and establishes new benchmarks for explanation fidelity (Barkan et al., 22 Nov 2025).
1. Formal Problem Statement and Key Notation
Let $\mathcal{U}$ denote the set of users and $V$ the set of items. Each user $u \in \mathcal{U}$ is associated with a binary interaction vector $x \in \{0,1\}^{|V|}$, recording whether $u$ has interacted with item $v$ ($x_v = 1$) or not ($x_v = 0$). A trained recommender $f$ outputs affinity scores $f^y(x) \in [0,1]$ for a target item $y \in V$ conditioned on history $x$. An explanation assigns each input feature $v$ an attribution score $m_v$, quantifying its contribution to $f^y(x)$.
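As a purely illustrative instantiation of this notation (a toy linear-sigmoid scorer, not one of the paper's MF/VAE/NCF models; all names here are hypothetical), a binary history vector can be scored as follows:

```python
import numpy as np

rng = np.random.default_rng(0)
n_items = 8                      # |V|, toy item-vocabulary size
x = np.zeros(n_items)
x[[1, 3, 6]] = 1.0               # user interacted with items 1, 3, and 6

# Toy differentiable recommender: f^y(x) = sigmoid(w_y . x).
# w_y is a hypothetical per-target weight vector; real models expose the
# same interface: a scalar affinity score in [0, 1] for target item y.
w_y = rng.normal(size=n_items)

def f_y(x):
    return 1.0 / (1.0 + np.exp(-w_y @ x))

score = f_y(x)                   # affinity of target item y given history x
print(score)
```

Any model exposing this scalar-score interface (and gradients of it) is compatible with the attribution machinery below.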
Fidelity is defined as the degree to which the explanation map accurately reflects the model's decision process under feature perturbation. Measuring fidelity involves masking the top-$K_e$ explanatory features and computing counterfactual metrics, such as:
- $\mathrm{POS@K_e}$: binary indicator of whether $y$ remains ranked in the top-$K$ after removing the top-$K_e$ features.
- $\mathrm{DEL@K_e} = \frac{f^y(x \setminus \{\text{top-}K_e\text{ features}\})}{f^y(x)}$: score ratio after masking vs. original.
- $\mathrm{INS@K_e} = \frac{f^y(\{\text{top-}K_e\text{ features}\})}{f^y(x)}$: score when only the top-$K_e$ features are present.
- $\mathrm{CDCG@K_e}$: counterfactual discounted cumulative gain of $y$ after masking the top-$K_e$ features.
Prevailing methods suffer from low fidelity when applied to sparse binary inputs, particularly those using fixed "zero" baselines or non-counterfactual heuristics, due to vanishing gradients and a failure to capture absence signals.
2. Stochastic Path Integration Framework
Integrated Gradients (IG) formalizes feature attribution for $f^y$ at input $x$ relative to a baseline $z$:

$$\mathrm{IG}_v(x) = (x_v - z_v)\int_0^1 \frac{\partial f^y\big(z + \alpha(x - z)\big)}{\partial x_v}\, d\alpha$$

In practice, the integral is discretized with $R$ steps of linear interpolation between $z$ and $x$.
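The Riemann discretization can be written directly. A sketch for a toy model with an analytic gradient (illustrative names, not the paper's code):

```python
import numpy as np

def integrated_gradients(grad_f_y, x, z, R=50):
    """Discretized IG of f^y at x relative to baseline z, with R interpolation steps."""
    total = np.zeros_like(x)
    for t in range(1, R + 1):
        alpha = t / R
        x_t = z + alpha * (x - z)            # point on the straight-line path z -> x
        total += grad_f_y(x_t)               # accumulate gradients along the path
    return (x - z) * total / R               # elementwise product with displacement

# Toy linear-sigmoid model with an analytic gradient.
w = np.array([1.0, -2.0, 0.5])
f = lambda x: 1.0 / (1.0 + np.exp(-w @ x))
grad_f = lambda x: f(x) * (1.0 - f(x)) * w   # d sigmoid(w.x)/dx

x = np.array([1.0, 1.0, 0.0])
z = np.zeros(3)
attr = integrated_gradients(grad_f, x, z)
# Completeness (up to discretization error): attributions sum to f(x) - f(z).
print(attr.sum(), f(x) - f(z))
```

The completeness check at the end is IG's defining sanity test: the attributions account for the full score difference between input and baseline.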
SPINRec replaces the fixed baseline with a set of $k$ plausible baselines $\{z_1, \dots, z_k\}$ sampled from the empirical distribution of user profiles. For each $z_i$, IG is computed to produce a candidate map $m_i$. A fidelity score (e.g., the AUC of $\mathrm{DEL}$ or $\mathrm{POS}$ curves) is then evaluated per map, and the final explanation $m^*$ is selected as the highest-fidelity candidate. Optionally, the average map $\bar{m} = \frac{1}{k}\sum_{i=1}^{k} m_i$ may be used as an "expected paths" variant.
3. Algorithmic Details and Computational Complexity
Pseudocode for SPINRec is as follows:
```
Algorithm SPINRec(x, f, y, k, R, s)
  Input:
    x ∈ {0,1}^{|V|}     # user history vector
    f                   # model → [0,1] scores
    y ∈ V               # target item
    k                   # number of baselines
    R                   # number of IG steps
    s(·)                # fidelity metric
  Output:
    m* ∈ ℝ^{|V|}        # final attribution map

  1.  Sample baselines B ← {z₁, …, z_k} ⊂ 𝕌 uniformly at random
  2.  M ← ∅
  3.  For each z ∈ B:
  4.      m ← zero vector of length |V|
  5.      For t = 1 … R:
  6.          α ← t/R
  7.          x_t ← z + α·(x − z)
  8.          grad ← ∇_x f^y(x_t)    # backprop gradient
  9.          m ← m + grad
 10.      m ← (x − z) ∘ (m/R)        # elementwise multiply
 11.      Add m to M
 12.  End For
 13.  m* ← argmax_{m ∈ M} s(m)
 14.  Return m*
```
Each baseline requires $R$ gradient computations (each a backward pass through the model) plus one fidelity evaluation (forward passes over the masking perturbations). Total cost is therefore linear in $k$, $R$, and the number of perturbations, but in practice $k$ is small and all $k$ paths can be computed in parallel. Sparse storage and vectorization are used for efficiency.
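Putting the pieces together, the pseudocode above can be sketched compactly in Python. This is a toy instantiation (hypothetical helper names, a linear-sigmoid stand-in for the recommender, and a single-step DEL score as the fidelity metric $s$); real use would plug in an MF/VAE/NCF scorer with backprop gradients:

```python
import numpy as np

rng = np.random.default_rng(0)

def integrated_gradients(grad_f_y, x, z, R=20):
    grads = np.array([grad_f_y(z + (t / R) * (x - z)) for t in range(1, R + 1)])
    return (x - z) * grads.mean(axis=0)

def spinrec(x, grad_f_y, histories, k, R, fidelity):
    """Sample k empirical baselines, run IG per path, keep the highest-fidelity map."""
    idx = rng.choice(len(histories), size=k, replace=False)
    candidates = [integrated_gradients(grad_f_y, x, histories[i], R) for i in idx]
    return max(candidates, key=fidelity)         # argmax of the fidelity score s(m)

# Toy setup: linear-sigmoid scorer and empirical user profiles (the set 𝕌).
w = rng.normal(size=6)
f = lambda x: 1.0 / (1.0 + np.exp(-w @ x))
grad_f = lambda x: f(x) * (1.0 - f(x)) * w
histories = (rng.random((20, 6)) < 0.3).astype(float)
x = (rng.random(6) < 0.5).astype(float)

def neg_del_at_1(m):
    """Fidelity proxy: bigger score drop after deleting the top item = better."""
    present = np.flatnonzero(x)
    if len(present) == 0:
        return 0.0
    top = present[np.argmax(m[present])]
    x_masked = x.copy()
    x_masked[top] = 0.0
    return -(f(x_masked) / f(x))

m_star = spinrec(x, grad_f, histories, k=5, R=20, fidelity=neg_del_at_1)
print(m_star.shape)                              # one attribution per item
```

Because the candidate paths are independent, the loop over baselines is trivially parallelizable (batched gradients in a deep-learning framework, or a process pool here).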
4. Empirical Evaluation Protocol
SPINRec is benchmarked using three binarized implicit feedback datasets:
| Dataset | Recommendation Models | User Split / Setup |
|---|---|---|
| ML-1M | MF, VAE, NCF | 80/20 split, 10% holdout |
| Yahoo! Music | MF, VAE, NCF | 80/20 split, 10% holdout |
| Pinterest | MF, VAE, NCF | 80/20 split, 10% holdout |
Counterfactual fidelity metrics include AUC-style perturbation curves and fixed-length diagnostics (POS, DEL, INS, CDCG), aligning with Baklanov et al. and LXR protocols.
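An AUC-style deletion curve simply traces DEL as the number of masked features grows, then averages. A generic sketch (toy model, hypothetical names; not the exact benchmark code):

```python
import numpy as np

def deletion_auc(f_y, x, m):
    """Average DEL over K_e = 1..#present: area under the deletion curve."""
    present = np.flatnonzero(x)
    order = present[np.argsort(-m[present])]     # most relevant first
    base = f_y(x)
    scores, x_cur = [], x.copy()
    for v in order:                              # remove one top feature at a time
        x_cur[v] = 0.0
        scores.append(f_y(x_cur) / base)
    return float(np.mean(scores))                # lower AUC = more faithful map

w = np.array([1.5, 0.2, 0.8, -0.3])
f = lambda x: 1.0 / (1.0 + np.exp(-w @ x))
x = np.array([1.0, 1.0, 1.0, 0.0])
good = w * x                                     # attribution aligned with the model
print(deletion_auc(f, x, good))
```

Insertion curves are built symmetrically (start from an empty profile and add top features), and the fixed-length diagnostics are single points on these curves.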
Baselines tested:
- Cosine-Similarity heuristic
- SHAP4Rec (Shapley approximation)
- DeepSHAP
- LIME-RS, LIRE (importance-sampling LIME)
- FIA, ACCENT (influence-function)
- LXR (learned explainer)
- PI (plain IG with zero baseline)
- SPINRec
5. Quantitative Results and Qualitative Insights
SPINRec consistently achieves superior fidelity on all tested models and datasets, with statistically significant improvements over LXR and other strong baselines:
- 3–10% lower $\mathrm{POS@K_e}$ and $\mathrm{CDCG@K_e}$ (better rank collapse under feature removal)
- 4–8% lower $\mathrm{DEL@K_e}$ (greater score drops)
- 1–3% higher $\mathrm{INS@K_e}$ (better restoration using top features)
Ablation reveals that plain IG (zero baseline) is competitive but consistently outperformed when assessed by counterfactual metrics, especially for VAE and NCF models, where the absence of interaction carries additional signal. Performance gains saturate as the number of baselines $k$ grows.
Qualitative analysis demonstrates that classical IG with zero baselines isolates only present (nonzero) items, overlooking how the lack of interaction on others influences recommendations. SPINRec's stochastic baselines capture this effect, yielding more nuanced and stable attribution maps. The maps produced by selecting the highest-fidelity path align more closely with observed rank collapses when top explanatory items are removed.
6. Significance and Future Directions
SPINRec represents the first model-agnostic stochastic path integration approach tailored for recommender systems with sparse, binary inputs. By sampling empirically plausible baselines and selecting explanations by their fidelity under counterfactual evaluation, SPINRec addresses key limitations of prior approaches and sets new benchmarks for MF, VAE, and NCF models across standard datasets.
Planned directions include extension to multi-modal and sequential recommenders, acceleration via learned baseline samplers or direct fidelity approximations, and the integration of human-in-the-loop feedback to iteratively refine baseline distributions. All code, masking and evaluation pipelines are publicly available at https://github.com/DeltaLabTLV/SPINRec (Barkan et al., 22 Nov 2025).