SPINRec: Stochastic Path Integration Explanations
- The paper introduces SPINRec, a model-agnostic framework that uses stochastic baseline sampling and path integration to generate more faithful explanations than fixed baseline methods.
- The approach is rigorously evaluated on MF, VAE, and NCF models across datasets like MovieLens-1M, Yahoo! Music, and Pinterest, showing significant improvements on counterfactual metrics such as DEL@K and POS@K.
- SPINRec offers scalable computation through parallel stochastic paths, paving the way for enhanced interpretability of recommender systems and potential extensions to multi-modal and sequential models.
SPINRec (Stochastic Path Integration for Neural Recommender Explanations) is a model-agnostic framework designed to generate fidelity-aware explanations for neural recommender systems operating on sparse, implicit feedback data. Unlike classical attribution methods, which often rely on fixed or unrealistic baselines, SPINRec utilizes stochastic baseline sampling and path integration to maximize the faithfulness of feature relevance scores with respect to actual model reasoning, as assessed by counterfactual metrics. The approach is evaluated extensively across matrix factorization (MF), variational autoencoder (VAE), and neural collaborative filtering (NCF) models using MovieLens-1M, Yahoo! Music, and Pinterest datasets, and establishes new benchmarks for explanation fidelity (Barkan et al., 22 Nov 2025).
1. Formal Problem Statement and Key Notation
Let $\mathcal{U}$ denote the set of users and $V$ the set of items. Each user $u \in \mathcal{U}$ is associated with a binary interaction vector $x \in \{0,1\}^{|V|}$, recording whether $u$ has interacted with item $v$ ($x_v = 1$) or not ($x_v = 0$). A trained recommender $f$ outputs affinity scores $f^y(x) \in [0,1]$ for a target item $y \in V$ conditioned on history $x$. An explanation assigns each input feature $v$ an attribution score $m_v$, quantifying its contribution to $f^y(x)$.
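As a purely illustrative instantiation of this notation (a toy linear-sigmoid scorer, not one of the paper's MF/VAE/NCF models; all names here are hypothetical), a binary history vector can be scored as follows:

```python
import numpy as np

rng = np.random.default_rng(0)
n_items = 8                      # |V|, toy item-vocabulary size
x = np.zeros(n_items)
x[[1, 3, 6]] = 1.0               # user interacted with items 1, 3, and 6

# Toy differentiable recommender: f^y(x) = sigmoid(w_y . x).
# w_y is a hypothetical per-target weight vector; real models expose the
# same interface: a scalar affinity score in [0, 1] for target item y.
w_y = rng.normal(size=n_items)

def f_y(x):
    return 1.0 / (1.0 + np.exp(-w_y @ x))

score = f_y(x)                   # affinity of target item y given history x
print(score)
```

Any model exposing this scalar-score interface (and gradients of it) is compatible with the attribution machinery below.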
Fidelity is defined as the degree to which the explanation map accurately reflects the model's decision process under feature perturbation. Measuring fidelity involves masking the top-$K_e$ explanatory features and computing counterfactual metrics, such as:
- $\mathrm{POS@K_e}$: binary indicator of whether $y$ remains ranked in the top-$K$ after removing the top-$K_e$ features.
- $\mathrm{DEL@K_e} = \frac{f^y(x \setminus \{\text{top-}K_e\text{ features}\})}{f^y(x)}$: score ratio after masking vs. original.
- $\mathrm{INS@K_e} = \frac{f^y(\{\text{top-}K_e\text{ features}\})}{f^y(x)}$: score when only the top-$K_e$ features are present.
- $\mathrm{CDCG@K_e}$: counterfactual discounted cumulative gain of $y$ after masking the top-$K_e$ features.
Prevailing methods suffer from low fidelity when applied to sparse binary inputs, particularly those using fixed "zero" baselines or non-counterfactual heuristics, due to vanishing gradients and a failure to capture absence signals.
2. Stochastic Path Integration Framework
Integrated Gradients (IG) formalizes feature attribution for $f^y$ at input $x$ relative to a baseline $z$:

$$\mathrm{IG}_v(x) = (x_v - z_v)\int_0^1 \frac{\partial f^y\big(z + \alpha(x - z)\big)}{\partial x_v}\, d\alpha$$

In practice, the integral is discretized with $R$ steps of linear interpolation between $z$ and $x$.
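The Riemann discretization can be written directly. A sketch for a toy model with an analytic gradient (illustrative names, not the paper's code):

```python
import numpy as np

def integrated_gradients(grad_f_y, x, z, R=50):
    """Discretized IG of f^y at x relative to baseline z, with R interpolation steps."""
    total = np.zeros_like(x)
    for t in range(1, R + 1):
        alpha = t / R
        x_t = z + alpha * (x - z)            # point on the straight-line path z -> x
        total += grad_f_y(x_t)               # accumulate gradients along the path
    return (x - z) * total / R               # elementwise product with displacement

# Toy linear-sigmoid model with an analytic gradient.
w = np.array([1.0, -2.0, 0.5])
f = lambda x: 1.0 / (1.0 + np.exp(-w @ x))
grad_f = lambda x: f(x) * (1.0 - f(x)) * w   # d sigmoid(w.x)/dx

x = np.array([1.0, 1.0, 0.0])
z = np.zeros(3)
attr = integrated_gradients(grad_f, x, z)
# Completeness (up to discretization error): attributions sum to f(x) - f(z).
print(attr.sum(), f(x) - f(z))
```

The completeness check at the end is IG's defining sanity test: the attributions account for the full score difference between input and baseline.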
SPINRec replaces the fixed baseline with a set of $k$ plausible baselines $\{z_1, \dots, z_k\}$ sampled from the empirical distribution of user profiles. For each $z_i$, IG is computed to produce a candidate map $m_i$. A fidelity score (e.g., the AUC of $\mathrm{DEL}$ or $\mathrm{POS}$ curves) is then evaluated per map, and the final explanation $m^*$ is selected as the highest-fidelity candidate. Optionally, the average map $\bar{m} = \frac{1}{k}\sum_{i=1}^{k} m_i$ may be used as an "expected paths" variant.
3. Algorithmic Details and Computational Complexity
Pseudocode for SPINRec is as follows:
```
Algorithm SPINRec(x, f, y, k, R, s)
  Input:
    x ∈ {0,1}^{|V|}     # user history vector
    f                   # model → [0,1] scores
    y ∈ V               # target item
    k                   # number of baselines
    R                   # number of IG steps
    s(·)                # fidelity metric
  Output:
    m* ∈ ℝ^{|V|}        # final attribution map

  1.  Sample baselines B ← {z₁, …, z_k} ⊂ 𝕌 uniformly at random
  2.  M ← ∅
  3.  For each z ∈ B:
  4.      m ← zero vector of length |V|
  5.      For t = 1 … R:
  6.          α ← t/R
  7.          x_t ← z + α·(x − z)
  8.          grad ← ∇_x f^y(x_t)    # backprop gradient
  9.          m ← m + grad
 10.      m ← (x − z) ∘ (m/R)        # elementwise multiply
 11.      Add m to M
 12.  End For
 13.  m* ← argmax_{m ∈ M} s(m)
 14.  Return m*
```
Each baseline requires $R$ gradient computations (each a backward pass through the model) plus one fidelity evaluation (forward passes over the masking perturbations). Total cost is therefore linear in $k$, $R$, and the number of perturbations, but in practice $k$ is small and all $k$ paths can be computed in parallel. Sparse storage and vectorization are used for efficiency.
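Putting the pieces together, the pseudocode above can be sketched compactly in Python. This is a toy instantiation (hypothetical helper names, a linear-sigmoid stand-in for the recommender, and a single-step DEL score as the fidelity metric $s$); real use would plug in an MF/VAE/NCF scorer with backprop gradients:

```python
import numpy as np

rng = np.random.default_rng(0)

def integrated_gradients(grad_f_y, x, z, R=20):
    grads = np.array([grad_f_y(z + (t / R) * (x - z)) for t in range(1, R + 1)])
    return (x - z) * grads.mean(axis=0)

def spinrec(x, grad_f_y, histories, k, R, fidelity):
    """Sample k empirical baselines, run IG per path, keep the highest-fidelity map."""
    idx = rng.choice(len(histories), size=k, replace=False)
    candidates = [integrated_gradients(grad_f_y, x, histories[i], R) for i in idx]
    return max(candidates, key=fidelity)         # argmax of the fidelity score s(m)

# Toy setup: linear-sigmoid scorer and empirical user profiles (the set 𝕌).
w = rng.normal(size=6)
f = lambda x: 1.0 / (1.0 + np.exp(-w @ x))
grad_f = lambda x: f(x) * (1.0 - f(x)) * w
histories = (rng.random((20, 6)) < 0.3).astype(float)
x = (rng.random(6) < 0.5).astype(float)

def neg_del_at_1(m):
    """Fidelity proxy: bigger score drop after deleting the top item = better."""
    present = np.flatnonzero(x)
    if len(present) == 0:
        return 0.0
    top = present[np.argmax(m[present])]
    x_masked = x.copy()
    x_masked[top] = 0.0
    return -(f(x_masked) / f(x))

m_star = spinrec(x, grad_f, histories, k=5, R=20, fidelity=neg_del_at_1)
print(m_star.shape)                              # one attribution per item
```

Because the candidate paths are independent, the loop over baselines is trivially parallelizable (batched gradients in a deep-learning framework, or a process pool here).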
4. Empirical Evaluation Protocol
SPINRec is benchmarked using three binarized implicit feedback datasets:
| Dataset | Recommendation Models | User Split / Setup |
|---|---|---|
| ML-1M | MF, VAE, NCF | 80/20 split, 10% holdout |
| Yahoo! Music | MF, VAE, NCF | 80/20 split, 10% holdout |
| Pinterest | MF, VAE, NCF | 80/20 split, 10% holdout |
Counterfactual fidelity metrics include AUC-style perturbation curves and fixed-length diagnostics (POS, DEL, INS, CDCG), aligning with Baklanov et al. and LXR protocols.
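An AUC-style deletion curve simply traces DEL as the number of masked features grows, then averages. A generic sketch (toy model, hypothetical names; not the exact benchmark code):

```python
import numpy as np

def deletion_auc(f_y, x, m):
    """Average DEL over K_e = 1..#present: area under the deletion curve."""
    present = np.flatnonzero(x)
    order = present[np.argsort(-m[present])]     # most relevant first
    base = f_y(x)
    scores, x_cur = [], x.copy()
    for v in order:                              # remove one top feature at a time
        x_cur[v] = 0.0
        scores.append(f_y(x_cur) / base)
    return float(np.mean(scores))                # lower AUC = more faithful map

w = np.array([1.5, 0.2, 0.8, -0.3])
f = lambda x: 1.0 / (1.0 + np.exp(-w @ x))
x = np.array([1.0, 1.0, 1.0, 0.0])
good = w * x                                     # attribution aligned with the model
print(deletion_auc(f, x, good))
```

Insertion curves are built symmetrically (start from an empty profile and add top features), and the fixed-length diagnostics are single points on these curves.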
Baselines tested:
- Cosine-Similarity heuristic
- SHAP4Rec (Shapley approximation)
- DeepSHAP
- LIME-RS, LIRE (importance-sampling LIME)
- FIA, ACCENT (influence-function)
- LXR (learned explainer)
- PI (plain IG with zero baseline)
- SPINRec
5. Quantitative Results and Qualitative Insights
SPINRec consistently achieves superior fidelity on all tested models and datasets, with statistically significant improvements over LXR and other strong baselines:
- 3–10% lower $\mathrm{POS@K_e}$ and $\mathrm{CDCG@K_e}$ (better rank collapse under feature removal)
- 4–8% lower $\mathrm{DEL@K_e}$ (greater score drops)
- 1–3% higher $\mathrm{INS@K_e}$ (better restoration using top features)
Ablation reveals that plain IG (zero baseline) is competitive but consistently outperformed when assessed by counterfactual metrics, especially for VAE and NCF models, where the absence of interaction carries additional signal. Performance gains saturate as the number of baselines $k$ grows.
Qualitative analysis demonstrates that classical IG with zero baselines isolates only present (nonzero) items, overlooking how the lack of interaction on others influences recommendations. SPINRec's stochastic baselines capture this effect, yielding more nuanced and stable attribution maps. The maps produced by selecting the highest-fidelity path align more closely with observed rank collapses when top explanatory items are removed.
6. Significance and Future Directions
SPINRec represents the first model-agnostic stochastic path integration approach tailored for recommender systems with sparse, binary inputs. By sampling empirically plausible baselines and selecting explanations by their fidelity under counterfactual evaluation, SPINRec addresses key limitations of prior approaches and sets new benchmarks for MF, VAE, and NCF models across standard datasets.
Planned directions include extension to multi-modal and sequential recommenders, acceleration via learned baseline samplers or direct fidelity approximations, and the integration of human-in-the-loop feedback to iteratively refine baseline distributions. All code, masking and evaluation pipelines are publicly available at https://github.com/DeltaLabTLV/SPINRec (Barkan et al., 22 Nov 2025).