SPUMR: Similarity Propagation in Multimodal Recommendation
- The paper introduces SPUMR, a framework that combines similarity propagation over both content and collaborative graphs with modality-specific uncertainty modeling to enhance ranking accuracy.
- It constructs KNN graphs for users and items, applying GCN propagation to denoise and refine feature embeddings across multiple modalities.
- Empirical evaluations on Amazon datasets show 3–7% improvements in Recall@K and NDCG@K, demonstrating SPUMR's effectiveness and robustness.
Similarity Propagation-enhanced Uncertainty for Multimodal Recommendation (SPUMR) is a framework for multimodal recommendation systems that simultaneously propagates similarity across content and collaborative graphs and explicitly models modality-specific uncertainty to improve recommendation robustness and ranking accuracy. SPUMR is designed to address two fundamental issues in multimodal recommendation (MMR): (i) noise and uncertainty inherent in modality features (such as noisy images or ambiguous text), and (ii) ineffective fusion strategies that disregard both the varying reliability of each modality and the rich similarity structure present among users and items (Wu et al., 27 Jan 2026).
1. Problem Formulation and Notation
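Multimodal recommendation aims to predict user preference by leveraging multiple modalities per item, such as visual ($v$) and textual ($t$) features. Let $\mathcal{U}$ denote the set of users, $\mathcal{I}$ the set of items, and $\mathcal{M} = \{v, t\}$ the set of modalities. Each item $i \in \mathcal{I}$ has a modality-specific feature vector $\mathbf{e}_i^m \in \mathbb{R}^{d_m}$ for each $m \in \mathcal{M}$. All features for modality $m$ are collected in $\mathbf{E}^m \in \mathbb{R}^{|\mathcal{I}| \times d_m}$. The user-item interaction matrix $\mathbf{R} \in \{0, 1\}^{|\mathcal{U}| \times |\mathcal{I}|}$ encodes implicit feedback, with $R_{ui} = 1$ if user $u$ has interacted with item $i$.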
The modeling goal is to learn low-dimensional embeddings for users and items such that the predicted preference score $\hat{y}_{ui}$ ranks observed (positive) user-item pairs above unobserved (negative) pairs, while explicitly representing and mitigating modality-specific uncertainty.
Let $\mathcal{N}_u$ be the set of items interacted with by user $u$ and $\mathcal{N}_i$ the set of users interacting with item $i$. The model parameters include projection matrices, GNN weights, multilayer perceptrons (MLPs) for uncertainty estimation, and gating networks.
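The training objective integrates four loss components:
$$\mathcal{L} = \mathcal{L}_{\text{BPR}} + \lambda_1 \mathcal{L}_{\text{CL}} + \lambda_2 \mathcal{L}_{\text{KL}} + \lambda_3 \mathcal{L}_{\text{aux}},$$
where $\mathcal{L}_{\text{BPR}}$ is the Bayesian Personalized Ranking loss, $\mathcal{L}_{\text{CL}}$ a contrastive loss, $\mathcal{L}_{\text{KL}}$ a KL divergence penalty for uncertainty, and $\mathcal{L}_{\text{aux}}$ an auxiliary loss that penalizes assigning high weights to unreliable modal experts; $\lambda_1, \lambda_2, \lambda_3$ are weighting hyperparameters.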
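For reference, here is a minimal NumPy sketch of the BPR term, assuming inner-product scores between user and item embeddings and externally sampled negatives. It computes $-\frac{1}{|\mathcal{T}|}\sum_{(u,i,j) \in \mathcal{T}} \log \sigma(\hat{y}_{ui} - \hat{y}_{uj})$ over a batch $\mathcal{T}$ of (user, positive item, negative item) triples. The function name `bpr_loss` and the array layout are illustrative choices, not taken from the paper.

```python
import numpy as np

def bpr_loss(user_emb, item_emb, users, pos_items, neg_items):
    """Mean BPR loss over sampled (u, i, j) triples.

    Assumes inner-product scoring y_ui = <e_u, e_i> (an assumption; the
    paper's exact scoring function is not restated here).
    """
    e_u = user_emb[users]                                # (B, d)
    y_pos = np.sum(e_u * item_emb[pos_items], axis=1)    # scores of positives
    y_neg = np.sum(e_u * item_emb[neg_items], axis=1)    # scores of negatives
    diff = y_pos - y_neg
    # -log(sigmoid(x)) == log(1 + exp(-x)); logaddexp keeps it numerically stable
    return float(np.mean(np.logaddexp(0.0, -diff)))
```

With all-zero embeddings every score difference is 0, so the loss equals $\log 2$; scoring positives above negatives drives it toward 0. In the full objective this term would be added to the weighted contrastive, KL, and auxiliary terms.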
2. Modality Similarity Graph Construction
SPUMR constructs two K-nearest neighbor (KNN) graphs for each modality $m \in \mathcal{M}$:
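(a) User-modal Interest Graph ($\mathcal{G}_U^m$): The initial user interest profile for each modality aggregates the features of the items the user has interacted with,
$$\mathbf{h}_u^m = \frac{1}{|\mathcal{N}_u|} \sum_{i \in \mathcal{N}_u} \tilde{\mathbf{e}}_i^m,$$
where $\tilde{\mathbf{e}}_i^m$ is the projected feature for item $i$ in modality $m$. Edges are formed by cosine similarity $s_{uu'}^m = \frac{(\mathbf{h}_u^m)^\top \mathbf{h}_{u'}^m}{\lVert \mathbf{h}_u^m \rVert \, \lVert \mathbf{h}_{u'}^m \rVert}$, connecting each user to its top-$K$ peers. $L$ layers of GCN propagation over the symmetrically normalized adjacency $\mathbf{D}^{-1/2} \mathbf{A}^m \mathbf{D}^{-1/2}$ are then applied to refine these profiles, yielding final modality-specific user embeddings $\bar{\mathbf{h}}_u^m$.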
(b) Item-modal Semantic Graph ($\mathcal{G}_I^m$): Analogous KNN graphs are constructed among items using the projected features $\tilde{\mathbf{e}}_i^m$. Symmetric GCN propagation produces item embeddings $\bar{\mathbf{e}}_i^m$ for each modality.
These two graphs aim to denoise and enrich initial feature representations by leveraging local similarity in both user preference and item content structure.
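The modality graph stage for a single modality can be sketched in NumPy under several assumptions: the user profile is the mean of the projected features of interacted items, the KNN graph is binary and symmetrized, and propagation is parameter-free (LightGCN-style) and returns the last layer. The helper names are hypothetical. The same `knn_graph` and `sym_gcn` calls apply to item features for the item-modal semantic graph.

```python
import numpy as np

def user_interest_profiles(R, item_feats):
    """Mean projected item feature over each user's interacted items."""
    counts = R.sum(axis=1, keepdims=True)
    return (R @ item_feats) / np.maximum(counts, 1)   # guard users with no items

def knn_graph(X, k):
    """Binary, symmetrized kNN adjacency from cosine similarity."""
    Xn = X / np.maximum(np.linalg.norm(X, axis=1, keepdims=True), 1e-12)
    S = Xn @ Xn.T
    np.fill_diagonal(S, -np.inf)              # never select a node as its own neighbor
    idx = np.argsort(-S, axis=1)[:, :k]       # top-k most similar peers per row
    A = np.zeros_like(S)
    rows = np.repeat(np.arange(X.shape[0]), k)
    A[rows, idx.ravel()] = 1.0
    return np.maximum(A, A.T)                 # make the graph undirected

def sym_gcn(A, H, num_layers):
    """Repeated propagation H <- D^{-1/2} A D^{-1/2} H (no weights, assumption)."""
    deg = A.sum(axis=1)
    d_inv_sqrt = np.where(deg > 0, 1.0 / np.sqrt(np.maximum(deg, 1e-12)), 0.0)
    A_hat = d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]
    for _ in range(num_layers):
        H = A_hat @ H
    return H
```

Some graph recommenders average the outputs of all layers instead of keeping only the last one; which variant SPUMR uses is not stated here.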
3. Collaborative Similarity Refinement
After modality graph propagation, SPUMR further refines entity embeddings using collaborative similarity, which is based exclusively on interaction patterns:
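(a) User-user Collaborative Graph ($\mathcal{G}_{UU}$): The edge strength between users $u$ and $u'$ is their Jaccard similarity over interacted items:
$$J(u, u') = \frac{|\mathcal{N}_u \cap \mathcal{N}_{u'}|}{|\mathcal{N}_u \cup \mathcal{N}_{u'}|}.$$
The normalized connections are used to propagate the modality-refined embeddings $\bar{\mathbf{h}}_u^m$, yielding collaborative-multimodal embeddings $\hat{\mathbf{h}}_u^m$.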
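(b) Item-item Collaborative Graph ($\mathcal{G}_{II}$): Analogously, the Jaccard similarity $J(i, j) = \frac{|\mathcal{N}_i \cap \mathcal{N}_j|}{|\mathcal{N}_i \cup \mathcal{N}_j|}$ is computed over users for each item pair $(i, j)$, generating further-refined embeddings $\hat{\mathbf{e}}_i^m$.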
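A minimal sketch of the collaborative stage follows. It assumes that Jaccard scores are used directly as edge weights, that self-loops are dropped, and that one symmetric-normalized propagation layer is applied; beyond the use of Jaccard similarity and a single propagation layer, these choices are not confirmed by the source. Passing `R.T` instead of `R` gives the item-item graph.

```python
import numpy as np

def jaccard_matrix(R):
    """Pairwise Jaccard similarity between the rows of a binary matrix."""
    B = (R > 0).astype(float)
    inter = B @ B.T                                    # |N_a ∩ N_b|
    sizes = B.sum(axis=1)
    union = sizes[:, None] + sizes[None, :] - inter    # |N_a ∪ N_b|
    J = np.divide(inter, union, out=np.zeros_like(inter), where=union > 0)
    np.fill_diagonal(J, 0.0)                           # drop self-similarity (assumption)
    return J

def propagate_once(J, H):
    """One layer of symmetric-normalized propagation over a weighted graph."""
    deg = J.sum(axis=1)
    d = np.where(deg > 0, 1.0 / np.sqrt(np.maximum(deg, 1e-12)), 0.0)
    return (d[:, None] * J * d[None, :]) @ H
```

Because the matrix product `B @ B.T` is dense, a sparse implementation would be needed at realistic catalog sizes, which is one source of the computational overhead noted under limitations.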
After both propagation stages, each user or item has a set of collaborative, modality-specific representations $\{\hat{\mathbf{x}}^m\}_{m \in \mathcal{M}}$.
4. Uncertainty Modeling and Preference Aggregation
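Each modality-specific representation $\hat{\mathbf{x}}^m$ is modeled as a Gaussian “expert” whose mean and standard deviation are estimated by two modality-specific MLPs:
$$\boldsymbol{\mu}^m = \mathrm{MLP}_{\mu}^m(\hat{\mathbf{x}}^m), \qquad \boldsymbol{\sigma}^m = \mathrm{MLP}_{\sigma}^m(\hat{\mathbf{x}}^m), \qquad \mathbf{z}^m \sim \mathcal{N}\big(\boldsymbol{\mu}^m, \mathrm{diag}((\boldsymbol{\sigma}^m)^2)\big).$$
Using the reparameterization trick, stochastic embeddings are sampled as
$$\mathbf{z}^m = \boldsymbol{\mu}^m + \boldsymbol{\sigma}^m \odot \boldsymbol{\epsilon}, \qquad \boldsymbol{\epsilon} \sim \mathcal{N}(\mathbf{0}, \mathbf{I}).$$
The KL divergence regularizer $\mathcal{L}_{\text{KL}}$, computed between each expert distribution and a Gaussian prior, mitigates variance collapse.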
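The Gaussian expert can be sketched in NumPy as follows. The following are assumptions, not details from the paper: each MLP has one ReLU hidden layer, the variance head predicts log-variance for numerical stability, and the KL term uses a standard normal prior $\mathcal{N}(\mathbf{0}, \mathbf{I})$. Names such as `gaussian_expert` and `init_mlp` are hypothetical.

```python
import numpy as np

def init_mlp(d_in, d_hidden, d_out, rng):
    """Small random weights for a two-layer MLP (hypothetical initialization)."""
    return (rng.standard_normal((d_in, d_hidden)) * 0.1, np.zeros(d_hidden),
            rng.standard_normal((d_hidden, d_out)) * 0.1, np.zeros(d_out))

def mlp(x, W1, b1, W2, b2):
    """Two-layer MLP with a ReLU hidden layer (architecture is an assumption)."""
    return np.maximum(x @ W1 + b1, 0.0) @ W2 + b2

def gaussian_expert(x, params_mu, params_logvar, rng):
    """Sample z = mu + sigma * eps via the reparameterization trick."""
    mu = mlp(x, *params_mu)
    logvar = mlp(x, *params_logvar)          # log sigma^2, unconstrained output
    eps = rng.standard_normal(mu.shape)      # eps ~ N(0, I)
    z = mu + np.exp(0.5 * logvar) * eps      # sigma = exp(logvar / 2)
    return z, mu, logvar

def kl_to_standard_normal(mu, logvar):
    """KL(N(mu, diag(exp(logvar))) || N(0, I)), averaged over rows."""
    kl = 0.5 * np.sum(np.exp(logvar) + mu**2 - 1.0 - logvar, axis=1)
    return float(np.mean(kl))
```

The $-\log \sigma^2$ term inside the KL grows without bound as a variance shrinks toward zero, which is one way such a regularizer discourages variance collapse.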
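For uncertainty-aware preference aggregation, a gating network assigns each modality $m$ a weight $g^m$ based on its reliability, with sparsification via Top-K selection:
$$\mathbf{g} = \mathrm{Softmax}\big(\mathrm{TopK}(\mathbf{s}, K)\big),$$
where $\mathbf{s} = (s^m)_{m \in \mathcal{M}}$ are the gating network's scores and $\mathrm{TopK}$ keeps the $K$ largest scores, setting the rest to $-\infty$.
The final entity embedding $\mathbf{f}$ is the weighted sum:
$$\mathbf{f} = \sum_{m \in \mathcal{M}} g^m \mathbf{z}^m.$$
An auxiliary loss $\mathcal{L}_{\text{aux}}$ penalizes assigning high weights to noisy (high-uncertainty) experts.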
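A sketch of sparse gating and fusion follows. It assumes that the gate logits come from a gating network (not shown) and that Top-K sparsification masks all but the $K$ largest logits before a softmax, as in sparsely gated mixture-of-experts layers. The `uncertainty_penalty` function is a purely hypothetical form of the auxiliary loss (gate weight times mean predicted variance); the source does not give its formula.

```python
import numpy as np

def topk_gate(logits, k):
    """Sparse softmax gate: keep the k largest logits per row, renormalize."""
    masked = np.full_like(logits, -np.inf)
    idx = np.argsort(-logits, axis=1)[:, :k]
    np.put_along_axis(masked, idx, np.take_along_axis(logits, idx, axis=1), axis=1)
    masked = masked - masked.max(axis=1, keepdims=True)   # numerical stability
    w = np.exp(masked)                                    # exp(-inf) = 0 for dropped experts
    return w / w.sum(axis=1, keepdims=True)

def fuse(expert_embs, gates):
    """f = sum_m g^m z^m; expert_embs is (n, M, d), gates is (n, M)."""
    return np.einsum("nm,nmd->nd", gates, expert_embs)

def uncertainty_penalty(gates, logvars):
    """HYPOTHETICAL auxiliary loss: gate weight times each expert's mean variance."""
    mean_var = np.exp(logvars).mean(axis=2)               # (n, M)
    return float(np.mean(np.sum(gates * mean_var, axis=1)))
```

With only two modalities, $K = 1$ reduces the gate to a hard choice of one modality per entity, while $K = 2$ recovers a dense softmax over both.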
5. Full Algorithmic Pipeline
SPUMR can be summarized algorithmically as follows:
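- Input raw modality features $\{\mathbf{E}^m\}_{m \in \mathcal{M}}$ and user-item interaction matrix $\mathbf{R}$.
- Project each raw feature: $\tilde{\mathbf{e}}_i^m = \mathbf{W}^m \mathbf{e}_i^m$.
- Construct modality-based KNN graphs for users and items; perform $L$-layer GCN propagation to obtain $\bar{\mathbf{h}}_u^m$ and $\bar{\mathbf{e}}_i^m$.
- Build user-user and item-item collaborative graphs via Jaccard similarity; propagate one layer to yield $\hat{\mathbf{h}}_u^m$ and $\hat{\mathbf{e}}_i^m$.
- Use two MLPs per modality to estimate the mean and variance, and sample $\mathbf{z}^m$ from each Gaussian expert.
- Compute gating weights $g^m$ and aggregate the $\mathbf{z}^m$ into final fused embeddings $\mathbf{f}_u$ and $\mathbf{f}_i$.
- Compute preference scores $\hat{y}_{ui} = \mathbf{f}_u^\top \mathbf{f}_i$.
- Calculate losses $\mathcal{L}_{\text{BPR}}$, $\mathcal{L}_{\text{CL}}$, $\mathcal{L}_{\text{KL}}$, $\mathcal{L}_{\text{aux}}$; update parameters by backpropagation.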
6. Empirical Evaluation
SPUMR was evaluated on three Amazon 5-core benchmarks (Baby, Sports, Clothing). Recommendations were assessed using Recall@K and NDCG@K; the results below are reported at K = 10. Baselines included matrix factorization (MF-BPR), LightGCN, uncertainty-aware methods (OrdRec, VAE-AUR), and several multimodal methods (VBPR, MMGCN, FREEDOM, LGMRec, among others).
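For readers reproducing the protocol, per-user Recall@K and NDCG@K with binary relevance can be computed as below and then averaged over test users. This is a generic sketch, not the paper's evaluation code; in particular, some toolkits normalize Recall@K by $\min(K, |\text{relevant}|)$ rather than by $|\text{relevant}|$.

```python
import numpy as np

def recall_at_k(ranked_items, relevant, k):
    """Fraction of a user's held-out items that appear in the top-k list."""
    if not relevant:
        return 0.0
    hits = sum(1 for item in ranked_items[:k] if item in relevant)
    return hits / len(relevant)

def ndcg_at_k(ranked_items, relevant, k):
    """Binary-relevance NDCG@k with a log2 position discount."""
    dcg = sum(1.0 / np.log2(pos + 2)
              for pos, item in enumerate(ranked_items[:k]) if item in relevant)
    # ideal DCG: all relevant items packed into the top positions
    idcg = sum(1.0 / np.log2(pos + 2) for pos in range(min(len(relevant), k)))
    return dcg / idcg if idcg > 0 else 0.0
```

For a ranking `[3, 1, 2, 0]` with relevant set `{1, 0}` and k = 2, one of the two relevant items is retrieved, so Recall@2 is 0.5.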
SPUMR achieved the highest performance on all datasets. On the Baby dataset:
- Recall@10 improved from 0.0644 (best baseline, LGMRec) to 0.0711 (+6.92%)
- NDCG@10 improved from 0.0361 to 0.0385 (+6.65%)
Similar improvements of 3–7% were observed for Sports and Clothing. Ablation experiments showed that removing the modality similarity graphs, the collaborative similarity graphs, or the uncertainty-aware preference aggregation each leads to drops of roughly 1–3% in recall and NDCG, indicating that every component contributes to the final performance.
t-SNE visualizations revealed that SPUMR produces more uniformly separated user clusters than MMGCN and FREEDOM, an effect the authors associate with improved generalization and robustness.
7. Limitations and Future Directions
Identified challenges include computational overhead resulting from multiple graph propagations and the need for efficient stochastic sampling. Possible mitigations include graph sparsification and more efficient sampling schemes.
Future research directions encompass:
- Generalizing SPUMR to accommodate additional modalities, such as audio or structured metadata.
- Integrating large-scale pre-trained multimodal backbones (e.g., CLIP) for feature extraction and for uncertainty estimation.
- Exploring adaptive graph construction—learning edge weights end-to-end—beyond fixed KNN or Jaccard similarity kernels.
SPUMR is a unified framework combining similarity propagation and explicit uncertainty modeling for multimodal recommendation. It reports state-of-the-art results on the evaluated benchmarks and suggests broader potential for similarity-based uncertainty integration in recommender architectures (Wu et al., 27 Jan 2026).