
SPUMR: Similarity Propagation in Multimodal Recommendation

Updated 3 February 2026
  • The paper introduces SPUMR, a framework that combines similarity propagation over both content and collaborative graphs with modality-specific uncertainty modeling to enhance ranking accuracy.
  • It constructs KNN graphs for users and items, applying GCN propagation to denoise and refine feature embeddings across multiple modalities.
  • Empirical evaluations on Amazon datasets show 3–7% improvements in Recall@K and NDCG@K, demonstrating SPUMR's effectiveness and robustness.

Similarity Propagation-enhanced Uncertainty for Multimodal Recommendation (SPUMR) is a framework for multimodal recommendation (MMR) that propagates similarity across content and collaborative graphs and explicitly models modality-specific uncertainty to improve robustness and ranking accuracy. It targets two issues in MMR: (i) noise and uncertainty inherent in modality features (such as noisy images or ambiguous text), and (ii) fusion strategies that ignore both the varying reliability of each modality and the similarity structure among users and items (Wu et al., 27 Jan 2026).

1. Problem Formulation and Notation

Multimodal recommendation aims to predict user preference by leveraging multiple modalities per item, such as visual ($v$) and textual ($t$) features. Let $\mathcal{U} = \{ u \}$ denote users, $\mathcal{I} = \{ i \}$ denote items, and $M$ denote the number of modalities. Each item $i$ has modality-specific feature vectors $e_i^m \in \mathbb{R}^{d_m}$ for $m \in \mathcal{M} = \{ v, t \}$. All features for modality $m$ are collected in $E^m \in \mathbb{R}^{|\mathcal{I}| \times d_m}$. The user-item interaction matrix $R$ encodes implicit feedback.

The modeling goal is to learn low-dimensional embeddings $z_u, z_i \in \mathbb{R}^d$ for users and items such that the predicted preference score $\hat{y}_{ui} = z_u^\top z_i$ ranks observed (positive) user-item pairs above unobserved (negative) pairs, while explicitly representing and mitigating modality-specific uncertainty.

Let $\mathcal{I}_u$ be the set of items interacted with by user $u$ and $\mathcal{B}_i$ the set of users interacting with item $i$. The model parameters $\Theta$ include projection matrices, GNN weights, multilayer perceptrons (MLPs) for uncertainty, and gating networks.

The training objective integrates four loss components:

$$\min_\Theta L(\Theta) = L_{BPR} + \lambda_{CL} L_{CL} + \lambda_{KL} L_{KL} + \lambda_U L_U$$

where $L_{BPR}$ is the Bayesian Personalized Ranking loss, $L_{CL}$ a contrastive loss, $L_{KL}$ a KL divergence penalty for uncertainty, and $L_U$ a penalty on modality experts that receive high weight despite being unreliable.
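As a minimal sketch of how the ranking term and the weighted sum might be computed, the NumPy snippet below implements the standard BPR loss on dot-product scores and combines four scalar loss values. The function names and the default $\lambda$ values are hypothetical placeholders; the source does not report them, and the contrastive, KL, and uncertainty terms are passed in as precomputed scalars.

```python
import numpy as np

def bpr_loss(z_u, z_pos, z_neg):
    """Standard BPR loss for batches of (user, positive item, negative item) embeddings."""
    pos_scores = np.sum(z_u * z_pos, axis=1)   # y_hat for observed pairs
    neg_scores = np.sum(z_u * z_neg, axis=1)   # y_hat for sampled unobserved pairs
    # -log(sigmoid(x)) == log(1 + exp(-x)); logaddexp keeps this numerically stable
    return float(np.mean(np.logaddexp(0.0, -(pos_scores - neg_scores))))

def total_loss(l_bpr, l_cl, l_kl, l_u, lam_cl=0.1, lam_kl=0.01, lam_u=0.01):
    """Weighted sum of the four terms; the default lambdas are illustrative only."""
    return l_bpr + lam_cl * l_cl + lam_kl * l_kl + lam_u * l_u
```

When the positive and negative scores are equal, `bpr_loss` evaluates to $\log 2$, and it decreases as positives are scored further above negatives.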

2. Modality Similarity Graph Construction

SPUMR constructs two K-nearest neighbor (KNN) graphs for each modality $m$:

(a) User-modal Interest Graph ($\mathcal{G}_U^m$): The initial user interest profile for each modality is

$$p_u^m = \frac{1}{|\mathcal{I}_u|} \sum_{i \in \mathcal{I}_u} x_i^m$$

where $x_i^m = W^m e_i^m + b^m$ is the projected feature for item $i$ in modality $m$. Edges are formed by cosine similarity $s_{uv}^m$, connecting each user to its top-$k$ peers. $L$ layers of symmetric GCN are applied to propagate and refine these profiles, yielding final modality-specific user embeddings $h_u^m$.

(b) Item-modal Semantic Graph ($\mathcal{G}_I^m$): Similar KNN graphs are constructed among items using $s_{ij}^m = \cos(x_i^m, x_j^m)$. Symmetric GCN propagation produces item embeddings $h_i^m$ for each modality.

These two graphs aim to denoise and enrich initial feature representations by leveraging local similarity in both user preference and item content structure.
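The graph construction and propagation in this section can be sketched in NumPy as follows. This is an illustration under stated assumptions, not the authors' implementation: edges are binary, the graph is symmetrized, no self-loops are added, and propagation is parameter-free (no per-layer weights or nonlinearity), none of which the source specifies. All function names are hypothetical.

```python
import numpy as np

def user_profiles(r, x):
    """Mean projected item feature over each user's interacted items (p_u^m)."""
    counts = np.maximum(r.sum(axis=1, keepdims=True), 1.0)  # avoid division by zero
    return (r @ x) / counts

def knn_cosine_graph(x, k):
    """Binary top-k cosine-similarity adjacency without self-loops, symmetrized."""
    x_norm = x / np.maximum(np.linalg.norm(x, axis=1, keepdims=True), 1e-12)
    sim = x_norm @ x_norm.T
    np.fill_diagonal(sim, -np.inf)             # exclude each node from its own neighbors
    top_k = np.argsort(-sim, axis=1)[:, :k]    # indices of the k most similar rows
    adj = np.zeros_like(sim)
    rows = np.repeat(np.arange(x.shape[0]), k)
    adj[rows, top_k.ravel()] = 1.0
    return np.maximum(adj, adj.T)              # assumption: undirected graph

def sym_normalize(adj):
    """Symmetric normalization D^{-1/2} A D^{-1/2}."""
    deg = adj.sum(axis=1)
    inv_sqrt = np.where(deg > 0, 1.0 / np.sqrt(np.maximum(deg, 1e-12)), 0.0)
    return adj * inv_sqrt[:, None] * inv_sqrt[None, :]

def propagate(adj_norm, h, num_layers):
    """Apply num_layers rounds of parameter-free neighbor aggregation."""
    for _ in range(num_layers):
        h = adj_norm @ h
    return h
```

For items, `knn_cosine_graph` would be applied to the projected features $x_i^m$ directly; for users, to the profiles returned by `user_profiles`.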

3. Collaborative Similarity Refinement

Subsequent to modality graph propagation, SPUMR further refines entity embeddings using collaborative similarity, based exclusively on interaction patterns, via:

(a) User-user Collaborative Graph ($\mathcal{G}_U^c$): Edge strength between users $u, v$ is defined by their Jaccard similarity over interacted items:

$$w_{uv} = \frac{|\mathcal{I}_u \cap \mathcal{I}_v|}{|\mathcal{I}_u \cup \mathcal{I}_v|}$$

Normalized connections are used to propagate modality-refined embeddings, yielding collaborative-multimodal embeddings $h_u^{c,m}$.

(b) Item-item Collaborative Graph ($\mathcal{G}_I^c$): Analogously, the Jaccard similarity is computed over interacting users for each item pair $(i, j)$, generating further-refined embeddings $h_i^{c,m}$.

After both propagation stages, each user or item $e \in \mathcal{U} \cup \mathcal{I}$ has collaborative, modality-specific representations $h_e^{c,m}$.
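A minimal NumPy sketch of the Jaccard graph and a single propagation step is shown below. The source says only that connections are "normalized"; row normalization is assumed here, and the function names are hypothetical. Entities with no overlapping neighbors receive a zero vector in this sketch, which a real implementation would likely handle with a residual connection or self-loop.

```python
import numpy as np

def jaccard_similarity(r):
    """Pairwise Jaccard similarity between rows of a binary interaction matrix."""
    r = (np.asarray(r) > 0).astype(np.float64)
    inter = r @ r.T                                   # |A ∩ B| for every row pair
    counts = r.sum(axis=1)
    union = counts[:, None] + counts[None, :] - inter # |A ∪ B|
    # Pairs with an empty union are assigned similarity 0
    return np.divide(inter, union, out=np.zeros_like(inter), where=union > 0)

def collaborative_refine(r, h):
    """One propagation step over the row-normalized Jaccard graph (assumed scheme)."""
    w = jaccard_similarity(r)
    np.fill_diagonal(w, 0.0)                          # drop self-similarity
    row_sum = w.sum(axis=1, keepdims=True)
    w_norm = np.divide(w, row_sum, out=np.zeros_like(w), where=row_sum > 0)
    return w_norm @ h
```

Passing `R` gives the user-user graph; passing `R.T` gives the item-item graph.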

4. Uncertainty Modeling and Preference Aggregation

Each $h_e^{c,m}$ is modeled as a sample from a Gaussian “expert”:

  • $\mu_e^m = \mathrm{MLP}_\mu \left( h_e^{c,m} \right)$
  • $\log (\sigma_e^m)^2 = \mathrm{MLP}_\sigma \left( h_e^{c,m} \right)$

Using the reparameterization trick, stochastic embeddings are sampled:

$$z_e^m = \mu_e^m + \sigma_e^m \odot \varepsilon,\quad \varepsilon \sim \mathcal{N}(0, I)$$

KL divergence regularization $L_{KL}$ mitigates variance collapse.
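The sampling step and a KL penalty can be sketched as below. The source does not state the prior used in $L_{KL}$; a standard normal prior $\mathcal{N}(0, I)$ is assumed here, as is common in variational models, and the function names are hypothetical. The inputs `mu` and `log_var` stand in for the outputs of the two MLPs.

```python
import numpy as np

def sample_expert(mu, log_var, rng):
    """Reparameterized sample z = mu + sigma * eps with eps ~ N(0, I)."""
    sigma = np.exp(0.5 * log_var)              # log_var is log(sigma^2)
    eps = rng.standard_normal(mu.shape)
    return mu + sigma * eps

def kl_to_standard_normal(mu, log_var):
    """KL(N(mu, diag(sigma^2)) || N(0, I)), averaged over rows (assumed prior)."""
    kl = 0.5 * np.sum(np.exp(log_var) + mu**2 - 1.0 - log_var, axis=1)
    return float(np.mean(kl))
```

Because the randomness enters only through `eps`, gradients can flow to the mean and variance networks in an autodiff framework; the KL term is zero exactly when the expert matches the assumed prior.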

For uncertainty-aware preference aggregation, a gating network $G$ assigns each modality a weight $g_e^m$ based on reliability, with sparsification via Top-K selection:

$$G([h_e^{c,1};\ldots;h_e^{c,M}]) = \mathrm{softmax}\left(\mathrm{TopK}\left(W_g [h_e^{c,1};\ldots;h_e^{c,M}]\right)\right)$$

The final entity embedding is the weighted sum:

$$z_e = \sum_{m=1}^M g_e^m \cdot z_e^m$$

An auxiliary loss $L_U = \sum_{e,m} g_e^m \lVert \sigma_e^m \rVert_2^2$ penalizes high weights for noisy experts.
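The gating, fusion, and auxiliary penalty can be sketched in NumPy as follows. It assumes that TopK masks all but the $k$ largest gate logits to $-\infty$ before the softmax, a common sparse-gating convention that the source does not spell out. The function names are hypothetical, and `logits` stands in for $W_g [h_e^{c,1};\ldots;h_e^{c,M}]$.

```python
import numpy as np

def topk_softmax_gate(logits, k):
    """Keep the k largest logits per row, mask the rest, then apply softmax."""
    kth = np.sort(logits, axis=1)[:, -k][:, None]    # k-th largest value per row
    masked = np.where(logits >= kth, logits, -np.inf)  # ties may keep more than k
    masked = masked - masked.max(axis=1, keepdims=True)  # stabilize the exponent
    weights = np.exp(masked)                         # masked entries become 0
    return weights / weights.sum(axis=1, keepdims=True)

def fuse(gates, z):
    """z_e = sum_m g_e^m * z_e^m, with gates of shape (n, M) and z of shape (n, M, d)."""
    return np.einsum("nm,nmd->nd", gates, z)

def uncertainty_penalty(gates, sigma):
    """L_U = sum over entities and modalities of g_e^m * ||sigma_e^m||_2^2."""
    return float(np.sum(gates * np.sum(sigma**2, axis=2)))
```

With only two modalities, $k = 1$ selects a single modality per entity, while $k = 2$ reduces to an ordinary softmax over both.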

5. Full Algorithmic Pipeline

SPUMR can be summarized algorithmically as follows:

  1. Input raw modality features $\{E^m\}$ and user-item interaction matrix $R$.
  2. Project each raw feature: $x_i^m = W^m e_i^m + b^m$.
  3. Construct modality-based KNN graphs for users and items; perform $L$-layer GCN propagation to obtain $h_u^m, h_i^m$.
  4. Build user-user and item-item collaborative graphs via Jaccard similarity; propagate one layer to yield $h_e^{c,m}$.
  5. Use two MLPs per modality to estimate mean and variance, then sample $z_e^m$ from each Gaussian expert.
  6. Compute gating weights and aggregate into final fused embeddings.
  7. Compute preference scores $\hat{y}_{ui} = z_u^\top z_i$.
  8. Calculate losses $L_{BPR}$, $L_{CL}$, $L_{KL}$, $L_U$; update parameters by backpropagation.

6. Empirical Evaluation

SPUMR was evaluated on three Amazon 5-core benchmarks (Baby, Sports, Clothing). Recommendations were assessed using Recall@K and NDCG@K, with $K \in \{10, 20\}$. Baselines included matrix factorization (MF-BPR), LightGCN, uncertainty-aware methods (OrdRec, VAE-AUR), and several multimodal methods (VBPR, MMGCN, FREEDOM, LGMRec, among others).
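For reference, the two metrics can be computed per user as in the sketch below. It assumes binary relevance, Recall@K normalized by the total number of held-out relevant items, and NDCG with a $\log_2$ discount; the exact evaluation protocol used in the paper is not described in the source, and the function names are hypothetical.

```python
import numpy as np

def recall_at_k(ranked_items, relevant, k):
    """Fraction of a user's relevant items that appear in the top-k ranking."""
    if not relevant:
        return 0.0
    hits = sum(1 for item in ranked_items[:k] if item in relevant)
    return hits / len(relevant)

def ndcg_at_k(ranked_items, relevant, k):
    """Binary-relevance NDCG@k: DCG of the ranking divided by the ideal DCG."""
    dcg = sum(1.0 / np.log2(rank + 2)
              for rank, item in enumerate(ranked_items[:k]) if item in relevant)
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / np.log2(rank + 2) for rank in range(ideal_hits))
    return dcg / idcg if idcg > 0 else 0.0
```

Reported scores are typically the average of these per-user values over all test users.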

SPUMR achieved the highest performance on all datasets. On the Baby dataset:

  • Recall@10 improved from 0.0644 (best baseline, LGMRec) to 0.0711 (+6.92%)
  • NDCG@10 improved from 0.0361 to 0.0385 (+6.65%)

Similar improvements of 3–7% were observed for Sports and Clothing. Ablation experiments showed that removing the modality similarity graphs, the collaborative similarity graphs, or the uncertainty-aware preference aggregation each leads to 1–3% absolute drops in Recall and NDCG, indicating that every architectural component contributes to the final performance.

t-SNE visualizations showed that SPUMR produces more uniformly separated user clusters than MMGCN and FREEDOM, which is interpreted as a sign of improved generalization and robustness.

7. Limitations and Future Directions

Identified challenges include computational overhead resulting from multiple graph propagations and the need for efficient stochastic sampling. Possible mitigations include graph sparsification and more efficient sampling schemes.

Future research directions encompass:

  • Generalizing SPUMR to accommodate additional modalities, such as audio or structured metadata.
  • Integrating large-scale pre-trained multimodal backbones (e.g., CLIP) for feature extraction and for uncertainty estimation.
  • Exploring adaptive graph construction (learning edge weights end-to-end) beyond fixed KNN or Jaccard similarity kernels.

SPUMR is a unified framework combining similarity propagation and explicit uncertainty modeling for multimodal recommendation. It reports state-of-the-art results on the evaluated benchmarks and suggests broader potential for similarity-based uncertainty integration in recommender architectures (Wu et al., 27 Jan 2026).
