SPUMR: Similarity Propagation in Multimodal Recommendation

Updated 3 February 2026

The paper introduces SPUMR, a framework that combines similarity propagation over both content and collaborative graphs with modality-specific uncertainty modeling to enhance ranking accuracy.
It constructs KNN graphs for users and items, applying GCN propagation to denoise and refine feature embeddings across multiple modalities.
Empirical evaluations on Amazon datasets show 3–7% improvements in Recall@K and NDCG@K, demonstrating SPUMR's effectiveness and robustness.

Similarity Propagation-enhanced Uncertainty for Multimodal Recommendation (SPUMR) is a framework for multimodal recommendation systems that simultaneously propagates similarity across content and collaborative graphs and explicitly models modality-specific uncertainty to improve recommendation robustness and ranking accuracy. SPUMR is designed to address two fundamental issues in multimodal recommendation (MMR): (i) noise and uncertainty inherent in modality features (such as noisy images or ambiguous text), and (ii) ineffective fusion strategies that disregard both the varying reliability of each modality and the rich similarity structure present among users and items (Wu et al., 27 Jan 2026).

1. Problem Formulation and Notation

Multimodal recommendation aims to predict user preference by leveraging multiple modalities per item, such as visual ($v$) and textual ($t$) features. Let $\mathcal{U} = \{ u \}$ denote the set of users, $\mathcal{I} = \{ i \}$ the set of items, and $\mathcal{M} = \{ v, t \}$ the set of modalities, with $M = |\mathcal{M}|$. Each item $i$ has a modality-specific feature vector $e_i^m \in \mathbb{R}^{d_m}$ for each $m \in \mathcal{M}$. All features for modality $m$ are collected in $E^m \in \mathbb{R}^{|\mathcal{I}| \times d_m}$. The user-item interaction matrix $R$ encodes implicit feedback.

The modeling goal is to learn low-dimensional embeddings $z_u, z_i \in \mathbb{R}^d$ for users and items such that the predicted preference score $\hat{y}_{ui} = z_u^\top z_i$ ranks observed (positive) user-item pairs above unobserved (negative) pairs, all while explicitly representing and mitigating modality-specific uncertainty.

Let $\mathcal{I}_u$ be the set of items interacted with by user $u$ and $\mathcal{B}_i$ the set of users who interacted with item $i$. The model parameters $\Theta$ include projection matrices, GNN weights, multilayer perceptrons (MLPs) for uncertainty, and gating networks.

The training objective integrates four loss components:

$\min_\Theta L(\Theta) = L_{BPR} + \lambda_{CL} L_{CL} + \lambda_{KL} L_{KL} + \lambda_U L_U$

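where $L_{BPR}$ is the Bayesian Personalized Ranking loss, $L_{CL}$ is a contrastive loss, $L_{KL}$ is a KL divergence penalty on the uncertainty estimates, and $L_U$ penalizes assigning high gating weights to unreliable (high-variance) modality experts. The coefficients $\lambda_{CL}$, $\lambda_{KL}$, and $\lambda_U$ weight the auxiliary terms.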
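
As a rough illustration of how the objective is assembled, the sketch below computes the BPR term for a single (user, positive item, negative item) triple and combines it with the auxiliary terms. The function names and the default $\lambda$ values are hypothetical placeholders, not values from the paper, and the contrastive, KL, and uncertainty terms are assumed to be computed elsewhere.

```python
import math

def dot(a, b):
    """Inner product, used as the preference score y_ui = z_u^T z_i."""
    return sum(x * y for x, y in zip(a, b))

def bpr_loss(z_u, z_pos, z_neg):
    """-log(sigmoid(y_ui - y_uj)) for one (user, positive, negative) triple."""
    diff = dot(z_u, z_pos) - dot(z_u, z_neg)
    # softplus(-diff) equals -log(sigmoid(diff)); this form avoids overflow.
    return max(-diff, 0.0) + math.log1p(math.exp(-abs(diff)))

def total_loss(l_bpr, l_cl, l_kl, l_u, lam_cl=0.1, lam_kl=0.01, lam_u=0.01):
    """L = L_BPR + lam_CL * L_CL + lam_KL * L_KL + lam_U * L_U.
    The default lambdas are illustrative assumptions only."""
    return l_bpr + lam_cl * l_cl + lam_kl * l_kl + lam_u * l_u

# The loss is smaller when the positive item scores above the negative one.
good = bpr_loss([1.0, 0.0], [1.0, 0.0], [0.0, 1.0])
bad = bpr_loss([1.0, 0.0], [0.0, 1.0], [1.0, 0.0])
print(good < bad)
```

In practice the BPR term would be averaged over a batch of sampled triples before being combined with the other terms.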

2. Modality Similarity Graph Construction

SPUMR constructs two K-nearest neighbor (KNN) graphs for each modality $m$:

(a) User-modal Interest Graph ($\mathcal{G}_U^m$): The initial interest profile of user $u$ in modality $m$ is the mean of the projected features of the items the user has interacted with,

$p_u^m = \frac{1}{|\mathcal{I}_u|} \sum_{i \in \mathcal{I}_u} x_i^m$

where $x_i^m = W^m e_i^m + b^m$ is the projected feature of item $i$ in modality $m$. Edges are formed from the cosine similarity $s_{uv}^m$ between user profiles, connecting each user to its top-$k$ most similar peers. $L$ layers of symmetric GCN propagation are then applied to refine these profiles, yielding the final modality-specific user embeddings $h_u^m$.

(b) Item-modal Semantic Graph ($\mathcal{G}_I^m$): Similar KNN graphs are constructed among items using $s_{ij}^m = \cos(x_i^m, x_j^m)$. Symmetric GCN propagation produces item embeddings $h_i^m$ for each modality.

These two graphs aim to denoise and enrich initial feature representations by leveraging local similarity in both user preference and item content structure.
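
The construction can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions: the adjacency is binary and symmetrised, and propagation is weight-free with $D^{-1/2} A D^{-1/2}$ normalisation and no self-loops. The source does not specify these details, and all function names are hypothetical.

```python
import math

def user_profile(item_feats, interacted):
    """p_u^m: mean of the projected features of the items a user interacted with."""
    d = len(item_feats[0])
    return [sum(item_feats[i][c] for i in interacted) / len(interacted) for c in range(d)]

def cosine(a, b):
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return sum(x * y for x, y in zip(a, b)) / (na * nb) if na and nb else 0.0

def knn_graph(feats, k):
    """Binary adjacency linking each node to its top-k cosine neighbours."""
    n = len(feats)
    adj = [[0.0] * n for _ in range(n)]
    for i in range(n):
        sims = sorted(((cosine(feats[i], feats[j]), j) for j in range(n) if j != i),
                      reverse=True)
        for _, j in sims[:k]:
            adj[i][j] = adj[j][i] = 1.0  # symmetrising is an assumption
    return adj

def gcn_propagate(adj, feats, num_layers):
    """Apply D^{-1/2} A D^{-1/2} num_layers times (no learned weights)."""
    n = len(adj)
    deg = [sum(row) for row in adj]
    h = feats
    for _ in range(num_layers):
        h = [[sum(adj[i][j] / math.sqrt(deg[i] * deg[j]) * h[j][c]
                  for j in range(n) if adj[i][j])
              for c in range(len(h[0]))]
             for i in range(n)]
    return h

feats = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
adj = knn_graph(feats, 1)
h = gcn_propagate(adj, feats, 1)
print(adj[0], h[0])
```

The same two functions apply to both graphs: pass user profiles $p_u^m$ to build $\mathcal{G}_U^m$ and projected item features $x_i^m$ to build $\mathcal{G}_I^m$. A dense $n \times n$ adjacency is used only for readability; a real system would use sparse structures and approximate nearest-neighbour search.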

3. Collaborative Similarity Refinement

After modality graph propagation, SPUMR further refines entity embeddings using collaborative similarity, which is based exclusively on interaction patterns, via two graphs:

(a) User-user Collaborative Graph ($\mathcal{G}_U^c$): Edge strength between users $u, v$ is defined by their Jaccard similarity over interacted items:

$w_{uv} = \frac{|\mathcal{I}_u \cap \mathcal{I}_v|}{|\mathcal{I}_u \cup \mathcal{I}_v|}$

The normalized edge weights are used to propagate the modality-refined embeddings, yielding collaborative-multimodal embeddings $h_u^{c,m}$.

(b) Item-item Collaborative Graph ($\mathcal{G}_I^c$): Analogously, the Jaccard similarity for each item pair $(i, j)$ is computed over the sets of interacting users $\mathcal{B}_i$ and $\mathcal{B}_j$, and propagation over this graph generates further-refined embeddings $h_i^{c,m}$.

After both propagation stages, each user or item $e \in \mathcal{U} \cup \mathcal{I}$ has collaborative, modality-specific representations $h_e^{c,m}$.
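
The Jaccard edge weights can be computed directly from interaction sets. The sketch below is illustrative only: the function names and the dictionary-based input format are assumptions, and the normalisation and propagation steps are omitted.

```python
def jaccard(a, b):
    """|a ∩ b| / |a ∪ b| for two collections of ids."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def collaborative_graph(interactions):
    """interactions: dict mapping an entity id to the ids it interacted with.
    Returns {(u, v): weight} for unordered pairs with non-zero overlap."""
    ids = sorted(interactions)
    edges = {}
    for idx, u in enumerate(ids):
        for v in ids[idx + 1:]:
            w = jaccard(interactions[u], interactions[v])
            if w > 0:
                edges[(u, v)] = w
    return edges

# Users mapped to the items they interacted with (toy data).
user_items = {"u1": {1, 2, 3}, "u2": {2, 3, 4}, "u3": {9}}
graph = collaborative_graph(user_items)
print(graph)
```

Passing a mapping from items to their interacting users instead yields the item-item graph. The pairwise loop is quadratic in the number of entities, so a practical implementation would restrict comparisons to entities sharing at least one interaction.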

4. Uncertainty Modeling and Preference Aggregation

Each $h_e^{c,m}$ parameterizes a Gaussian “expert”, whose mean and log-variance are predicted by two MLPs:

$\mu_e^m = \mathrm{MLP}_\mu \left( h_e^{c,m} \right)$
$\log (\sigma_e^m)^2 = \mathrm{MLP}_\sigma \left( h_e^{c,m} \right)$

Using the reparameterization trick, stochastic embeddings are sampled:

$z_e^m = \mu_e^m + \sigma_e^m \odot \varepsilon,\quad \varepsilon \sim \mathcal{N}(0, I)$

KL divergence regularization $L_{KL}$ mitigates variance collapse.
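
A minimal sketch of the sampling step and a KL penalty follows. It assumes a diagonal Gaussian and a standard normal prior $\mathcal{N}(0, I)$, which is the usual variational choice but is not stated in the source. The MLP outputs are taken as given inputs, and the function names are hypothetical.

```python
import math
import random

def sample_embedding(mu, log_var, rng):
    """Reparameterised draw: z = mu + sigma * eps, with eps ~ N(0, 1) per dimension."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, log_var)]

def kl_to_standard_normal(mu, log_var):
    """Closed-form KL( N(mu, diag(sigma^2)) || N(0, I) ).
    The N(0, I) prior is an assumption made for this sketch."""
    return 0.5 * sum(math.exp(lv) + m * m - 1.0 - lv for m, lv in zip(mu, log_var))

rng = random.Random(0)
mu, log_var = [0.3, -0.2], [0.0, -1.0]
z = sample_embedding(mu, log_var, rng)
print(len(z), kl_to_standard_normal(mu, log_var) >= 0.0)
```

Because the noise $\varepsilon$ is drawn independently of the parameters, gradients can flow through $\mu_e^m$ and $\sigma_e^m$ during training. Predicting the log-variance rather than the variance keeps $\sigma_e^m$ positive without an explicit constraint.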

For uncertainty-aware preference aggregation, a gating network $G$ assigns each modality a weight $g_e^m$ based on reliability, with sparsification via Top-K selection:

$G([h_e^{c,1};\ldots;h_e^{c,M}]) = \mathrm{softmax}\left(\mathrm{TopK}(W_g [h_e^{c,1};\ldots;h_e^{c,M}])\right)$

The final entity embedding is the weighted sum:

$z_e = \sum_{m=1}^M g_e^m \cdot z_e^m$

An auxiliary loss $L_U = \sum_{e,m} g_e^m \lVert \sigma_e^m \rVert_2^2$ penalizes high weights for noisy experts.
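
The gating and fusion steps for a single entity can be sketched as below. The gate logits are assumed to be precomputed (standing in for $W_g [h_e^{c,1};\ldots;h_e^{c,M}]$), experts outside the Top-K receive zero weight, and all function names are hypothetical.

```python
import math

def topk_softmax(logits, k):
    """Softmax restricted to the k largest logits; all other weights are 0."""
    keep = sorted(range(len(logits)), key=lambda m: logits[m], reverse=True)[:k]
    mx = max(logits[m] for m in keep)  # subtract the max for numerical stability
    exps = {m: math.exp(logits[m] - mx) for m in keep}
    total = sum(exps.values())
    return [exps[m] / total if m in exps else 0.0 for m in range(len(logits))]

def fuse(gates, experts):
    """z_e = sum_m g_e^m * z_e^m."""
    d = len(experts[0])
    return [sum(g * z[c] for g, z in zip(gates, experts)) for c in range(d)]

def uncertainty_penalty(gates, sigmas):
    """Contribution of one entity to L_U: sum_m g_e^m * ||sigma_e^m||_2^2."""
    return sum(g * sum(s * s for s in sigma) for g, sigma in zip(gates, sigmas))

gates = topk_softmax([2.0, 2.0, -1.0], k=2)
z_e = fuse(gates, [[1.0, 0.0], [0.0, 1.0], [5.0, 5.0]])
l_u = uncertainty_penalty(gates, [[1.0], [2.0], [10.0]])
print(gates, z_e, l_u)
```

In this toy case the third expert is dropped by the Top-K step, so its large variance contributes nothing to the penalty. Among the retained experts, the penalty grows with the weight placed on the higher-variance one.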

5. Full Algorithmic Pipeline

SPUMR can be summarized algorithmically as follows:

Input raw modality features $\{E^m\}$ and the user-item interaction matrix $R$.
Project each raw feature: $x_i^m = W^m e_i^m + b^m$.
Construct modality-based KNN graphs for users and items; perform $L$-layer GCN propagation to obtain $h_u^m, h_i^m$.
Build user-user and item-item collaborative graphs via Jaccard similarity; propagate one layer to yield $h_e^{c,m}$.
Use two MLPs per modality to estimate the mean and log-variance of each Gaussian expert, then sample $z_e^m$ via the reparameterization trick.
Compute gating weights $g_e^m$ and aggregate the sampled embeddings into the final fused embedding $z_e$.
Compute preference scores $\hat{y}_{ui} = z_u^\top z_i$.
Calculate the losses $L_{BPR}$, $L_{CL}$, $L_{KL}$, $L_U$; update the parameters $\Theta$ by backpropagation.

6. Empirical Evaluation

SPUMR was evaluated on three Amazon 5-core benchmarks (Baby, Sports, Clothing). Recommendations were assessed using Recall@K and NDCG@K, with $K \in \{10,20\}$ . Baselines included matrix factorization (MF-BPR), LightGCN, uncertainty-aware (OrdRec, VAE-AUR), and several multimodal methods (VBPR, MMGCN, FREEDOM, LGMRec, among others).
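
For reference, the two metrics can be computed per user as in the sketch below. It assumes binary relevance and the common $1/\log_2(\text{rank}+1)$ discount; the exact variant used in the paper's evaluation code is not specified in the source, and the function names are hypothetical.

```python
import math

def recall_at_k(ranked, relevant, k):
    """Fraction of a user's held-out relevant items found in the top-k list."""
    if not relevant:
        return 0.0
    hits = sum(1 for item in ranked[:k] if item in relevant)
    return hits / len(relevant)

def ndcg_at_k(ranked, relevant, k):
    """Binary-relevance NDCG: DCG of the top-k list divided by the ideal DCG."""
    dcg = sum(1.0 / math.log2(pos + 2)
              for pos, item in enumerate(ranked[:k]) if item in relevant)
    ideal = sum(1.0 / math.log2(pos + 2) for pos in range(min(k, len(relevant))))
    return dcg / ideal if ideal else 0.0

ranked = ["a", "b", "c", "d"]   # items sorted by predicted score, best first
relevant = {"a", "d"}           # held-out items the user interacted with
print(recall_at_k(ranked, relevant, 2), ndcg_at_k(ranked, relevant, 2))
```

Reported scores are typically the average of these per-user values over all test users.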

SPUMR achieved the highest performance on all datasets. On the Baby dataset:

Recall@10 improved from 0.0644 (best baseline, LGMRec) to 0.0711 (+6.92%)
NDCG@10 improved from 0.0361 to 0.0385 (+6.65%)

Similar improvements of 3–7% were observed for Sports and Clothing. Ablation experiments demonstrated that removing modality similarity graphs, collaborative similarity graphs, or uncertainty-aware preference aggregation each leads to 1–3% absolute drops in recall and NDCG, indicating the necessity of each architectural component.

t-SNE visualizations revealed that SPUMR produces more uniformly separated user clusters than MMGCN and FREEDOM, an effect corresponding to improved generalization and robustness.

7. Limitations and Future Directions

Identified challenges include computational overhead resulting from multiple graph propagations and the need for efficient stochastic sampling. Possible mitigations include graph sparsification and more efficient sampling schemes.

Future research directions encompass:

Generalizing SPUMR to accommodate additional modalities, such as audio or structured metadata.
Integrating large-scale pre-trained multimodal backbones (e.g., CLIP) for feature extraction and for uncertainty estimation.
Exploring adaptive graph construction—learning edge weights end-to-end—beyond fixed KNN or Jaccard similarity kernels.

SPUMR represents a unified framework combining similarity propagation and explicit uncertainty modeling for multimodal recommendation, establishing a new state of the art and suggesting broader potential for similarity-based uncertainty integration in recommender architectures (Wu et al., 27 Jan 2026).
