Similarity-Guided Diffusion for Sequential Recommendation

Updated 17 July 2025
  • The paper introduces a similarity-guided diffusion framework that integrates semantic noise with contrastive learning to improve sequential recommendation.
  • It employs deterministic noise injection based on item similarity and confidence-guided augmentation to preserve the contextual integrity of user sequences.
  • Empirical results show significant gains in metrics like HR@K and NDCG@K, outperforming several state-of-the-art models on standard datasets.

Similarity-Guided Diffusion for Contrastive Sequential Recommendation describes a framework that integrates semantic similarity-driven augmentation within a diffusion-based generative process to enhance contrastive learning in sequential recommendation systems. The approach, exemplified by SimDiffRec (Choi et al., 16 Jul 2025), prioritizes semantic and contextual preservation during sequence augmentation by leveraging item-item similarity and model confidence scores, addressing key limitations of random augmentation methods and improving the discriminative quality of positive and negative samples for contrastive representation learning.

1. Motivation and Conceptual Overview

Traditional sequential recommendation (SR) models, particularly those utilizing Transformer architectures, grapple with severe data sparsity and the challenge of encoding long-range dependencies in user interaction histories. To counteract sparsity and enrich representation learning, recent advances have embraced data augmentation and contrastive learning. However, many such methods employ stochastic or random strategies (e.g., random noise injection, masking, or cropping), which often disturb the contextual structure of the original sequence and degrade recommendation quality.

Similarity-Guided Diffusion for Contrastive Sequential Recommendation introduces a principled augmentation mechanism in which only semantically consistent noise—computed based on learned item embedding similarities—is diffused through user sequences. Further, the model strategically localizes augmentation to positions in the sequence where the denoising model is highly confident, thereby preserving the underlying behavioral context while maximizing the utility of contrastive learning.

2. Framework and Methodological Components

2.1 Semantic Similarity-Based Noise

SimDiffRec identifies, for each item embedding $e_u$ in the input sequence, the top $k_{\text{noise}}$ most similar items in the embedding matrix (excluding the item itself), based on dot-product similarity:

$$\text{similarity} = e_u \cdot W^T$$

where $W$ is the complete item embedding matrix.

The selected similar embeddings produce a deterministic noise vector for diffusion:

$$\text{noise} = \frac{1}{k} \sum_{i=1}^{k} \mathbf{e}_{j_i}$$

where $j_i$ indexes the top $k$ most similar items ($k = k_{\text{noise}}$). This method ensures that noise injected during the diffusion process remains semantically close to the original item, minimizing the risk of corrupting contextual information.
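
As a concrete illustration, the following PyTorch sketch computes this deterministic noise; it is not the authors' code, and the function name, tensor shapes, and self-masking strategy are our own assumptions:

```python
import torch

def similarity_guided_noise(item_ids, item_emb, k_noise=5):
    """Deterministic, similarity-guided noise for one user sequence (sketch).

    item_ids: (L,) indices of the items in the sequence
    item_emb: (|V|, d) full item embedding matrix W
    """
    seq_emb = item_emb[item_ids]                                 # (L, d) item embeddings e_u
    sim = seq_emb @ item_emb.T                                   # (L, |V|) dot-product similarity
    sim = sim.scatter(1, item_ids.unsqueeze(1), float("-inf"))   # exclude the item itself
    _, idx = torch.topk(sim, k_noise, dim=-1)                    # top-k_noise neighbors per position
    noise = item_emb[idx].mean(dim=1)                            # (L, d) averaged neighbor embeddings
    return noise
```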

2.2 Diffusion Process

The forward process incorporates this similarity-guided noise in a deterministic manner:

$$\mathbf{z}_t = \alpha_t \mathbf{z}_{t-1} + \beta_t \cdot \text{noise}$$

With repeated application (for $T$ steps), this yields:

$$\mathbf{z}_T = \prod_{t=1}^{T} \alpha_t \, \mathbf{z}_0 + \sum_{t=1}^{T} \left( \beta_t \cdot \text{noise} \prod_{i=t+1}^{T} \alpha_i \right)$$

where $\mathbf{z}_0$ denotes the original input, and $\alpha_t$, $\beta_t$ are schedule parameters controlling, respectively, the retention of original information and the injection of noise at each step.
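
A minimal sketch of this closed-form forward pass, assuming the schedules $\alpha_t$ and $\beta_t$ are supplied as 1-D tensors of length $T$ (names are illustrative):

```python
import torch

def forward_diffusion(z0, noise, alphas, betas):
    """Apply the similarity-guided forward process in closed form (sketch).

    z0:     (L, d) original sequence representation
    noise:  (L, d) deterministic similarity-guided noise
    alphas: (T,) retention schedule; betas: (T,) noise schedule
    """
    T = alphas.shape[0]
    z_T = torch.prod(alphas) * z0
    for t in range(T):
        # Noise injected at step t+1 is rescaled by the remaining alphas.
        z_T = z_T + betas[t] * torch.prod(alphas[t + 1:]) * noise
    return z_T
```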

In the reverse process, a parameterized denoising network predicts a normal distribution over the previous step:

$$p_{\theta}(\mathbf{z}_{t-1} \mid \mathbf{z}_t) = \mathcal{N}\left(\mathbf{z}_{t-1}; \mu_{\theta}(\mathbf{z}_t, t), \Sigma_{\theta}(\mathbf{z}_t, t)\right)$$

This denoises the sequence embedding iteratively, culminating in a recovered sequence representation that is then mapped back to discrete items using a learned rounding function.
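
A reverse-pass sketch, assuming a denoising network that returns the Gaussian mean and log-variance at each step; the rounding function that maps back to discrete items is omitted:

```python
import torch

def reverse_denoise(z_T, denoiser, T):
    """Iteratively sample z_{t-1} ~ N(mu_theta(z_t, t), Sigma_theta(z_t, t)).

    denoiser(z, t) is assumed to return (mean, log_var), each of shape (L, d).
    """
    z = z_T
    for t in range(T, 0, -1):
        mean, log_var = denoiser(z, t)
        std = torch.exp(0.5 * log_var)
        z = mean + std * torch.randn_like(std)   # sample the previous-step representation
    return z
```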

2.3 Confidence-Guided Position Selection

SimDiffRec utilizes the model's own denoising confidence to select augmentation positions. For each position $i$ in the sequence, the denoising model outputs logits:

$$z_i = W \cdot h_i$$

yielding a softmax probability for each candidate item:

$$p_{ij} = \frac{\exp(z_{ij})}{\sum_{k=1}^{|V|} \exp(z_{ik})}$$

with $|V|$ denoting the vocabulary size and $h_i$ the hidden representation at position $i$.

The confidence score is computed as

$$c_i = \max_{j \in V} p_{ij}$$

Positions with the highest confidence values are chosen for augmentation, ensuring that augmentation occurs where the model's understanding (as reflected by high restoration probability) is robust, thereby mitigating the risk of introducing misleading or noisy augmentations.
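
This selection step can be expressed compactly; the sketch below assumes the output projection reuses the item embedding matrix $W$ and that a fixed number of positions is selected:

```python
import torch
import torch.nn.functional as F

def select_confident_positions(hidden, item_emb, n_positions):
    """Return the indices of the most confidently reconstructed positions (sketch).

    hidden:   (L, d) denoised hidden states h_i
    item_emb: (|V|, d) item embedding matrix used as the projection W
    """
    logits = hidden @ item_emb.T                  # (L, |V|)  z_i = W h_i
    probs = F.softmax(logits, dim=-1)             # p_ij over the item vocabulary
    confidence = probs.max(dim=-1).values         # c_i = max_j p_ij
    return torch.topk(confidence, n_positions).indices
```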

2.4 Loss Objectives

The overall training objective integrates three loss components:

  • Sequential Recommendation Loss ($L_{\text{sr}}$): Typically a standard next-item prediction loss.
  • Contrastive Loss ($L_{\text{cl}}$): Employing the InfoNCE formulation with hard negative sampling (see the code sketch at the end of this subsection),

$$L_{\text{cl}} = -\log \frac{\exp(\text{sim}(\mathbf{e}_u, \mathbf{e}_{v^+}) / \tau)}{\exp(\text{sim}(\mathbf{e}_u, \mathbf{e}_{v^+}) / \tau) + \exp(\text{sim}(\mathbf{e}_u, \mathbf{e}_{v^-}) / \tau) + \sum_{\tilde{v} \in \mathcal{N}_b} \exp(\text{sim}(\mathbf{e}_u, \mathbf{e}_{\tilde{v}}) / \tau)}$$

where $\mathcal{N}_b$ contains hard negatives selected by similarity to the positive samples.

  • Diffusion Loss ($L_d$): Measures the deviation between the original and recovered (denoised) sequences:

$$L_d = \sum_{t=2}^{T} \|\mathbf{z}_0 - f_\theta(\mathbf{z}_t, t)\|^2 + \|\mathbf{e} - f_\theta(\mathbf{z}_1, 1)\|^2 - \log p_\theta(s \mid \mathbf{z}_0)$$

The total loss is given by:

$$L_{\text{Total}} = L_{\text{sr}} + \alpha L_{\text{cl}} + \beta L_d$$

where $\alpha$ and $\beta$ are weighting parameters.
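
The sketch below ties these pieces together: an InfoNCE term with an explicit hard-negative set, combined with the other losses via the weights $\alpha$ and $\beta$. Cosine similarity is assumed for $\text{sim}(\cdot,\cdot)$, and the default weights are placeholders, not values from the paper:

```python
import torch
import torch.nn.functional as F

def info_nce_hard_neg(anchor, positive, negative, hard_negs, tau=0.1):
    """InfoNCE loss with hard negatives N_b for a single anchor sequence (sketch).

    anchor, positive, negative: (d,) sequence representations
    hard_negs:                  (m, d) hard-negative representations
    """
    sim = lambda a, b: F.cosine_similarity(a, b, dim=-1)
    pos = torch.exp(sim(anchor, positive) / tau)
    neg = torch.exp(sim(anchor, negative) / tau)
    hard = torch.exp(sim(anchor.unsqueeze(0), hard_negs) / tau).sum()
    return -torch.log(pos / (pos + neg + hard))

def total_loss(loss_sr, loss_cl, loss_d, alpha=0.1, beta=0.1):
    """L_Total = L_sr + alpha * L_cl + beta * L_d (weights are placeholders)."""
    return loss_sr + alpha * loss_cl + beta * loss_d
```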

3. Contrastive Learning with Discriminative Sample Generation

The similarity-guided diffusion mechanism generates augmented sequences that retain contextual accuracy and structural coherence, resulting in higher-quality positive pairs for contrastive learning. Hard negative sampling is carried out by drawing negatives that are semantically similar in the embedding space, based on the denoised output's item probability distribution. This practice encourages the model to distinguish between subtle variations in user behavior and to learn more granular representations.
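
A hard-negative sampling sketch based on the denoised item distribution, under the assumption that the $k_{\text{sample}}$-th ranked non-target item is taken at each position; the paper's exact ranking rule may differ:

```python
import torch

def sample_hard_negatives(item_probs, target_ids, k_sample=2):
    """Pick semantically close but incorrect items as hard negatives (sketch).

    item_probs: (L, |V|) item probabilities from the denoised output
    target_ids: (L,) ground-truth item ids at each position
    """
    probs = item_probs.scatter(1, target_ids.unsqueeze(1), 0.0)   # never pick the target item
    ranked = probs.argsort(dim=-1, descending=True)               # items ranked by probability
    return ranked[:, k_sample - 1]                                # (L,) hard-negative item ids
```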

The empirical evidence in the paper (Choi et al., 16 Jul 2025) demonstrates that this strategy yields state-of-the-art results in HR@K and NDCG@K across multiple standard datasets.

4. Practical Considerations and Implementation Details

4.1 Model Architecture

SimDiffRec employs a Transformer-based architecture with two attention layers for the recommendation encoder and a one-layer Transformer for the diffusion model. Item and sequence embedding dimensions are harmonized, and optimization is performed with Adam (learning rate 0.001), batch size 256, and maximum sequence lengths adapted for the dataset in use (typically 50–200).

Augmentation parameters, namely the number of similar items $k_{\text{noise}}$ used for noise construction and the ranking index $k_{\text{sample}}$ for negative sampling, are tuned to accommodate item sparsity across domains.
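
For reference, a hypothetical configuration object collecting these settings; the architecture and optimizer values follow the text above, while the $k_{\text{noise}}$ and $k_{\text{sample}}$ entries are placeholders to be tuned per dataset:

```python
# Hypothetical configuration; key names are ours, not from the paper.
config = {
    "encoder_layers": 2,      # attention layers in the recommendation encoder
    "diffusion_layers": 1,    # one-layer Transformer for the diffusion model
    "optimizer": "Adam",
    "learning_rate": 1e-3,
    "batch_size": 256,
    "max_seq_len": 50,        # 50-200 depending on the dataset
    "k_noise": 5,             # similar items for noise construction (tune per domain)
    "k_sample": 2,            # ranking index for hard negative sampling (tune per domain)
}
```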

4.2 Efficiency and Scalability

The deterministic nature of the similarity-guided forward diffusion reduces stochastic variance and improves reproducibility when generating augmentations. Confidence-based augmentation ensures computational focus on regions of the sequence most beneficial for training, enhancing sample efficiency. Ablation studies suggest that all submodules—semantic similarity-based noise, confidence-based position selection, and hard negative sampling—contribute materially to performance.

5. Empirical Evaluation

5.1 Datasets and Evaluation Protocol

Experiments are conducted on five public datasets: Amazon Beauty, Toys, Sports, Yelp, and MovieLens-1M. Metrics include Hit Ratio (HR@k) and Normalized Discounted Cumulative Gain (NDCG@k). The evaluation protocol is leave-one-out, with the last user interaction reserved for testing.
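
For clarity, a minimal per-user implementation of these metrics under the leave-one-out protocol (standard definitions, not code from the paper):

```python
import math

def hr_ndcg_at_k(ranked_items, target, k=10):
    """HR@K and NDCG@K for one user, given the model's ranked item list
    and the single held-out target item."""
    topk = ranked_items[:k]
    if target not in topk:
        return 0.0, 0.0
    rank = topk.index(target)               # 0-based rank of the hit
    return 1.0, 1.0 / math.log2(rank + 2)   # HR@K, NDCG@K with one relevant item
```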

5.2 Results

SimDiffRec consistently outperforms nine baseline models, representing classical (BERT4Rec, SASRec), contrastive learning (CL4SRec, DuoRec, MCLRec, ECL-SR), and diffusion-based (DiffuASR, DreamRec, CaDiRec) recommendation frameworks. Relative improvements over the next-best baseline, such as CaDiRec, range from 4.16% to 13.36% on HR@K and NDCG@K, depending on the dataset.

Ablation studies show that removing any of the core similarity-guided mechanisms leads to a notable drop in performance, highlighting their necessity.

Further analysis reveals that optimal settings for $k_{\text{noise}}$ vary with domain sparsity; for low-sparsity domains, 25–50 is optimal, while high-sparsity domains favor lower values (1–5). Sensitivity to loss weighting parameters is dataset-dependent, suggesting that small-scale hyperparameter tuning is essential for deployment.

Visualization using t-SNE confirms that confidence-based augmentation produces tighter, more coherent clusters in the learned embedding space, affirming semantic consistency and improved representation structure.

6. Broader Context and Future Directions

Similarity-Guided Diffusion for Contrastive Sequential Recommendation establishes a link between item similarity-driven augmentation and robust contrastive representation learning. By directly integrating structure from the item embedding space and leveraging model confidence, this approach overcomes key weaknesses in random augmentation and enhances the fidelity of learned user preferences.

The potential applicability of this framework extends to other recommendation scenarios, including session-based recommendation and contexts where side information or multi-modal data is available, provided suitable similarity metrics are devised for noise generation. Plausible directions arising from this work include further exploration of adaptive, context-dependent similarity computation and augmentation position selection, as well as integration with explicit intent or semantic retrieval signals.

SimDiffRec (Choi et al., 16 Jul 2025) distinguishes itself from earlier random-augmentation-based diffusion or contrastive methods (Li et al., 2023, Du et al., 2023, Liu et al., 2023) by explicitly aligning noise and augmentation strategy with semantic relationships derived from embeddings. This reduces the risk of context loss and improves representation learning, as evidenced by significant empirical gains. Related contrastive frameworks relying on global graph-based augmentation (Zhang et al., 2022, Wang et al., 2022), explanation-guided augmentation (Wang et al., 2022), or latent intent-aware guidance (Qu et al., 22 Apr 2025) can inform the continued refinement of similarity-guided diffusion models.

In summary, similarity-guided diffusion for contrastive sequential recommendation provides an effective, semantically coherent augmentation strategy for SR models, leveraging both item similarity and confidence-driven guidance to improve both training stability and recommendation quality.