LLaDA-Rec: Diffusion-Based Generative Recommendation

Updated 11 November 2025
  • LLaDA-Rec is a discrete diffusion-based framework for generative recommendation, producing semantic IDs in parallel with bidirectional attention.
  • It overcomes autoregressive limitations by employing iterative masked token generation and dual masking objectives to better model intra-item and inter-item dependencies.
  • Empirical evaluations on real-world datasets show significant improvements in Recall@5 and NDCG, validated by detailed ablation studies.

LLaDA-Rec is a discrete diffusion-based generative recommendation framework that advances next-item prediction by generating semantic ID (SID) sequences in parallel with bidirectional attention. Traditional semantic-ID recommenders are constrained by left-to-right (autoregressive) decoding, which impedes global semantic modeling and accumulates errors through fixed generation order. LLaDA-Rec circumvents these limitations by recasting recommendation as an iterative, masked parallel generation task using a discrete diffusion process, supported by a tailored parallel tokenization scheme, dual masking objectives, and an inference-stage adaptive beam search. These innovations enable LLaDA-Rec to model both intra-item and inter-item dependencies and yield improved performance across several real-world datasets.

1. Problem Formulation and Limitations of Autoregressive Decoders

Generative recommendation predicts the next item a user will interact with by generating its semantic ID, $s_n = [c_{n,1}, c_{n,2}, \ldots, c_{n,M}]$, a length-$M$ discrete token sequence, where each $c_{n,m}$ is drawn from a shared vocabulary of size $|\mathcal{W}|$. Given a user history of $n-1$ items, represented as an ordered SID sequence $\mathcal{S}_H = [c_{1,1}, \ldots, c_{n-1,M}]$, the task is to model

$$\theta^* = \arg\max_\theta P_\theta(s_n \mid \mathcal{S}_H)$$

Prior methods—e.g., TIGER and LC-Rec—use autoregressive decoding, factorizing the conditional probability:

$$P_\theta(s_n \mid \mathcal{S}_H) = \prod_{m=1}^M P_\theta(c_{n,m} \mid c_{n,<m}, \mathcal{S}_H)$$

Autoregression causes two fundamental problems:

  • Unidirectional Constraints: Each token can attend only to the tokens that precede it, hindering modeling of global semantic coherence within the SID.
  • Error Accumulation: Errors in early token predictions propagate through the fixed generation order, raising the risk of mispredicting subsequent tokens (illustrated in the sketch below).
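
To make the contrast concrete, the following minimal Python sketch decodes a length-$M$ SID strictly left to right under the factorization above. Here `next_token_dist` is a hypothetical stand-in for $P_\theta(c_{n,m} \mid c_{n,<m}, \mathcal{S}_H)$; it is an illustration, not the paper's implementation.

```python
from typing import Callable, List, Sequence

def greedy_autoregressive_sid(
    history: Sequence[int],
    next_token_dist: Callable[[Sequence[int], Sequence[int]], List[float]],
    M: int,
) -> List[int]:
    """Decode one SID token at a time; each step conditions only on earlier tokens."""
    sid: List[int] = []
    for _ in range(M):
        probs = next_token_dist(history, sid)  # P(c_m | c_<m, S_H)
        sid.append(max(range(len(probs)), key=probs.__getitem__))  # greedy argmax
    return sid
```

Because every step conditions on the frozen prefix, a mistake at an early position cannot be revisited and biases all later predictions, which is exactly the error-accumulation behavior noted above.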

2. Discrete Diffusion Framework for Parallel SID Generation

LLaDA-Rec replaces autoregressive decoding with a discrete diffusion process that operates in parallel with bidirectional attention. Generation proceeds in $T$ steps, starting from a fully masked SID and iteratively reconstructing the masked tokens.

Forward (Noise) Process

A family of masking operators $\{q_r\}$ parameterized by mask ratio $r \in [0,1]$ is defined:

$$q_r(z^r \mid z^0) = \prod_{i} \left[ (1-r)\,\delta(z^r_i = z^0_i) + r\,\delta(z^r_i = [\mathrm{MASK}]) \right]$$

With increasing $r$, the original token sequence is progressively masked until all positions are hidden.
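
A minimal sketch of the forward masking operator, assuming token sequences are plain Python lists and `MASK` is a placeholder sentinel for the [MASK] token id (both assumptions for illustration, not details from the paper): each position is masked independently with probability $r$.

```python
import random
from typing import List

MASK = -1  # hypothetical sentinel standing in for the [MASK] token id

def apply_mask(tokens: List[int], r: float, rng: random.Random) -> List[int]:
    """Forward process q_r: replace each token with MASK independently with probability r."""
    return [MASK if rng.random() < r else tok for tok in tokens]

# Progressively heavier masking of a toy 4-token SID
rng = random.Random(0)
sid = [12, 7, 301, 45]
for r in (0.25, 0.5, 1.0):
    print(r, apply_mask(sid, r, rng))
```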

Reverse (Denoising) Process

A single Mask Predictor, $p_\theta$, reconstructs clean tokens from partially masked input. For each diffusion step $t$, a ratio $r_t$ is selected, and $p_\theta$ is trained to recover $z^0$ given $z^{r_t}$ and the context:

$$p_\theta(z^0 \mid z^{r_t}, \text{context}) \approx q(z^0 \mid z^{r_t})$$
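
The sketch below illustrates a single-sample reverse pass under stated assumptions: `mask_predictor` is a hypothetical callable that returns, for every position, a probability distribution over the token vocabulary given the partially masked SID and the user history. Each of the $T$ steps fills roughly $M/T$ of the most confident masked positions; the top-$k$ variant used at inference is the adaptive-order beam search described later.

```python
from typing import Callable, List, Sequence

MASK = -1  # hypothetical [MASK] sentinel, as in the forward-process sketch

def denoise_sid(
    history: Sequence[int],
    mask_predictor: Callable[[Sequence[int], Sequence[int]], List[List[float]]],
    M: int,
    T: int,
) -> List[int]:
    """Greedy reverse process: reveal the most confident masked positions step by step."""
    sid = [MASK] * M                           # start fully masked
    per_step = max(1, M // T)                  # positions revealed per diffusion step
    for _ in range(T):
        masked = [m for m in range(M) if sid[m] == MASK]
        if not masked:
            break
        probs = mask_predictor(history, sid)   # one distribution per position
        ranked = sorted(masked, key=lambda m: max(probs[m]), reverse=True)
        for m in ranked[:per_step]:            # fill the most confident positions
            sid[m] = max(range(len(probs[m])), key=probs[m].__getitem__)
    return sid
```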

Masking-based Training Objectives

LLaDA-Rec employs two complementary losses:

  • User-history Masking Loss ($\mathcal{L}_{\mathrm{His-Mask}}$):

$$\mathcal{L}_{\mathrm{His-Mask}} = -\mathbb{E}_{r, \mathcal{S}_H} \left[ \frac{1}{r} \sum_{i:\, \mathcal{S}^r_{H,i} = [\mathrm{MASK}]} \log P_\theta(\mathcal{S}_{H,i} \mid \mathcal{S}^r_H) \right]$$

  • Next-item Masking Loss ($\mathcal{L}_{\mathrm{Item-Mask}}$):

$$\mathcal{L}_{\mathrm{Item-Mask}} = -\mathbb{E}_{r, s_n} \left[ \frac{1}{r} \sum_{m:\, c_{n,m}^r = [\mathrm{MASK}]} \log P_\theta(c_{n,m} \mid s_n^r, \mathcal{S}_H) \right]$$

The final training objective:

$$\mathcal{L}_{\mathrm{Total}} = \mathcal{L}_{\mathrm{Item-Mask}} + \lambda_{\mathrm{His}} \mathcal{L}_{\mathrm{His-Mask}} + \lambda_{\mathrm{Reg}} \|\theta\|_2^2$$
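
A hedged sketch of how these per-example losses can be assembled, assuming the caller has already gathered the model's log-probabilities of the true tokens at the masked positions; the function names and hyperparameter values are illustrative, not taken from the paper.

```python
from typing import List

def masked_loss(logp_true_at_masked: List[float], r: float) -> float:
    """-(1/r) * sum over masked positions of log P_theta(true token | masked input)."""
    return -sum(logp_true_at_masked) / r

def total_loss(
    item_logps: List[float], r_item: float,        # next-item masking (L_Item-Mask)
    his_logps: List[float], r_his: float,          # user-history masking (L_His-Mask)
    params_sq_norm: float,                         # ||theta||_2^2
    lam_his: float = 0.5, lam_reg: float = 1e-4,   # illustrative hyperparameter values
) -> float:
    return (
        masked_loss(item_logps, r_item)
        + lam_his * masked_loss(his_logps, r_his)
        + lam_reg * params_sq_norm
    )
```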

3. Parallel Tokenization and Bidirectional Representation Learning

SID generation benefits from parallel and order-agnostic tokenization to leverage the bidirectional model. Standard Residual Quantization VAE (RQ-VAE) hierarchically orders tokens, which is misaligned with bidirectional Transformers. LLaDA-Rec addresses this with a Multi-Head Vector Quantized VAE (Multi-Head VQ-VAE):

  • Each item embedding $v_i$ (e.g., from Sentence-T5) is encoded by an MLP to $z_i \in \mathbb{R}^d$, split into $M$ equal sub-vectors $[z_{i,1}; \dots; z_{i,M}]$.
  • Each sub-vector $z_{i,m}$ is quantized by nearest-neighbor search in codebook $\mathcal{C}_m$ to yield token $c_{i,m}$.
  • The $M$ tokens form the SID $s_i = [c_{i,1}, \ldots, c_{i,M}]$, with no cross-token ordering bias.
  • Reconstruction proceeds by concatenating token embeddings and decoding with an MLP.

This approach achieves parallelism in ID construction, matching the bidirectional architecture and facilitating global semantic modeling.
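
The quantization step can be sketched as follows, under stated assumptions: the codebooks and the item embedding are random placeholders, whereas the real model learns them jointly with the MLP encoder and decoder described above.

```python
import numpy as np

def multi_head_quantize(z: np.ndarray, codebooks: list) -> list:
    """Split z into M equal sub-vectors and quantize each against its own codebook."""
    M = len(codebooks)
    sub_vectors = np.split(z, M)                   # [z_{i,1}; ...; z_{i,M}]
    sid = []
    for z_m, C_m in zip(sub_vectors, codebooks):
        dists = np.linalg.norm(C_m - z_m, axis=1)  # nearest-neighbor search in C_m
        sid.append(int(np.argmin(dists)))          # token c_{i,m}
    return sid

rng = np.random.default_rng(0)
d, M, codebook_size = 128, 4, 256                  # illustrative sizes
codebooks = [rng.normal(size=(codebook_size, d // M)) for _ in range(M)]
item_embedding = rng.normal(size=d)                # stand-in for an MLP-encoded text embedding
print(multi_head_quantize(item_embedding, codebooks))  # one token per head
```

Because each head draws from its own codebook and no head depends on another's output, all $M$ tokens are produced in parallel, unlike the residual coarse-to-fine ordering imposed by RQ-VAE.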

4. Dual Masking Regimens in Model Training

Every training example (user history plus next item) is subjected to two simultaneous masking regimens:

  • History-level Masking: A random ratio $r_H$ is sampled and applied to all tokens in $\mathcal{S}_H$, driving $\mathcal{L}_{\mathrm{His-Mask}}$.
  • Next-item-level Masking: An independent ratio $r_I$ masks tokens in $s_n$ for $\mathcal{L}_{\mathrm{Item-Mask}}$.

The model processes the concatenated $[\mathcal{S}_H^{r_H}; s_n^{r_I}]$ as input and is optimized to predict the true tokens for every [MASK] position. This strategy enhances the model's capacity for both intra-item and inter-item dependency modeling.
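
A minimal sketch of building one training input with the two independent masking regimens; `apply_mask` is the forward operator sketched in Section 2, and the sentinel and sampling choices are illustrative assumptions rather than the paper's code.

```python
import random
from typing import List, Tuple

MASK = -1  # hypothetical [MASK] sentinel

def apply_mask(tokens: List[int], r: float, rng: random.Random) -> List[int]:
    return [MASK if rng.random() < r else t for t in tokens]

def build_training_input(
    history_tokens: List[int],    # flattened S_H
    next_item_tokens: List[int],  # s_n
    rng: random.Random,
) -> Tuple[List[int], float, float]:
    r_h = rng.random()            # history-level ratio r_H
    r_i = rng.random()            # next-item ratio r_I, sampled independently
    masked = apply_mask(history_tokens, r_h, rng) + apply_mask(next_item_tokens, r_i, rng)
    return masked, r_h, r_i       # the model predicts the true token at every MASK
```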

5. Adaptive-Order Beam Search for Top-k Generation

Standard discrete diffusion sampling yields only one sequence, and classic beam search presumes left-to-right generation. To produce top-$k$ SID candidates, LLaDA-Rec introduces an adaptive-order beam search:

  • Initialization: All positions are masked; maintain a beam set $\mathcal{B}_1$ of size $B$.
  • At Each Step ($t$):
  1. For each unfilled position $m \notin PG_t$, compute $P_\theta^{t,m}(w \mid s_n^t, \mathcal{S}_H)$ over candidate tokens $w$.
  2. Select the top $M/T$ positions with the highest maximum probabilities into $\mathcal{M}_t$.
  3. Expand the beam over the positions in $\mathcal{M}_t$ by scoring the top-$B$ tokens at each chosen position, iteratively pruning by joint probability.
  4. Update each sequence in $\mathcal{B}_{t+1}$ with the chosen tokens; re-mask unfilled positions for the next iteration.

After $T$ steps, all beams are completed SIDs. The final top-$k$ recommendations are selected by their joint model probability.
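
The following simplified Python sketch illustrates the control flow under stated assumptions: `mask_predictor` is the same hypothetical callable as in the earlier denoising sketch, token distributions are not re-evaluated after each fill within a step, and beams are expanded one at a time rather than batched, so this is an illustration rather than the paper's implementation.

```python
import math
from typing import Callable, List, Sequence, Tuple

MASK = -1  # hypothetical [MASK] sentinel

def adaptive_beam_search(
    history: Sequence[int],
    mask_predictor: Callable[[Sequence[int], Sequence[int]], List[List[float]]],
    M: int, T: int, B: int, k: int,
) -> List[Tuple[List[int], float]]:
    """Return the top-k SIDs with their joint log-probabilities."""
    beams: List[Tuple[List[int], float]] = [([MASK] * M, 0.0)]
    per_step = max(1, M // T)
    for _ in range(T):
        expanded: List[Tuple[List[int], float]] = []
        for sid, score in beams:
            masked = [m for m in range(M) if sid[m] == MASK]
            if not masked:
                expanded.append((sid, score))
                continue
            probs = mask_predictor(history, sid)
            # adaptive order: this beam's most confident unfilled positions
            chosen = sorted(masked, key=lambda m: max(probs[m]), reverse=True)[:per_step]
            partial = [(sid, score)]
            for m in chosen:  # fill chosen positions, pruning to B partial expansions
                top_tokens = sorted(range(len(probs[m])), key=probs[m].__getitem__, reverse=True)[:B]
                candidates = []
                for psid, pscore in partial:
                    for w in top_tokens:
                        filled = list(psid)
                        filled[m] = w
                        candidates.append((filled, pscore + math.log(probs[m][w] + 1e-12)))
                partial = sorted(candidates, key=lambda x: x[1], reverse=True)[:B]
            expanded.extend(partial)
        beams = sorted(expanded, key=lambda x: x[1], reverse=True)[:B]
    return sorted(beams, key=lambda x: x[1], reverse=True)[:k]
```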

6. Empirical Benchmarking and Ablation

LLaDA-Rec has been evaluated on three Amazon 2023 subsets—Scientific, Instrument, and Game—spanning tens of thousands of users and items, with average session lengths of 8.1–8.9. Evaluation uses Recall@1/5/10 and NDCG@5/10 against a spectrum of ID-based, diffusion-based, and semantic-ID generative recommender baselines:

  • ID-based: GRU4Rec, SASRec, BERT4Rec, FMLP-Rec, LRURec
  • Diffusion-based: DreamRec, DiffuRec
  • Semantic-ID generative: VQ-Rec, TIGER / TIGER-SAS, LETTER, LC-Rec (RQ-VAE), RPG

Across all datasets and metrics, LLaDA-Rec achieves either the best or statistically indistinguishable top performance, with +2–4 point absolute gains in Recall@5 over prior semantic-ID methods and similar lifts in NDCG.

Ablation studies show clear dependencies:

  • Substituting Multi-Head VQ-VAE with RQ-VAE or orthogonal quantization drops performance by 5–15%.
  • Removing either masking loss (LHisMask\mathcal{L}_{\mathrm{His-Mask}}, LItemMask\mathcal{L}_{\mathrm{Item-Mask}}) reduces Recall@5 by 10–20%.
  • Omitting beam search in favor of greedy sampling drops all metrics to the 1–2% range, since only a single candidate is produced.

7. Theoretical Implications, Complexity, and Future Directions

Discrete diffusion unifies generative modeling and retrieval by directly outputting semantic IDs, obviating the need for latent matching with item embedding tables. The use of bidirectional attention and adaptive generation order counters the primary drawbacks of autoregressive decoders.

Inference requires $T$ Transformer passes (each with $O(M^2)$ self-attention) and $O(\frac{M}{T} B \log B)$ per-position beam pruning. Empirically, $T \approx M/2$ and $B = 16$ strike a favorable balance.

Identified limitations and areas for further work include:

  • Fixed-length SIDs: Extending the method to variable-length SIDs would generalize its applicability.
  • Decoding Efficiency: Adapting distillation or one-step denoising LLM approaches may accelerate inference.
  • Enhanced Personalization: Integrating richer user profile information, textual or graph-based, into the diffusion process offers a promising extension.

In summary, LLaDA-Rec demonstrates that discrete diffusion, paired with parallel tokenization, dual masking, and adaptive-order beam search, constitutes an effective and robust framework for semantic-ID–based generative recommendation, overcoming key challenges faced by prior autoregressive approaches.
