LLaDA-Rec: Diffusion-Based Generative Recommendation
- LLaDA-Rec is a discrete diffusion-based framework for generative recommendation, producing semantic IDs in parallel with bidirectional attention.
- It overcomes autoregressive limitations by employing iterative masked token generation and dual masking objectives to better model intra-item and inter-item dependencies.
- Empirical evaluations on real-world datasets show significant improvements in Recall@5 and NDCG, with ablation studies confirming the contribution of each component.
LLaDA-Rec is a discrete diffusion-based generative recommendation framework that advances next-item prediction by generating semantic ID (SID) sequences in parallel with bidirectional attention. Traditional semantic-ID recommenders are constrained by left-to-right (autoregressive) decoding, which impedes global semantic modeling and accumulates errors through fixed generation order. LLaDA-Rec circumvents these limitations by recasting recommendation as an iterative, masked parallel generation task using a discrete diffusion process, supported by a tailored parallel tokenization scheme, dual masking objectives, and an inference-stage adaptive beam search. These innovations enable LLaDA-Rec to model both intra-item and inter-item dependencies and yield improved performance across several real-world datasets.
1. Problem Formulation and Limitations of Autoregressive Decoders
Generative recommendation predicts the next item a user will interact with by generating its semantic ID $s = (s_1, \dots, s_L)$, a length-$L$ discrete token sequence in which each token $s_l$ is drawn from a shared vocabulary of size $V$. Given a user history of $n$ items, represented as an ordered SID sequence $h = (s^{(1)}, \dots, s^{(n)})$, the task is to model

$$p_\theta\bigl(s^{(n+1)} \mid s^{(1)}, \dots, s^{(n)}\bigr).$$
Prior methods such as TIGER and LC-Rec use autoregressive decoding, factorizing the conditional probability left to right:

$$p_\theta\bigl(s^{(n+1)} \mid h\bigr) = \prod_{l=1}^{L} p_\theta\bigl(s^{(n+1)}_l \mid s^{(n+1)}_{<l},\, h\bigr).$$
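A minimal sketch of how such an autoregressive decoder scores a candidate SID under this factorization; the `decoder(history, prefix)` interface is a hypothetical stand-in, not the actual API of TIGER or LC-Rec:

```python
import torch

def autoregressive_score(decoder, history, sid):
    """Score a candidate SID under the left-to-right factorization:
    log p(sid | history) = sum_l log p(sid[l] | sid[:l], history).

    decoder(history, prefix) -> logits over the token vocabulary for the
    next position; this signature is illustrative only.
    """
    total = 0.0
    for l in range(len(sid)):
        logits = decoder(history, sid[:l])                  # conditions only on the prefix
        total += logits.log_softmax(dim=-1)[sid[l]].item()
    return total  # a low-probability early token drags down every continuation
```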
Autoregression causes two fundamental problems:
- Unidirectional Constraints: Each token can attend only to the tokens before it, hindering modeling of global coherence within the SID.
- Error Accumulation: Mistakes in early token predictions propagate, increasing the likelihood of mispredicting subsequent tokens.
2. Discrete Diffusion Framework for Parallel SID Generation
LLaDA-Rec replaces autoregressive decoding with a discrete diffusion process that generates all SID tokens in parallel under bidirectional attention. Generation proceeds in $T$ denoising steps, starting from a fully masked SID and iteratively reconstructing the masked tokens.
Forward (Noise) Process
A family of masking operators, parameterized by a mask ratio $t \in [0, 1]$, is defined: each token of the target SID is independently replaced by a special [MASK] token with probability $t$,

$$q_t\bigl(\tilde{s}_l = \text{[MASK]} \mid s_l\bigr) = t, \qquad q_t\bigl(\tilde{s}_l = s_l \mid s_l\bigr) = 1 - t.$$

As $t$ increases toward $1$, the original token sequence is progressively masked until all positions are hidden.
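A minimal sketch of this forward corruption in PyTorch; the reserved `MASK_ID` value is an illustrative assumption, not a detail from the paper:

```python
import torch

MASK_ID = 0  # hypothetical id reserved for the [MASK] token

def forward_mask(tokens: torch.Tensor, t: float) -> torch.Tensor:
    """Independently replace each token with [MASK] with probability t.

    tokens: (batch, length) integer SID tokens.
    t:      mask ratio in [0, 1]; t = 1 hides every position.
    """
    mask = torch.rand_like(tokens, dtype=torch.float) < t
    return torch.where(mask, torch.full_like(tokens, MASK_ID), tokens)

# Example: a batch of two length-4 SIDs, roughly half the positions masked.
sids = torch.tensor([[11, 42, 7, 93], [5, 18, 64, 29]])
noisy = forward_mask(sids, t=0.5)
```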
Reverse (Denoising) Process
A single mask predictor $f_\theta$, a bidirectional Transformer, reconstructs clean tokens from partially masked input. For each diffusion step, a mask ratio $t$ is selected, and $f_\theta$ is trained to recover the original tokens $s$ given the masked sequence $\tilde{s}$ and the user-history context $h$, i.e., to predict $p_\theta(s_l \mid \tilde{s}, h)$ for every masked position $l$.
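The standard (single-sample) reverse process can be sketched as a confidence-based unmasking loop; the `mask_predictor(history, sid) -> logits` signature and the linear unmasking schedule are assumptions for illustration, and Section 5 describes the beam-search variant used for top-$K$ recommendation:

```python
import torch

def reverse_denoise(mask_predictor, history, L, T, mask_id=0):
    """Single-sample reverse process (no beam search).

    Starts from an all-[MASK] SID of length L and, over T steps, commits the
    most confident predictions while keeping the rest masked.
    mask_predictor(history, sid) -> logits of shape (L, vocab_size) is an
    assumed interface, not the paper's exact one.
    """
    sid = torch.full((L,), mask_id, dtype=torch.long)
    for step in range(T):
        probs = mask_predictor(history, sid).softmax(dim=-1)   # (L, V)
        probs[:, mask_id] = 0.0                                 # never predict the reserved mask token
        conf, pred = probs.max(dim=-1)                          # per-position confidence
        conf[sid != mask_id] = float("-inf")                    # never overwrite filled slots
        n_fill = (L * (step + 1)) // T - int((sid != mask_id).sum())  # linear unmasking schedule
        fill_pos = conf.topk(max(n_fill, 0)).indices            # most confident masked slots
        sid[fill_pos] = pred[fill_pos]
    return sid
```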
Masking-based Training Objectives
LLaDA-Rec employs two complementary losses:
- User-history Masking Loss ($\mathcal{L}_{\text{his}}$): the cross-entropy over masked history tokens,
$$\mathcal{L}_{\text{his}} = -\,\mathbb{E}\Bigl[\sum_{l \in \mathcal{M}_{\text{his}}} \log p_\theta\bigl(h_l \mid \tilde{h}, \tilde{s}\bigr)\Bigr],$$
where $\tilde{h}$ and $\tilde{s}$ denote the masked history and masked next-item SID, and $\mathcal{M}_{\text{his}}$ is the set of masked history positions.
- Next-item Masking Loss ($\mathcal{L}_{\text{next}}$): the cross-entropy over masked next-item tokens,
$$\mathcal{L}_{\text{next}} = -\,\mathbb{E}\Bigl[\sum_{l \in \mathcal{M}_{\text{next}}} \log p_\theta\bigl(s_l \mid \tilde{h}, \tilde{s}\bigr)\Bigr].$$
The final training objective sums the two:
$$\mathcal{L} = \mathcal{L}_{\text{his}} + \mathcal{L}_{\text{next}}.$$
3. Parallel Tokenization and Bidirectional Representation Learning
SID generation benefits from parallel and order-agnostic tokenization to leverage the bidirectional model. Standard Residual Quantization VAE (RQ-VAE) hierarchically orders tokens, which is misaligned with bidirectional Transformers. LLaDA-Rec addresses this with a Multi-Head Vector Quantized VAE (Multi-Head VQ-VAE):
- Each item embedding (e.g., from Sentence-T5) is encoded by an MLP into a latent vector $z \in \mathbb{R}^{d}$, which is split into $L$ equal sub-vectors $z_1, \dots, z_L$.
- Each sub-vector $z_l$ is quantized by nearest-neighbor search in a codebook $\mathcal{C}_l$ to yield token $s_l$.
- The $L$ tokens form the SID $s = (s_1, \dots, s_L)$, with no cross-token ordering bias.
- Reconstruction proceeds by concatenating the quantized sub-vector (token) embeddings and decoding with an MLP.
This approach achieves parallelism in ID construction, matching the bidirectional architecture and facilitating global semantic modeling.
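A minimal sketch of the multi-head quantization step, assuming one codebook per head and Euclidean nearest-neighbor lookup (both illustrative choices; the paper's details may differ):

```python
import torch

def multihead_quantize(z: torch.Tensor, codebooks: list) -> list:
    """Split an item latent z into L equal sub-vectors and quantize each by
    nearest-neighbor search in its own codebook.

    z:          (d,) item latent produced by the MLP encoder.
    codebooks:  list of L tensors, each (K, d // L) -- one codebook per head.
    Returns the semantic ID as a list of L token indices.
    """
    L = len(codebooks)
    sub_vectors = z.chunk(L)                            # L pieces of size d // L
    sid = []
    for z_l, C_l in zip(sub_vectors, codebooks):
        dists = torch.cdist(z_l.unsqueeze(0), C_l)      # (1, K) Euclidean distances
        sid.append(int(dists.argmin()))
    return sid

# Illustrative usage with made-up sizes: d = 128, L = 4 heads, K = 256 codes per head.
d, L, K = 128, 4, 256
codebooks = [torch.randn(K, d // L) for _ in range(L)]
item_latent = torch.randn(d)
sid = multihead_quantize(item_latent, codebooks)         # e.g. [17, 203, 5, 88]
```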
4. Dual Masking Regimens in Model Training
Every training example (user history plus next item) is subjected to two simultaneous masking regimens:
- History-level Masking: A random mask ratio $t_{\text{his}}$ is sampled and applied to the tokens in $h$, driving $\mathcal{L}_{\text{his}}$.
- Next-item-level Masking: An independent ratio $t_{\text{next}}$ masks tokens in $s^{(n+1)}$ for $\mathcal{L}_{\text{next}}$.
The model processes the concatenation $[\tilde{h}; \tilde{s}]$ as input and is optimized to predict the true tokens at every [MASK] position. This strategy enhances the model's capacity for both intra-item and inter-item dependency modeling.
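A sketch of one training step under this dual masking scheme, assuming a `model(tokens) -> logits` interface and a reserved `MASK_ID` (both illustrative):

```python
import torch
import torch.nn.functional as F

MASK_ID = 0  # hypothetical reserved [MASK] token id

def dual_mask_loss(model, history, next_item):
    """Dual-masking training step (illustrative sketch).

    history:   (H,) SID tokens of the user's past items.
    next_item: (L,) SID tokens of the ground-truth next item.
    model(tokens) -> logits of shape (seq_len, vocab_size) is an assumed
    interface for the bidirectional mask predictor.
    """
    t_his, t_next = torch.rand(2)                      # independent mask ratios
    his_mask = torch.rand(history.shape) < t_his
    next_mask = torch.rand(next_item.shape) < t_next

    noisy_his = torch.where(his_mask, torch.full_like(history, MASK_ID), history)
    noisy_next = torch.where(next_mask, torch.full_like(next_item, MASK_ID), next_item)

    logits = model(torch.cat([noisy_his, noisy_next]))  # one pass over the concatenation
    H = history.numel()

    loss_his = F.cross_entropy(logits[:H][his_mask], history[his_mask]) if his_mask.any() else 0.0
    loss_next = F.cross_entropy(logits[H:][next_mask], next_item[next_mask]) if next_mask.any() else 0.0
    return loss_his + loss_next                         # L = L_his + L_next
```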
5. Inference: Discrete Diffusion Decoding and Adaptive Beam Search
Standard discrete diffusion yields only one sample, and classic beam search presumes left-to-right generation. To generate top-$K$ SID candidates, LLaDA-Rec introduces an adaptive-order beam search:
- Initialization: All $L$ positions of the next-item SID are masked; maintain a beam set $\mathcal{B}$ of size $B$.
- At Each Step ($i = 1, \dots, T$):
- For each unfilled position $j$ in each beam, compute the token distribution $p_\theta(s_j \mid \cdot)$ over the vocabulary and record its maximum probability.
- Select the unfilled positions with the highest max-probabilities into a candidate set $\mathcal{P}_i$.
- Expand the beam over the positions in $\mathcal{P}_i$ by scoring the highest-probability tokens at each chosen position, iteratively pruning candidates by their joint probabilities.
- Update each sequence in $\mathcal{B}$ with the chosen tokens; re-mask the remaining unfilled positions for the next iteration.
After $T$ steps, all beams are completed SIDs. The final top-$K$ recommendations are selected by their joint model probability.
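A minimal sketch of one plausible instantiation of this decoder, filling a single position per step (so $T = L$); the interface, schedule, and pruning details are illustrative assumptions rather than the paper's exact procedure:

```python
import torch

def adaptive_beam_search(mask_predictor, history, L, B, mask_id=0):
    """Adaptive-order beam search, one position filled per step.

    mask_predictor(history, sid) -> logits of shape (L, vocab_size) is an
    assumed interface. Returns the B highest-scoring SIDs with their
    joint log-probabilities.
    """
    beams = [(torch.full((L,), mask_id, dtype=torch.long), 0.0)]
    for _ in range(L):                                    # L steps fill every position
        candidates = []
        for sid, score in beams:
            log_probs = mask_predictor(history, sid).log_softmax(dim=-1)  # (L, V)
            log_probs[:, mask_id] = float("-inf")         # never emit the reserved mask token
            unfilled = (sid == mask_id).nonzero().squeeze(-1)
            # Adaptive order: fill the position the model is currently most confident about.
            pos = unfilled[log_probs[unfilled].max(dim=-1).values.argmax()]
            top_lp, top_tok = log_probs[pos].topk(B)
            for lp, tok in zip(top_lp.tolist(), top_tok.tolist()):
                new_sid = sid.clone()
                new_sid[pos] = tok
                candidates.append((new_sid, score + lp))  # accumulate joint log-probability
        # Prune to the B best partial SIDs by joint probability.
        beams = sorted(candidates, key=lambda c: c[1], reverse=True)[:B]
    return beams
```

Filling one position per step is the simplest schedule; committing several positions per step would reduce the number of Transformer passes at some cost in accuracy.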
6. Empirical Benchmarking and Ablation
LLaDA-Rec has been evaluated on three Amazon 2023 subsets—Scientific, Instrument, and Game—spanning tens of thousands of users and items, with average session lengths 8.1–8.9. Metrics include Recall@1,5,10 and NDCG@5,10, against a spectrum of ID-based, diffusion-based, and semantic-ID generative recommender baselines:
- ID-based: GRU4Rec, SASRec, BERT4Rec, FMLP-Rec, LRURec
- Diffusion-based: DreamRec, DiffuRec
- Semantic-ID generative: VQ-Rec, TIGER / TIGER-SAS, LETTER, LC-Rec (RQ-VAE), RPG
Across all datasets and metrics, LLaDA-Rec achieves either the best result or one statistically indistinguishable from the best, with absolute gains of roughly 2–4 points in Recall@5 over prior semantic-ID methods and similar lifts in NDCG.
Ablation studies show clear dependencies:
- Substituting Multi-Head VQ-VAE with RQ-VAE or orthogonal quantization drops performance by 5–15%.
- Removing either masking loss ($\mathcal{L}_{\text{his}}$ or $\mathcal{L}_{\text{next}}$) reduces Recall@5 by 10–20%.
- Replacing adaptive beam search with greedy sampling collapses all metrics to the 1–2% range, since only a single candidate is produced.
7. Theoretical Implications, Complexity, and Future Directions
Discrete diffusion unifies generative modeling and retrieval by directly outputting semantic IDs, obviating the need for latent matching with item embedding tables. The use of bidirectional attention and adaptive generation order counters the primary drawbacks of autoregressive decoders.
Inference requires $T$ Transformer passes (each involving full bidirectional self-attention over the concatenated history and target SID) plus per-position beam pruning. Empirically, a moderate number of denoising steps and a moderate beam size strike a favorable balance between accuracy and latency.
Identified limitations and areas for further work include:
- Fixed-length SIDs: Extending the method to variable-length SIDs would generalize its applicability.
- Decoding Efficiency: Adapting distillation or one-step denoising techniques from diffusion LLMs may accelerate inference.
- Enhanced Personalization: Integrating richer user profile information, textual or graph-based, into the diffusion process offers a promising extension.
In summary, LLaDA-Rec demonstrates that discrete diffusion, paired with parallel tokenization, dual masking, and adaptive-order beam search, constitutes an effective and robust framework for semantic-ID–based generative recommendation, overcoming key challenges faced by prior autoregressive approaches.