Generative Recommendation Systems

Updated 15 May 2026

Generative Recommendation Systems are a new paradigm that treats recommendations as conditional generation tasks, using LLMs, Transformer architectures, and diffusion models to produce novel outputs beyond fixed candidate sets.
They employ advanced tokenization methods—ID-based, text-based, and codebook-based—to encode semantic and collaborative signals, enabling robust content understanding and personalized suggestions.
Innovative training strategies like next-token prediction, reinforcement learning, and structured decoding drive multimodal integration, efficient retrieval, and explainable recommendations in large-scale deployments.

Generative Recommendation Systems (GRSs) reconceptualize recommendation as a conditional generation problem, leveraging generative models—particularly LLMs, large recommendation models, and diffusion models—to generate recommended items or content directly rather than ranking a set of candidates. This paradigm employs unified Transformer-based architectures and follows explicit scaling laws, supporting end-to-end modeling, multimodal integration, and reasoning capabilities that go beyond the scope of classic discriminative recommenders (Hou et al., 31 Oct 2025, Wang et al., 19 Feb 2025).

1. Generative Paradigm versus Traditional Approaches

Traditional recommendation systems typically estimate user–item affinity using discriminative scoring functions and select top-ranked items from a fixed candidate set. In contrast, GRSs model the conditional probability of recommendation sequences or outputs, generating identifiers or content by sampling or beam-searching over the vast generative space:

$P(y\mid x)\quad\text{(generative)},$

$f(u,i)\quad\text{(traditional discriminative scoring)}.$

GRSs do not assume a closed candidate set and can produce novel or highly personalized recommendations by exploiting model-internal knowledge and learned world semantics (Hou et al., 31 Oct 2025). This shift supports conversational, creative, and explainable recommendation tasks that are difficult for discriminative pipelines.
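
For autoregressive generators, the conditional probability above is usually factorized over the tokens of the output. As a sketch, with $y_1,\dots,y_T$ denoting the token sequence of the generated item or content (notation assumed here for illustration, not taken from the cited works):

$P(y\mid x)=\prod_{t=1}^{T} P(y_t\mid y_{<t},x).$

Decoding then selects tokens step by step (greedy choice, sampling, or beam search) instead of evaluating $f(u,i)$ for every candidate item. Diffusion-based generators do not follow this factorization.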

2. Data Foundations and Tokenization

The transition to generative modeling places novel requirements on data representation, particularly item and user tokenization. In GRSs, item identifiers (tokens) must encode semantic and collaborative signals, enabling both content understanding and behavior modeling. Approaches include:

ID-based tokenization: Uniquely indexes items without semantics; fails under cold start (Wang et al., 2024).
Text-based tokenization: Uses item titles/descriptions; contains rich semantics but incurs length bias and lacks collaborative differentiation (Wang et al., 2024).
Codebook-based tokenization: Employs vector-quantized (e.g. RQ-VAE) representations to discretize item embeddings into short semantic ID sequences, supporting hierarchical, content-aware, and collaborative-aware tokenization (Liu et al., 29 Sep 2025, Zhang et al., 19 Nov 2025, Wang et al., 2024).

The LETTER tokenizer (Wang et al., 2024) exemplifies a learnable codebook system, integrating semantic regularization, collaborative contrastive loss, and diversity regularization to enable robust, generation-friendly identifiers. Modern GRSs also address code-assignment bias, cross-modal (multimodal) tokenization (Zhang et al., 19 Nov 2025, Zhu et al., 30 Mar 2025), and collision mitigation.
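
The codebook-based idea can be illustrated with a minimal residual-quantization sketch. It covers only the assignment step (mapping an existing item embedding to a short code sequence), not the training of an RQ-VAE or of LETTER. The function names and the toy codebooks are hypothetical.

```python
def nearest_code(vector, codebook):
    """Return the index of the codeword closest to `vector` (squared L2 distance)."""
    def sq_dist(codeword):
        return sum((v - c) ** 2 for v, c in zip(vector, codeword))
    return min(range(len(codebook)), key=lambda i: sq_dist(codebook[i]))


def residual_quantize(embedding, codebooks):
    """Map one item embedding to a tuple of code indices, one per level.

    Each level picks the codeword nearest to the current residual and then
    subtracts it, so later levels encode what earlier levels left over.
    """
    residual = list(embedding)
    codes = []
    for codebook in codebooks:
        index = nearest_code(residual, codebook)
        codes.append(index)
        residual = [r - c for r, c in zip(residual, codebook[index])]
    return tuple(codes), residual


# Hypothetical toy setup: two levels over 2-dimensional embeddings.
codebooks = [
    [[0.0, 0.0], [1.0, 1.0], [-1.0, 1.0]],  # coarse level
    [[0.0, 0.0], [0.2, 0.0], [0.0, 0.2]],   # fine level
]
semantic_id, leftover = residual_quantize([1.2, 1.0], codebooks)
```

Items with similar embeddings share leading codes, which gives the hierarchical structure mentioned above. Two distinct items can still receive the same full code sequence, which is the collision problem that dedicated mitigation strategies target.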

3. Generative Architectures and Training Methodologies

GRSs predominantly use large Transformer architectures, including both decoder-only (causal LLMs) and encoder–decoder models. At the system level, these models ingest user histories and context, expressed as sequences of item or action tokens, and decode target item sequences or attributes autoregressively (Yang et al., 9 Jul 2025, Zhang et al., 19 Nov 2025, Liu et al., 29 Sep 2025).

Feature highlights include:

Next-token prediction objectives: Maximize the conditional log-likelihood of target item codes, with extensions to page-wise or session-wise grouping for denser supervision and reduced label ambiguity (Zou et al., 16 Apr 2026, Liang et al., 16 Aug 2025).
Preference optimization: Reinforcement learning methods (e.g., GRPO-SR (Zou et al., 16 Apr 2026), PARS and MSRA (Xing et al., 21 Aug 2025), DPO, listwise direct preference optimization (Fu et al., 9 Feb 2026)) align generative policies with nuanced multi-level feedback and business metrics.
Reflection and correction: Structured decoding strategies with reflection–correction loops (e.g., GRC (Xing et al., 27 Feb 2026)) counteract exposure bias and enable trajectory repair.
Multimodal modeling: Cross-modal quantization (MACRec (Zhang et al., 19 Nov 2025)), contrastive alignment, and late/early fusion (Zhu et al., 30 Mar 2025) underpin multi-view sequence generation for diverse item content.
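
Two of the objectives listed above can be written down compactly. The sketch below shows a next-token negative log-likelihood over item codes and the standard pairwise DPO loss. It assumes the per-step probabilities and sequence log-probabilities have already been produced by a model; all names and toy numbers are hypothetical, and the listwise or GRPO-style variants cited above are not reproduced here.

```python
import math


def sequence_log_prob(step_distributions, target_codes):
    """Sum of log P(code_t | context, code_<t) over the target code sequence.

    `step_distributions[t]` maps each candidate code at step t to its
    probability, assumed to be already conditioned on the decoded prefix.
    """
    return sum(math.log(dist[code])
               for dist, code in zip(step_distributions, target_codes))


def next_token_loss(step_distributions, target_codes):
    """Negative log-likelihood averaged over code positions."""
    return -sequence_log_prob(step_distributions, target_codes) / len(target_codes)


def dpo_loss(logp_chosen, logp_rejected,
             ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """Pairwise DPO loss: -log sigmoid(beta * (policy margin - reference margin))."""
    margin = ((logp_chosen - ref_logp_chosen)
              - (logp_rejected - ref_logp_rejected))
    # -log(sigmoid(z)) equals log(1 + exp(-z))
    return math.log1p(math.exp(-beta * margin))


# Toy example: a two-code item whose target codes are 3 then 1.
step_distributions = [{3: 0.5, 7: 0.5}, {1: 0.25, 2: 0.75}]
loss = next_token_loss(step_distributions, [3, 1])
```

The DPO loss falls below log 2 only when the policy prefers the chosen item over the rejected one more strongly than the reference model does, which is how preference data shifts the generative policy.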

Training leverages supervised fine-tuning (MLE), contrastive/InfoNCE objectives for negative mining, reinforcement signals for value alignment, and flow-matching via generative flow networks (GFlowGR (Wang et al., 19 Jun 2025)) to mitigate exposure bias by exploring plausible positives never seen in logs.
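
As an illustration of the contrastive objective mentioned above, the following is a minimal InfoNCE sketch for a single context and positive item against sampled negatives. The similarity scores are assumed to come from some encoder; the function name and default temperature are illustrative choices, not values from the cited systems.

```python
import math


def info_nce(positive_score, negative_scores, temperature=0.1):
    """InfoNCE loss for one (context, positive item) pair.

    Scores are similarities (for example dot products) between a user-context
    representation and item representations. The loss is the negative log of
    the softmax probability assigned to the positive item.
    """
    logits = [positive_score / temperature]
    logits += [score / temperature for score in negative_scores]
    max_logit = max(logits)  # subtract the max for numerical stability
    log_sum_exp = max_logit + math.log(
        sum(math.exp(logit - max_logit) for logit in logits))
    return log_sum_exp - logits[0]


# Toy scores: the positive is more similar than either sampled negative.
example_loss = info_nce(0.9, [0.2, -0.1])
```

Harder negatives (scores close to the positive) contribute more to the loss, which is why negative mining matters for this objective.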

4. Retrieval, Inference, and System Engineering

GRS inference faces operational and system-level challenges arising from the generative search space and the need for scalable deployment:

Efficient retrieval: Hybrid architectures (e.g., RankGR (Fu et al., 9 Feb 2026)) decompose retrieval into initial assessment (coarse scoring via next-token prediction, possibly listwise), followed by refined scoring through deep candidate–context interaction.
Constrained decoding: Decoding is often restricted to valid codepaths by Trie-based approaches to ensure only real-world items are generated (Wang et al., 2024).
Optimization on hardware: Systems such as TurboGR (Chai et al., 13 May 2026) address "jagged" data structures, dynamic load-balancing, high-throughput negative sampling, and NPU/GPU utilization, supporting model/distributed training at 0.2B+ parameters with near-linear scalability.
Cold-start adaptation: Model editing approaches (GenRecEdit (Shen et al., 15 Mar 2026)) patch next-token generation for cold items by position-wise editing in Transformer FFNs, circumventing costly retraining.
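
The constrained-decoding item above can be sketched as a beam search that expands only prefixes present in a trie of catalog code sequences. This is a simplified illustration, assuming all items have code sequences of the same length; `score_fn` is a hypothetical stand-in for the model's next-code log-probabilities.

```python
import math


def build_trie(item_codes):
    """Nested-dict trie over the code sequences of real catalog items."""
    root = {}
    for codes in item_codes:
        node = root
        for code in codes:
            node = node.setdefault(code, {})
    return root


def constrained_beam_search(score_fn, trie, depth, beam_size):
    """Beam search restricted to code paths that exist in the trie.

    `score_fn(prefix)` returns a dict of log-probabilities for the next code
    given the prefix decoded so far.
    """
    beams = [((), 0.0, trie)]  # (prefix, cumulative log-prob, trie node)
    for _ in range(depth):
        candidates = []
        for prefix, logp, node in beams:
            next_logps = score_fn(prefix)
            for code, child in node.items():  # only valid continuations
                step = next_logps.get(code, -math.inf)
                candidates.append((prefix + (code,), logp + step, child))
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = candidates[:beam_size]
    return [(prefix, logp) for prefix, logp, _ in beams]


def toy_score_fn(prefix):
    # Hypothetical model that puts most mass on code 3, which no item uses.
    return {0: math.log(0.3), 1: math.log(0.1),
            2: math.log(0.05), 3: math.log(0.55)}


catalog = [(0, 1), (0, 2), (1, 2)]
results = constrained_beam_search(toy_score_fn, build_trie(catalog),
                                  depth=2, beam_size=2)
# Every returned prefix is a catalog item, even though the toy model
# prefers the invalid code 3 at each step.
```

Without the trie, the same beam search would return sequences containing code 3, which correspond to no real item.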

Baseline industrial deployments (e.g., TencentGR-10M (Pan et al., 4 Apr 2026), JD App (Zou et al., 16 Apr 2026), Taobao (Liang et al., 16 Aug 2025, Fu et al., 9 Feb 2026)) employ scalable inference through approximate nearest neighbor (ANN) vector search, hierarchical sparse parallelism, and asynchronous communication, sustaining real-time throughput at ~10,000 QPS.

5. Reasoning, Multimodality, and Task Diversity

GRSs have extended the expressivity of recommendation toward reasoning, explainability, and task generality:

Reasoning architectures: REG4Rec (Xing et al., 21 Aug 2025) introduces MoE-based parallel quantization, diversified reasoning path exploration, and consistency-oriented self-reflection for high-confidence, robust recommendations.
Multimodal fusion: Strong empirical evidence (e.g., MGR-LF++ (Zhu et al., 30 Mar 2025), MACRec (Zhang et al., 19 Nov 2025)) shows >20% improvement when leveraging cross-modal tokenization and alignment, using contrastive objectives and special modality-marking tokens to preserve separability during generation.
Task diversity: GRSs encompass slates, ranked lists, conversational dialogs, and even creative item or image generation (GEMRec (Guo et al., 2023)). A two-stage pipeline of prompt-model retrieval followed by generated-item ranking enables personalization amid “infinite” generative possibilities.

Evaluation is multi-faceted, encompassing recall, NDCG, diversity, hallucination rates, and preference alignment, as well as online metrics such as click-through and conversion.
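
For the offline ranking metrics named above, a minimal sketch of recall@K and binary-relevance NDCG@K follows. It assumes a non-empty set of relevant items and a ranked list without duplicates; the function names are illustrative.

```python
import math


def recall_at_k(ranked_items, relevant_items, k):
    """Fraction of relevant items that appear in the top-k recommendations."""
    hits = sum(1 for item in ranked_items[:k] if item in relevant_items)
    return hits / len(relevant_items)


def ndcg_at_k(ranked_items, relevant_items, k):
    """Binary-relevance NDCG: DCG of the ranking divided by the ideal DCG."""
    dcg = sum(1.0 / math.log2(rank + 2)
              for rank, item in enumerate(ranked_items[:k])
              if item in relevant_items)
    ideal_hits = min(k, len(relevant_items))
    idcg = sum(1.0 / math.log2(rank + 2) for rank in range(ideal_hits))
    return dcg / idcg


# Toy example: two relevant items, one of them retrieved at rank 2.
ranked = ["a", "b", "c", "d"]
relevant = {"b", "d"}
recall = recall_at_k(ranked, relevant, k=2)  # 0.5
ndcg = ndcg_at_k(ranked, relevant, k=2)
```

For generative recommenders these metrics are typically computed over the decoded beam, so the beam size bounds the largest usable K.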

6. Scaling Laws, Model Bottlenecks, and Foundation Models

Empirical studies (Wang et al., 19 Feb 2025, Liu et al., 29 Sep 2025, Hou et al., 31 Oct 2025) elucidate the scaling behavior of GRSs:

Scaling laws: Performance (cross-entropy loss, recall@K) improves with diminishing returns as model capacity $N$ and training data grow, e.g.,

$L(N) \simeq L_\infty + a N^{-\alpha}$

where $L_\infty$ is the irreducible loss, with typical exponents $\alpha\in[0.05,0.1]$ and recall@K rising roughly logarithmically.

SID bottleneck: SID-based GR architectures saturate early in scaling due to limited code capacity for semantic information, regardless of encoder or quantizer size (Liu et al., 29 Sep 2025).
LLM-based GR: Direct fine-tuning of large decoder-only LLMs to generate item textual identifiers (“LLM-as-RS”) surpasses SID-based scaling limits, capturing both content understanding and collaborative filtering without explicit tokenization, but at increased inference cost (Liu et al., 29 Sep 2025, Yang et al., 9 Jul 2025).
Hybrid and foundation directions: Ongoing work explores learnable tokenizers (LETTER (Wang et al., 2024)), joint code-continuous representations, and large unified backbones for multi-task, multi-modal recommendation (Hou et al., 31 Oct 2025).
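
As a worked reading of the power law above (a sketch that takes the stated form and exponent range at face value, with $N$ read as model capacity), scaling from $N$ to $10N$ multiplies the reducible loss by a constant factor:

$\frac{L(10N)-L_\infty}{L(N)-L_\infty} = \frac{a\,(10N)^{-\alpha}}{a\,N^{-\alpha}} = 10^{-\alpha}.$

For $\alpha=0.05$ this factor is about 0.89, and for $\alpha=0.1$ it is about 0.79. Each tenfold increase in capacity therefore removes roughly 11% to 21% of the remaining reducible loss, which is why gains look small on a linear scale even when the power law holds.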

7. Challenges, Practical Considerations, and Future Directions

Current research surfaces several open challenges:

Challenge	Context	Example Approaches
Data quality & diversity	Need for scalable, high-quality logs and multi-modal coverage	Data distillation, augmentation
Robustness & fairness	Bias, cold start, and adversarial perturbations	Model editing, fairness-aware methods
Computation efficiency	Training/inference at hundred-billion parameter scale	TurboGR, MoE, quantization
Evaluation methodology	Lack of large, realistic, multi-turn and multi-modal benchmarks	TencentGR datasets

Scaling, continual adaptation, interpretability of generation, and robust human-in-the-loop alignment remain at the research frontier (Hou et al., 31 Oct 2025, Yang et al., 9 Jul 2025, Wang et al., 19 Feb 2025).

Practical deployments (JD App (Zou et al., 16 Apr 2026), Tencent Ads (Pan et al., 4 Apr 2026), Taobao (Liang et al., 16 Aug 2025, Fu et al., 9 Feb 2026)) and shared open-source benchmarks provide public testbeds for continued advances. Current best practice combines LLM-powered reasoning, learnable tokenization, business-specific reward modeling, and hardware-software co-design.

Key references:

(Hou et al., 31 Oct 2025) A Survey on Generative Recommendation: Data, Model, and Tasks
(Wang et al., 19 Feb 2025) Generative Large Recommendation Models: Emerging Trends in LLMs for Recommendation
(Liu et al., 29 Sep 2025) Understanding Generative Recommendation with Semantic IDs from a Model-scaling View
(Wang et al., 2024) Learnable Item Tokenization for Generative Recommendation
(Zhang et al., 19 Nov 2025) Multi-Aspect Cross-modal Quantization for Generative Recommendation
(Pan et al., 4 Apr 2026) Tencent Advertising Algorithm Challenge 2025: All-Modality Generative Recommendation
(Xing et al., 21 Aug 2025) REG4Rec: Reasoning-Enhanced Generative Model for Large-Scale Recommendation Systems
(Zou et al., 16 Apr 2026) GenRec: A Preference-Oriented Generative Framework for Large-Scale Recommendation
(Chai et al., 13 May 2026) TurboGR: An Accelerated Training System for Large-Scale Generative Recommendation