
Generative Recommendation Systems

Updated 15 May 2026
  • Generative Recommendation Systems are a new paradigm that treats recommendations as conditional generation tasks, using LLMs, Transformer architectures, and diffusion models to produce novel outputs beyond fixed candidate sets.
  • They employ advanced tokenization methods—ID-based, text-based, and codebook-based—to encode semantic and collaborative signals, enabling robust content understanding and personalized suggestions.
  • Innovative training strategies like next-token prediction, reinforcement learning, and structured decoding drive multimodal integration, efficient retrieval, and explainable recommendations in large-scale deployments.

Generative Recommendation Systems (GRSs) reconceptualize recommendation as a conditional generation problem, leveraging generative models—particularly LLMs, large recommendation models, and diffusion models—to generate recommended items or content directly rather than ranking a set of candidates. This paradigm employs unified Transformer-based architectures and follows explicit scaling laws, supporting end-to-end modeling, multimodal integration, and reasoning capabilities that go beyond the scope of classic discriminative recommenders (Hou et al., 31 Oct 2025, Wang et al., 19 Feb 2025).

1. Generative Paradigm versus Traditional Approaches

Traditional recommendation systems typically estimate user–item affinity using discriminative scoring functions and select top-ranked items from a fixed candidate set. In contrast, GRSs model the conditional probability of recommendation sequences or outputs, generating identifiers or content by sampling or beam-searching over the vast generative space:

$$P(y\mid x)\quad\text{(generative)},$$

$$f(u,i)\quad\text{(traditional discriminative scoring)}.$$

GRSs do not assume a closed candidate set and can produce novel or highly personalized recommendations by exploiting model-internal knowledge and learned world semantics (Hou et al., 31 Oct 2025). This shift supports conversational, creative, and explainable tasks that are difficult for discriminative pipelines.
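The contrast between the two formulations can be made concrete with a small sketch. It is illustrative only: the next-token table, the token strings, and the function names are hypothetical stand-ins for a trained model, and the user context x is folded into the table for brevity.

```python
import math

# Hypothetical next-token table standing in for a trained generative model:
# P(next token | prefix). All values are illustrative assumptions.
NEXT_TOKEN_PROBS = {
    (): {"12": 0.6, "7": 0.4},
    ("12",): {"3": 0.7, "9": 0.3},
    ("7",): {"3": 0.5, "5": 0.5},
}

def sequence_log_prob(tokens):
    """Generative view: log P(y | x) = sum_t log P(y_t | y_<t, x)."""
    logp = 0.0
    for t, tok in enumerate(tokens):
        logp += math.log(NEXT_TOKEN_PROBS[tuple(tokens[:t])][tok])
    return logp

def greedy_generate(length=2):
    """Decode an item identifier token by token, with no candidate set."""
    prefix = ()
    for _ in range(length):
        dist = NEXT_TOKEN_PROBS[prefix]
        prefix += (max(dist, key=dist.get),)
    return list(prefix)

def rank_by_score(user_vec, item_vecs, k):
    """Discriminative view: score f(u, i) for every item in a fixed set, keep top-k."""
    def f(item):
        return sum(u * v for u, v in zip(user_vec, item_vecs[item]))
    return sorted(item_vecs, key=f, reverse=True)[:k]

print(greedy_generate())  # the most likely token at each step
print(rank_by_score([1.0, 0.5], {"a": [1, 0], "b": [0, 1], "c": [1, 1]}, k=2))
```

The generative path never enumerates items; it only needs a distribution over the next token, which is what lets the output space stay open.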

2. Data Foundations and Tokenization

The transition to generative modeling places new requirements on data representation, particularly item and user tokenization. In GRSs, item identifiers (tokens) must encode semantic and collaborative signals, enabling both content understanding and behavior modeling. Approaches include ID-based, text-based, and codebook-based tokenization.

The LETTER tokenizer (Wang et al., 2024) exemplifies a learnable codebook system, integrating semantic regularization, collaborative contrastive loss, and diversity regularization to enable robust, generation-friendly identifiers. Modern GRSs also address code-assignment bias, cross-modal (multimodal) tokenization (Zhang et al., 19 Nov 2025, Zhu et al., 30 Mar 2025), and collision mitigation.
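To illustrate what a codebook-based identifier looks like, the sketch below applies generic residual quantization with fixed, hand-written codebooks and then checks for collisions. It is not LETTER's method: the codebooks here are not learned, none of LETTER's regularizers are implemented, and all embeddings and names are hypothetical.

```python
def nearest(codebook, vec):
    """Index of the codeword closest to vec in squared Euclidean distance."""
    return min(
        range(len(codebook)),
        key=lambda j: sum((c - v) ** 2 for c, v in zip(codebook[j], vec)),
    )

def residual_quantize(vec, codebooks):
    """Assign one code per level; each level quantizes the previous level's residual."""
    residual = list(vec)
    codes = []
    for codebook in codebooks:
        j = nearest(codebook, residual)
        codes.append(j)
        residual = [r - c for r, c in zip(residual, codebook[j])]
    return tuple(codes)

def find_collisions(item_codes):
    """Group items that were assigned an identical code tuple."""
    groups = {}
    for item, codes in item_codes.items():
        groups.setdefault(codes, []).append(item)
    return {codes: items for codes, items in groups.items() if len(items) > 1}

# Illustrative two-level codebooks and item embeddings (assumed values).
CODEBOOKS = [
    [[0.0, 0.0], [1.0, 1.0]],
    [[0.1, 0.0], [-0.1, 0.0], [0.0, 0.1]],
]
EMBEDDINGS = {"shoe": [1.1, 1.0], "boot": [1.08, 1.01], "lamp": [0.0, 0.12]}

ITEM_CODES = {name: residual_quantize(vec, CODEBOOKS) for name, vec in EMBEDDINGS.items()}
print(ITEM_CODES)
print(find_collisions(ITEM_CODES))
```

In this toy data the two footwear items share a code tuple, which is the kind of collision a tokenizer has to resolve, for example with an extra disambiguating token or a larger codebook.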

3. Generative Architectures and Training Methodologies

GRSs predominantly use large Transformer architectures, including both decoder-only (causal LLM) and encoder–decoder models. These models ingest user histories and context, expressed as sequences of item or action tokens, and decode target item sequences or attributes autoregressively (Yang et al., 9 Jul 2025, Zhang et al., 19 Nov 2025, Liu et al., 29 Sep 2025).

Feature highlights include:

Training leverages supervised fine-tuning (MLE), contrastive/InfoNCE objectives for negative mining, reinforcement signals for value alignment, and flow-matching via generative flow networks (GFlowGR (Wang et al., 19 Jun 2025)) to mitigate exposure bias by exploring plausible positives never seen in logs.
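Two of the objectives named above have compact standard forms, sketched here in plain Python on raw logits and similarity scores. This is a minimal illustration under assumed inputs, not the training code of any cited system; the reinforcement and GFlowNet-style objectives are not covered.

```python
import math

def log_softmax(logits):
    """Numerically stable log-softmax over a list of floats."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(z - m) for z in logits))
    return [z - lse for z in logits]

def next_token_nll(logits_per_step, target_ids):
    """Mean negative log-likelihood of the target tokens (the MLE objective)."""
    total = 0.0
    for logits, target in zip(logits_per_step, target_ids):
        total -= log_softmax(logits)[target]
    return total / len(target_ids)

def info_nce(pos_sim, neg_sims, temperature=0.1):
    """InfoNCE: negative log-probability of the positive among sampled negatives."""
    logits = [pos_sim / temperature] + [s / temperature for s in neg_sims]
    return -log_softmax(logits)[0]

# Assumed toy inputs: two decoding steps over a 3-token vocabulary.
print(next_token_nll([[2.0, 0.5, 0.1], [0.2, 1.5, 0.3]], [0, 1]))
print(info_nce(pos_sim=0.9, neg_sims=[0.1, 0.2]))
```

The temperature value is an assumption; lower values sharpen the contrast between the positive and the negatives.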

4. Retrieval, Inference, and System Engineering

GRS inference faces operational and system-level challenges arising from the generative search space and the need for scalable deployment:

  • Efficient retrieval: Hybrid architectures (e.g., RankGR (Fu et al., 9 Feb 2026)) decompose retrieval into initial assessment (coarse scoring via next-token prediction, possibly listwise), followed by refined scoring through deep candidate–context interaction.
  • Constrained decoding: Decoding is often restricted to valid codepaths by Trie-based approaches to ensure only real-world items are generated (Wang et al., 2024).
  • Optimization on hardware: Systems such as TurboGR (Chai et al., 13 May 2026) address "jagged" data structures, dynamic load-balancing, high-throughput negative sampling, and NPU/GPU utilization, supporting model/distributed training at 0.2B+ parameters with near-linear scalability.
  • Cold-start adaptation: Model editing approaches (GenRecEdit (Shen et al., 15 Mar 2026)) patch next-token generation for cold items by position-wise editing in Transformer FFNs, circumventing costly retraining.
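Trie-constrained decoding, mentioned in the list above, can be sketched as a beam search that only expands tokens present in a prefix tree of valid item codes. The `toy_model` hook, the token values, and the catalog below are hypothetical; a real system would call its trained decoder in place of `toy_model`.

```python
import math

def build_trie(valid_sequences):
    """Nested-dict trie over the code sequences of real catalog items."""
    root = {}
    for seq in valid_sequences:
        node = root
        for tok in seq:
            node = node.setdefault(tok, {})
    return root

def constrained_beam_search(next_log_probs, trie, length, beam_size):
    """Beam search that expands only the tokens the trie allows after each prefix."""
    beams = [((), 0.0, trie)]
    for _ in range(length):
        expanded = []
        for prefix, logp, node in beams:
            scores = next_log_probs(prefix)
            for tok, child in node.items():
                if tok in scores:
                    expanded.append((prefix + (tok,), logp + scores[tok], child))
        beams = sorted(expanded, key=lambda b: b[1], reverse=True)[:beam_size]
    return [(prefix, logp) for prefix, logp, _ in beams]

def toy_model(prefix):
    """Hypothetical model hook: prefers token 9, which no catalog item uses."""
    return {0: math.log(0.2), 1: math.log(0.3), 9: math.log(0.5)}

VALID_ITEMS = [(0, 1), (1, 0), (1, 1)]
RESULTS = constrained_beam_search(toy_model, build_trie(VALID_ITEMS), length=2, beam_size=2)
print(RESULTS)
```

Although the toy model puts most of its probability on token 9, every returned sequence is a catalog item, because invalid continuations are never placed on the beam.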

Baseline industrial deployments (e.g., TencentGR-10M (Pan et al., 4 Apr 2026), JD App (Zou et al., 16 Apr 2026), Taobao (Liang et al., 16 Aug 2025, Fu et al., 9 Feb 2026)) employ scalable inference through approximate nearest neighbor (ANN) vector search, hierarchical sparse parallelism, and asynchronous communication, sustaining real-time throughput at ~10,000 QPS.

5. Reasoning, Multimodality, and Task Diversity

GRSs have extended the expressivity of recommendation toward reasoning, explainability, and task generality:

  • Reasoning architectures: REG4Rec (Xing et al., 21 Aug 2025) introduces MoE-based parallel quantization, diversified reasoning path exploration, and consistency-oriented self-reflection for high-confidence, robust recommendations.
  • Multimodal fusion: Strong empirical evidence (e.g., MGR-LF++ (Zhu et al., 30 Mar 2025), MACRec (Zhang et al., 19 Nov 2025)) shows >20% improvement when leveraging cross-modal tokenization and alignment, using contrastive objectives and special modality-marking tokens to preserve separability during generation.
  • Task diversity: GRSs encompass slates, ranked lists, conversational dialogs, and even creative item or image generation (GEMRec (Guo et al., 2023)). A two-stage pipeline of prompt-model retrieval followed by generated-item ranking enables personalization amid “infinite” generative possibilities.

Evaluation is multi-faceted, encompassing recall, NDCG, diversity, hallucination rates, and preference alignment, as well as online metrics such as click-through and conversion.
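The offline metrics can be computed as below. Recall@K and NDCG@K follow their usual binary-relevance definitions; the hallucination rate is defined here, as an assumption, as the fraction of generated identifiers that are absent from the catalog, since the text does not fix a definition.

```python
import math

def recall_at_k(ranked, relevant, k):
    """Fraction of relevant items that appear in the top-k of the ranked list."""
    if not relevant:
        return 0.0
    hits = sum(1 for item in ranked[:k] if item in relevant)
    return hits / len(relevant)

def ndcg_at_k(ranked, relevant, k):
    """Binary-relevance NDCG: DCG of the list divided by the ideal DCG."""
    dcg = sum(
        1.0 / math.log2(rank + 2)
        for rank, item in enumerate(ranked[:k])
        if item in relevant
    )
    ideal = sum(1.0 / math.log2(rank + 2) for rank in range(min(k, len(relevant))))
    return dcg / ideal if ideal > 0 else 0.0

def hallucination_rate(generated, catalog):
    """Assumed definition: share of generated identifiers not found in the catalog."""
    if not generated:
        return 0.0
    return sum(1 for item in generated if item not in catalog) / len(generated)

ranked = ["a", "b", "c", "d"]
relevant = {"b", "d"}
print(recall_at_k(ranked, relevant, k=3))
print(ndcg_at_k(ranked, relevant, k=3))
print(hallucination_rate(["a", "zz"], catalog={"a", "b"}))
```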

6. Scaling Laws, Model Bottlenecks, and Foundation Models

Empirical studies (Wang et al., 19 Feb 2025, Liu et al., 29 Sep 2025, Hou et al., 31 Oct 2025) elucidate the scaling behavior of GRSs:

  • Scaling laws: Performance (cross-entropy loss, recall@K) improves sublinearly with log(model capacity) and log(training data), e.g.,

$$L(N) \simeq L_\infty + a N^{-\alpha}$$

with typical exponents $\alpha\in[0.05,0.1]$, and recall@K rising logarithmically.
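To get a feel for what these exponents imply, the sketch below evaluates the power-law form and recovers the exponent from two points. The constants `l_inf` and `a` are illustrative assumptions, not values reported by the cited studies. Under this form, a 10x increase in N multiplies the reducible loss (the part above the floor) by 10^(-α), which is roughly 0.79 for α = 0.1 and 0.89 for α = 0.05.

```python
import math

def scaling_loss(n_params, l_inf, a, alpha):
    """Power-law form: L(N) = L_inf + a * N^(-alpha)."""
    return l_inf + a * n_params ** (-alpha)

def reducible_ratio(scale, alpha):
    """Factor by which the reducible loss shrinks when N grows by `scale`."""
    return scale ** (-alpha)

def fit_alpha(n1, loss1, n2, loss2, l_inf):
    """Recover alpha from two (N, loss) points, assuming the floor L_inf is known."""
    return -math.log((loss2 - l_inf) / (loss1 - l_inf)) / math.log(n2 / n1)

# Assumed constants for illustration only.
L_INF, A, ALPHA = 1.0, 5.0, 0.1
loss_small = scaling_loss(1e8, L_INF, A, ALPHA)
loss_large = scaling_loss(1e9, L_INF, A, ALPHA)

print(loss_small, loss_large)
print(reducible_ratio(10, 0.1), reducible_ratio(10, 0.05))
print(fit_alpha(1e8, loss_small, 1e9, loss_large, L_INF))
```

With exponents this small, each order of magnitude in model size removes only about 10 to 20 percent of the remaining reducible loss, which is why the gains look sublinear in practice.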

7. Challenges, Practical Considerations, and Future Directions

Current research surfaces several open challenges:

| Challenge | Context | Example Approaches |
|---|---|---|
| Data quality & diversity | Need for scalable, high-quality logs and multi-modal coverage | Data distillation, augmentation |
| Robustness & fairness | Bias, cold start, and adversarial perturbations | Model editing, fairness-aware |
| Computation efficiency | Training/inference at hundred-billion parameter scale | TurboGR, MoE, quantization |
| Evaluation methodology | Lack of large, realistic, multi-turn and multi-modal benchmarks | TencentGR datasets |

Scaling, continual adaptation, interpretability of generation, and robust human-in-the-loop alignment remain at the research frontier (Hou et al., 31 Oct 2025, Yang et al., 9 Jul 2025, Wang et al., 19 Feb 2025).

Practical deployments (JD App (Zou et al., 16 Apr 2026), Tencent Ads (Pan et al., 4 Apr 2026), Taobao (Liang et al., 16 Aug 2025, Fu et al., 9 Feb 2026)) and shared open-source benchmarks have established public testbeds for continued advances. Current best practice combines LLM-powered reasoning, learnable tokenization, business-specific reward modeling, and hardware-software co-design.

