
Generative Recommendation Paradigm

Updated 13 November 2025
  • Generative recommendation is a paradigm that models user-item interactions as a conditional generation task, using autoregressive models to output sequential recommendations.
  • Data augmentation with knowledge-infused and multi-modal strategies enriches training signals and enhances personalization and interpretability.
  • Unified architectures that integrate discrete tokenization and large language models yield scalable, robust, and performance-boosting recommendation systems.

Generative recommendation denotes a paradigm in recommender systems where the task of matching users to items is formulated as a conditional generation problem. Rather than relying on discriminative scoring or ranking functions, generative recommendation employs models—often LLMs or diffusion mechanisms—to directly output a sequence or structure representing recommended items or even newly synthesized content. This shift enables models to leverage world knowledge, multi-modal semantics, and reasoning abilities, offering new capabilities in personalization, content creation, and interpretability across domains such as e-commerce, news, social media, and creative platforms.

1. Conceptual Foundations and Paradigm Shift

Traditional (discriminative) recommenders estimate preference scores $f(u, i) \approx P(y_{ui} = 1 \mid u, i)$ and select top-$K$ items purely via ranking (Hou et al., 31 Oct 2025). In contrast, generative recommenders seek to model the full conditional distribution over recommendations: $P(y \mid x) = \prod_t P(y_t \mid y_{<t}, x)$, where $x$ embeds user context, history, or preferences, and the output $y$ is a sequence of item identifiers, tokens, or content representations.
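
As a minimal illustration of this factorization (the module names, toy vocabulary, and sizes below are assumptions for the example, not taken from any cited system), a generative recommender can be decoded autoregressively: condition on the user's interaction history $x$, then emit item tokens one at a time from $P(y_t \mid y_{<t}, x)$.

```python
import torch
import torch.nn as nn

VOCAB_SIZE, HIDDEN, GEN_LEN = 1000, 64, 5   # toy item-token vocabulary and sizes

class TinyGenRec(nn.Module):
    """Minimal autoregressive recommender: models P(y_t | y_<t, x)."""
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(VOCAB_SIZE, HIDDEN)
        self.rnn = nn.GRU(HIDDEN, HIDDEN, batch_first=True)
        self.head = nn.Linear(HIDDEN, VOCAB_SIZE)

    def forward(self, tokens, state=None):
        hidden, state = self.rnn(self.embed(tokens), state)
        return self.head(hidden), state        # logits over item tokens, new state

@torch.no_grad()
def generate(model, history, gen_len=GEN_LEN):
    """Greedy decoding: encode the interaction history x, then emit items one by one."""
    model.eval()
    _, state = model(history)                  # condition on the user context x
    token, outputs = history[:, -1:], []
    for _ in range(gen_len):
        logits, state = model(token, state)    # P(y_t | y_<t, x)
        token = logits[:, -1].argmax(dim=-1, keepdim=True)
        outputs.append(token)
    return torch.cat(outputs, dim=1)

model = TinyGenRec()
history = torch.randint(0, VOCAB_SIZE, (1, 8))   # one user's interaction history
print(generate(model, history))                  # a generated sequence of item tokens
```

Real systems replace the toy GRU with a transformer or LLM backbone and decode with beam search rather than greedy argmax.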

This paradigm shift is driven by several factors:

  • Expressivity: Generative models can synthesize recommendations well beyond observed training data, generating new items or explanations.
  • World knowledge and reasoning: Pretrained generative models (e.g. LLMs, multimodal transformers) natively encode background knowledge, enabling more nuanced recommendations.
  • Unified task and modeling space: Generative frameworks recast diverse tasks (search, recommendation, explanation, conversation, and personalized item generation) into a sequence modeling problem over item or content tokens (Hou et al., 31 Oct 2025, Shi et al., 8 Apr 2025, Wang et al., 2023).

2. Data Augmentation and Representation

Generative recommendation leverages data augmentation by synthesizing realistic training examples and unifying heterogeneous signals:

  • Knowledge-infused augmentation: LLMs generate enriched content (summaries, hierarchical attributes) that augment item and user features (Hou et al., 31 Oct 2025, Lee et al., 2 Jun 2025).
  • Sequential augmentation: Frameworks such as GenPAS explicitly model sequence sampling, target sampling, and input sampling to control the training distribution of input-target pairs. The (α, β, γ) parameterization allows precise, bias-controlled data construction to improve generalization and alignment with future user actions (Lee et al., 17 Sep 2025).
  • Multi-modal data unification: Attribute fusion (text, vision, graph) and virtual agents (behavioral simulation) populate training corpora with richly structured user-item interactions.

Table: GenPAS Augmentation Strategies

Strategy       α    β    γ
Last-Target    0    −∞
Multi-Target   1    0    −∞
Slide-Window   2    1    0

This explicit control over augmentation allows generative recommenders to achieve high accuracy, data efficiency, and parameter efficiency, especially under sparse or biased data regimes (Lee et al., 17 Sep 2025).
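
The exact GenPAS sampling distributions and their (α, β, γ) semantics are specific to that work; the plain-Python sketch below only illustrates how the last-target, multi-target, and sliding-window strategies named in the table turn a single interaction sequence into different sets of input-target training pairs.

```python
def last_target(seq):
    """One training pair: the full prefix predicts the final item."""
    return [(seq[:-1], seq[-1])]

def multi_target(seq, min_prefix=1):
    """Every prefix predicts its next item."""
    return [(seq[:t], seq[t]) for t in range(min_prefix, len(seq))]

def slide_window(seq, window=3):
    """Fixed-length windows: the last `window` items predict the next one."""
    return [(seq[max(0, t - window):t], seq[t]) for t in range(1, len(seq))]

seq = ["i1", "i2", "i3", "i4", "i5"]
print(last_target(seq))   # [(['i1', 'i2', 'i3', 'i4'], 'i5')]
print(multi_target(seq))  # 4 pairs, one per prefix
print(slide_window(seq))  # 4 pairs with inputs truncated to the window
```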

3. Model Architectures and Tokenization

Generative recommendation systems typically integrate two key components:

  • Item tokenization: Items are mapped to discrete code sequences (“semantic IDs”) via hierarchical K-means, residual quantization (RQ-VAE), or product quantization (Liu et al., 29 Sep 2025, Shi et al., 8 Apr 2025, Xiao et al., 10 Feb 2025). Tokenizers may incorporate both semantic (content) and collaborative (behavioral) embeddings. Models such as PRORec employ cross-modality alignment and intra-modality distillation to avoid semantic domination and ensure robust representation fusion (Xiao et al., 10 Feb 2025). A minimal residual-quantization sketch follows this list.
  • Generative backbone: Sequence models (LLMs, transformers, diffusion architectures) autoregressively emit the next item code conditioned on user history. Advanced frameworks (BLOGER) employ bi-level optimization, meta-learning, and gradient surgery to jointly align tokenizer and generator for recommendation accuracy (Bai et al., 24 Oct 2025).
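
To make the semantic-ID idea concrete, here is a minimal residual-quantization sketch in NumPy: each item embedding is encoded as a short sequence of codebook indices, with each level quantizing the residual left by the previous one. The codebook sizes, depth, and random data are illustrative; production tokenizers such as RQ-VAE learn the codebooks jointly with an encoder-decoder rather than using fixed random ones.

```python
import numpy as np

rng = np.random.default_rng(0)

def residual_quantize(embs, codebooks):
    """Encode each embedding as one index per codebook level (a 'semantic ID')."""
    residual = embs.copy()
    codes = []
    for cb in codebooks:                      # cb: (codebook_size, dim)
        dists = ((residual[:, None, :] - cb[None, :, :]) ** 2).sum(-1)
        idx = dists.argmin(axis=1)            # nearest code per item at this level
        codes.append(idx)
        residual = residual - cb[idx]         # pass the residual to the next level
    return np.stack(codes, axis=1)            # shape: (num_items, num_levels)

dim, levels, codebook_size, num_items = 16, 3, 8, 5
item_embeddings = rng.normal(size=(num_items, dim))        # e.g. from a text encoder
codebooks = [rng.normal(size=(codebook_size, dim)) for _ in range(levels)]

semantic_ids = residual_quantize(item_embeddings, codebooks)
print(semantic_ids)   # each row is the discrete code sequence for one item
```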

Recent work recognizes information bottlenecks in fixed discrete tokenization:

  • Scaling up SID-based generative recommenders quickly saturates performance, as larger encoders and codebooks cannot overcome the representational ceiling imposed by discrete codes (Liu et al., 29 Sep 2025).
  • End-to-end generation via large LLMs (“LLM-as-RS”) exhibits smooth scaling, with unsaturated gains in Recall@k and NDCG@k as the model size increases, challenging the belief that LLMs cannot capture collaborative filtering signals (Liu et al., 29 Sep 2025).

In multi-behavior contexts, tokenization incorporates chain-of-thought paths from product knowledge graphs, behavior tokens, and semantic codes, boosting interpretability and behavior alignment (Ma et al., 19 Jul 2025).

4. Training Objectives and Optimization

The dominant training objective is the autoregressive negative log-likelihood for sequence generation:

$$\mathcal{L}_{\mathrm{gen}} = -\sum_{t=1}^{L}\log P\big(y_t \mid x, y_{<t}\big)$$

where $y_t$ denotes the token at position $t$.
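
In PyTorch this objective is ordinarily computed as token-level cross-entropy over the item-token vocabulary; the sketch below uses random logits and toy sizes purely to show the shape handling.

```python
import torch
import torch.nn.functional as F

vocab_size, batch, seq_len = 1000, 4, 6   # toy sizes

# logits[b, t, :] = model's distribution over the t-th item token given x and y_<t
logits = torch.randn(batch, seq_len, vocab_size, requires_grad=True)
targets = torch.randint(0, vocab_size, (batch, seq_len))   # ground-truth tokens y_t

# Sum of -log P(y_t | x, y_<t) over positions, averaged over the batch.
loss_gen = F.cross_entropy(
    logits.reshape(-1, vocab_size),   # (batch * seq_len, vocab)
    targets.reshape(-1),              # (batch * seq_len,)
    reduction="sum",
) / batch

loss_gen.backward()
print(loss_gen.item())
```

In practice the logits come from the generative backbone conditioned on the user context, and padding positions are masked out of the sum.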

Advanced optimization techniques include:

  • Bi-Level Optimization: BLOGER trains the generator at the lower level and tunes the tokenizer at the upper level, balancing tokenization and recommendation losses via meta-gradients and gradient surgery for joint alignment (Bai et al., 24 Oct 2025). A generic gradient-surgery sketch follows this list.
  • Distribution Matching: DMRec bridges collaborative and language modeling spaces by matching the posteriors over latent representations, aligning generative capability and semantic capacity (Zhang et al., 10 Apr 2025).
  • GFlowNets Fine-Tuning: GFlowGR treats item generation as a trajectory in a Markov decision process, allocating sample mass to multi-modal high-reward paths and mitigating exposure bias inherent in classical SFT and DPO (Wang et al., 19 Jun 2025).
  • Sparse Attention and Reasoning: GRACE implements journey-aware sparse attention and chain-of-thought tokenization, dramatically reducing computational cost while improving accuracy and explicit reasoning (Ma et al., 19 Jul 2025).
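
The gradient-surgery step mentioned above can be illustrated with a generic PCGrad-style projection; this is a standalone sketch of the general technique under assumed toy gradients, not BLOGER's actual implementation. When the tokenization and recommendation gradients conflict, the conflicting component of one is projected out before the update.

```python
import torch

def surgery(g_rec, g_tok):
    """If the two task gradients conflict (negative dot product), remove from
    g_tok its component along g_rec before combining (PCGrad-style projection)."""
    dot = torch.dot(g_rec, g_tok)
    if dot < 0:
        g_tok = g_tok - dot / g_rec.norm().pow(2) * g_rec
    return g_rec + g_tok                           # combined update direction

# Toy gradients for a shared parameter vector.
g_recommendation = torch.tensor([1.0, 0.0])
g_tokenization = torch.tensor([-0.5, 1.0])         # partially conflicts with g_recommendation

update = surgery(g_recommendation, g_tokenization)
print(update)   # the component conflicting with g_recommendation has been removed
```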

5. Unified Foundations and Multi-Task Formulation

Generative recommendation naturally supports multi-task learning:

  • Unified generative frameworks such as GenSAR and SynerGen model both search (semantic matching of queries to items) and recommendation (user–item sequence prediction) using shared generative backbones, dual-purpose identifiers, and joint optimization over retrieval and ranking tasks (Shi et al., 8 Apr 2025, Gao et al., 26 Sep 2025). An illustrative task-serialization sketch follows this list.
  • Personalized content generation: GeneRec and related paradigms extend generative recommendation beyond selection to content creation via instruction-guided generators (AI creator/editor), enabling dynamic creation, repurposing, and trustworthy recommendation of new items (Wang et al., 2023).
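
The shared-backbone idea can be illustrated by how both tasks are serialized into a single token space. The format below (task tokens, context tokens, semantic-ID targets) is a hypothetical illustration of the general recipe, not the exact identifier or prompt scheme of GenSAR or SynerGen.

```python
def to_training_example(task, context_tokens, target_item_codes):
    """Serialize a search or recommendation instance for one shared
    sequence-to-sequence backbone: input = task token + context,
    output = the target item's semantic-ID tokens."""
    assert task in ("<search>", "<recommend>")
    return {"input": [task, *context_tokens], "target": target_item_codes}

# Recommendation: the context is the user's interaction history (as item codes).
rec_example = to_training_example(
    "<recommend>",
    ["<i_12_7_3>", "<i_40_1_9>"],        # previously consumed items
    ["<c_12>", "<c_7>", "<c_5>"],        # semantic ID of the next item
)

# Search: the context is the tokenized query text.
search_example = to_training_example(
    "<search>",
    ["wireless", "earbuds"],
    ["<c_3>", "<c_9>", "<c_1>"],
)

print(rec_example)
print(search_example)
```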

6. Empirical Evaluation and Scaling Laws

Benchmark studies demonstrate consistent empirical gains:

  • SID-based models plateau at modest model sizes (10–20M parameters); LLM-as-RS and unified generative frameworks scale smoothly to billions, with up to 20% Recall@5 improvement and unsaturated scaling curves (Liu et al., 29 Sep 2025). Recall@K and NDCG@K are defined in the sketch after this list.
  • BLOGER brings statistically significant (~1–3% relative) improvements over prior state-of-the-art models in Recall@k and NDCG, with marginal computational overhead (Bai et al., 24 Oct 2025).
  • GRACE achieves up to +106.9% in HR@10 and +106.7% in NDCG@10 compared to previous baselines, while reducing attention computation by up to 48% (Ma et al., 19 Jul 2025).
  • GenPAS demonstrates augmentation strategies can yield large (up to 38%) relative improvements over standard pipelines (Lee et al., 17 Sep 2025).
  • GFlowGR addresses diversity and exposure bias, resulting in higher recall and NDCG, lower KL-divergence to ground-truth distributions, and richer recommendation sets (Wang et al., 19 Jun 2025).
  • Practitioners should select augmentation and codebook strategies by two-step distributional filtering and cross-modal balance to optimize generalization and efficiency (Lee et al., 17 Sep 2025, Xiao et al., 10 Feb 2025).
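
For reference, Recall@K and NDCG@K, the metrics cited throughout this section, can be computed as follows for a single ranked recommendation list; these are the standard definitions, and the example data are arbitrary.

```python
import math

def recall_at_k(ranked_items, relevant_items, k):
    """Fraction of relevant items that appear in the top-k of the ranked list."""
    hits = len(set(ranked_items[:k]) & set(relevant_items))
    return hits / len(relevant_items)

def ndcg_at_k(ranked_items, relevant_items, k):
    """Discounted gain of hits in the top-k, normalized by the ideal ranking."""
    relevant = set(relevant_items)
    dcg = sum(1.0 / math.log2(i + 2)
              for i, item in enumerate(ranked_items[:k]) if item in relevant)
    ideal = sum(1.0 / math.log2(i + 2) for i in range(min(k, len(relevant))))
    return dcg / ideal

ranked = ["i3", "i7", "i1", "i9", "i2"]   # model's top-5
truth = ["i1", "i4"]                      # held-out ground truth

print(recall_at_k(ranked, truth, 5))      # 0.5 (1 of 2 relevant items retrieved)
print(ndcg_at_k(ranked, truth, 5))        # i1 at rank 3 -> 1/log2(4), divided by the ideal DCG
```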

7. Challenges, Limitations, and Future Directions

Generative recommendation faces several open challenges:

  • Scaling bottlenecks: Discrete code-based models quickly hit representational ceilings, requiring self-supervised or end-to-end code learning to unlock further gains (Liu et al., 29 Sep 2025).
  • Bias and robustness: Popularity bias, fairness issues, prompt sensitivity, and adversarial vulnerabilities remain significant hurdles. Robustness to natural and synthetic noise is not yet resolved (Hou et al., 31 Oct 2025).
  • Benchmark and deployment: Static datasets lack interactivity; benchmarks need to capture multi-task, conversational, and reasoning capabilities. Inference efficiency (autoregressive beam search, context length) and cost-effective tuning (parameter-efficient fine-tuning) remain open problems at industrial scale (Hou et al., 31 Oct 2025, Gao et al., 26 Sep 2025).
  • Expressive content creation: Ensuring fidelity—fairness, safety, authenticity—of generated items is crucial for trustworthy recommendation, especially in domains such as news, video, and personalized product design (Wang et al., 2023, Gao et al., 6 Mar 2024).
  • Unified generative assistants: Future work aims for end-to-end assistants integrating dialog, retrieval, reasoning, ranking, explanation, and dynamic content generation under a single language-driven architecture (Hou et al., 31 Oct 2025).

Generative recommendation thus represents a convergence of sequence modeling, world knowledge synthesis, multi-modal augmentation, and powerful conditional generation techniques toward fully personalized, context-rich, and interpretable recommendation technologies.
