RecGPT: Generative Recommendation Framework
- RecGPT is a recommendation framework that integrates transformer-based generative models with explicit user intent modeling to deliver personalized sequential recommendations.
- It employs domain-invariant, text-centric item encoding using MPNet and finite scalar quantization, enabling zero-shot embedding and cross-domain generalization.
- The framework leverages multi-stage training with generative prompt tuning and human-LLM cooperative evaluation, achieving significant performance gains in real-world deployments.
RecGPT encompasses a class of LLM-driven frameworks and models for recommendation that integrate generative modeling, intent-centric design, and domain-adaptive training methodologies. Developed across successive works from 2024–2025, RecGPT technologies span generative sequential recommendation, text-based recommender foundation models, and large-scale deployments in commercial environments. RecGPT systems are characterized by their shift from log-fitting and ID-based paradigms to explicit modeling of user intent, domain-invariant item representations, generative prompt paradigms, and scalable training and evaluation strategies.
1. Architectural Foundations
RecGPT architectures are founded on transformer-based generative models and related LLM designs. Early instantiations (Zhang et al., 6 Apr 2024) leverage Generative Pre-training Transformers (GPTs) for sequential recommendation, eschewing classical dual-tower approaches. The model inputs a sum of user embedding, item embedding (mapped from the behavioral sequence), and position embedding, initializing the hidden state as
$$h_t^{0} = e_u + e_{i_t} + p_t.$$
Through masked multi-head self-attention, each layer computes:
$$\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{QK^{\top}}{\sqrt{d_k}} + M\right)V,$$
where $M$ encodes attention masking.
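A minimal PyTorch sketch of this input construction and a masked attention layer; the function names, dimensions, and head count are illustrative, not taken from the paper:

```python
import torch
import torch.nn.functional as F

def recgpt_layer_input(user_emb, item_embs, pos_embs):
    # h^0_t = e_u + e_{i_t} + p_t : broadcast the user embedding over the sequence
    return user_emb.unsqueeze(1) + item_embs + pos_embs

def masked_self_attention(h, n_heads=4):
    # Causal mask M: disallowed positions (above the diagonal) are set to -inf
    B, T, D = h.shape
    q = k = v = h.view(B, T, n_heads, D // n_heads).transpose(1, 2)
    scores = q @ k.transpose(-2, -1) / (D // n_heads) ** 0.5
    mask = torch.triu(torch.ones(T, T, dtype=torch.bool), diagonal=1)
    scores = scores.masked_fill(mask, float("-inf"))
    out = F.softmax(scores, dim=-1) @ v
    return out.transpose(1, 2).reshape(B, T, D)

# toy usage
B, T, D = 2, 5, 32
h0 = recgpt_layer_input(torch.randn(B, D), torch.randn(B, T, D), torch.randn(B, T, D))
attn = masked_self_attention(h0)
```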
Subsequent RecGPT models extend architectural scale (Ngo et al., 21 May 2024), introducing RecGPT-7B and RecGPT-7B-Instruct—fully-trained LLMs with 7 billion parameters, 32 attention heads, 32 layers, and context extrapolation via ALiBi. Items are processed as ordered text documents indexed through GPT-NeoX’s tokenizer.
The latest progression formalizes RecGPT as a foundation model for sequential recommendation (Jiang et al., 6 Jun 2025), abandoning ID-based representations for fully text-driven item embeddings. Unified item tokenization is performed by MPNet-based encoders followed by Finite Scalar Quantization (FSQ), mapping continuous embeddings into discrete standardized token sequences:
$$t_k = \mathrm{round}\big((L-1)\cdot\sigma(z_k)\big), \quad k = 1, \dots, K,$$
where $z_k$ is the $k$-th sub-vector of the MPNet embedding, $\sigma$ the sigmoid, and $L$ the number of quantization levels.
Hybrid bidirectional-causal attention mechanisms simultaneously capture intra-item bidirectional coherence and inter-item causal dependency, enabling robust sequential modeling.
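To make the hybrid scheme concrete, here is a sketch of such a mask under the assumption that each item contributes a fixed-length block of tokens; the paper's exact construction may differ:

```python
import torch

def hybrid_attention_mask(n_items: int, tokens_per_item: int) -> torch.Tensor:
    """Boolean mask (True = may attend): bidirectional within an item's
    token block, causal across item blocks."""
    T = n_items * tokens_per_item
    block = torch.arange(T) // tokens_per_item   # item-block index per token
    # token i may attend to token j iff j's item block is not after i's
    return block.unsqueeze(1) >= block.unsqueeze(0)

m = hybrid_attention_mask(n_items=3, tokens_per_item=2)
# Tokens 0-1 (item 0) attend to each other bidirectionally; tokens of
# item 2 can see items 0 and 1, but item 0 cannot see item 2.
```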
2. User Intent Modeling and Personalized Prompt Generation
Central to RecGPT is the generative mining of user intent. In “Generative Personalized Prompts for Sequential Recommendation via ChatGPT Training Paradigm” (Zhang et al., 6 Apr 2024), a user module incorporates user ID embeddings to condition the model on individual-specific traits. During training, two stages are employed:
- Pre-training: Auto-regressive modeling of behavior sequences, optimizing a binary cross-entropy loss over next-item prediction.
- Fine-tuning: Personalized prompt-tuning generates intermediate “conversational” tokens as prompts. These generated prompts, marked with segment identifiers to distinguish their origin from observed behavior, augment the behavioral sequence (see the sketch after this list).
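A minimal sketch of how a prompt-augmented sequence with origin-distinguishing segment identifiers might be assembled; all names and token ids here are illustrative:

```python
from dataclasses import dataclass
from typing import List

@dataclass
class AugmentedSequence:
    token_ids: List[int]    # behavioral item ids followed by generated prompt ids
    segment_ids: List[int]  # 0 = observed behavior, 1 = generated prompt

def augment_with_prompts(behavior: List[int], prompts: List[int]) -> AugmentedSequence:
    """Append model-generated 'conversational' prompt tokens to the
    behavioral sequence, tagging every position with its origin."""
    return AugmentedSequence(
        token_ids=behavior + prompts,
        segment_ids=[0] * len(behavior) + [1] * len(prompts),
    )

seq = augment_with_prompts(behavior=[12, 7, 93], prompts=[501, 502])
```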
The multi-stage user profiling used in “RecGPT Technical Report” (Yi et al., 30 Jul 2025) decomposes ultra-long user sequences via reliable behavior extraction and hierarchical compression. A specialized LLM, denoted $\mathcal{M}_{\text{interest}}$, generates an explicit interest set:
$$\mathcal{I}_u = \mathcal{M}_{\text{interest}}(A_u, S_u, C),$$
where $A_u$ is user attributes, $S_u$ the compressed sequence, and $C$ a candidate pool. Item tag prediction is conducted via a companion model $\mathcal{M}_{\text{tag}}$, yielding semantic tags for retrieval, while $\mathcal{M}_{\text{explain}}$ generates personalized recommendation explanations.
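A hedged sketch of how these three specialized models might compose into a profiling-and-retrieval pipeline; `profile_and_retrieve` and the `llm_*` callables are hypothetical stand-ins, not the report's API:

```python
def profile_and_retrieve(user_attrs, compressed_seq, candidates,
                         llm_interest, llm_tag, llm_explain):
    """Hypothetical composition of the three specialized LLMs: mine an
    explicit interest set, tag candidate items, retrieve on tag overlap,
    and explain the survivors. Each llm_* is a callable stand-in."""
    interests = set(llm_interest(user_attrs, compressed_seq, candidates))
    recs = [it for it in candidates if interests & set(llm_tag(it))]
    return recs, {it: llm_explain(user_attrs, it) for it in recs}

# toy usage with trivial stand-ins
recs, why = profile_and_retrieve(
    {"segment": "runner"}, ["bought: running shoes"], ["item_a", "item_b"],
    llm_interest=lambda a, s, c: ["running"],
    llm_tag=lambda it: ["running"] if it == "item_a" else ["cooking"],
    llm_explain=lambda a, it: f"matches your interest in running: {it}",
)
```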
3. Item Representation and Domain-Invariant Tokenization
RecGPT foundation models achieve domain invariance by representing items exclusively via textual features rather than opaque IDs (Jiang et al., 6 Jun 2025). Item descriptions (title, category, attributes) are transformed through MPNet into continuous embeddings and subsequently quantized through FSQ into token sequences of controlled granularity.
This process ensures:
- Zero-shot embedding: New items and domains are instantly embeddable.
- Semantic continuity: Partitioning into K sub-vectors with sigmoid normalization preserves local semantic relationships (sketched in code after this list).
- Unified codebook construction: The quantization mechanism enforces a shared discrete space, eliminating domain barriers and enabling cross-domain generalization.
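A minimal sketch of this tokenization path, assuming a 768-dimensional MPNet-style embedding; K, L, and the mean reduction of each sub-vector are chosen purely for illustration, not taken from the paper:

```python
import numpy as np

def fsq_tokenize(embedding: np.ndarray, K: int = 8, L: int = 16) -> np.ndarray:
    """FSQ sketch: split the text embedding into K sub-vectors, reduce each
    to a scalar (mean here, for illustration only), squash with a sigmoid,
    and round onto L discrete levels, yielding tokens in {0, ..., L-1}."""
    subs = np.array_split(embedding, K)
    scalars = np.array([s.mean() for s in subs])   # one scalar per sub-vector
    squashed = 1.0 / (1.0 + np.exp(-scalars))      # sigmoid -> (0, 1)
    return np.round(squashed * (L - 1)).astype(int)

# e.g. an MPNet sentence embedding of an item's title/category/attributes
tokens = fsq_tokenize(np.random.randn(768))
```

Because every domain's items pass through the same encoder and quantizer, any two items share the discrete token space, which is what enables the zero-shot and cross-domain properties above.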
4. Training Paradigms and Human-LLM Synergy
RecGPT frameworks advance training beyond conventional supervised or log-fitting. In (Zhang et al., 6 Apr 2024), the two-stage ChatGPT-inspired sequence—auto-regression plus prompt-tuning—allows the model to adapt responsively to evolving user preferences.
Later systems (Yi et al., 30 Jul 2025) propose multi-stage training integrating:
- Reasoning-enhanced pre-alignment: High-quality samples derived from a larger reasoning LLM, e.g., DeepSeek-R1, establish task-specific adaptation.
- Self-training evolution: The LLM generates its own training samples, which are validated and refined, iteratively increasing robustness.
- Human-LLM cooperative judge system: Human annotators rate a subset of samples; an LLM is fine-tuned on these judgments to scale multi-dimensional sample evaluation (e.g., willingness, relevance, safety). This system enables efficient, scalable quality control for training data (a filtering sketch follows this list).
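A minimal sketch of how such judge scores might gate training data, assuming the judge returns per-dimension scores in [0, 1]; the dimensions and thresholds here are illustrative:

```python
from typing import Callable, Dict, List

def filter_training_samples(
    samples: List[str],
    judge: Callable[[str], Dict[str, float]],
    thresholds: Dict[str, float],
) -> List[str]:
    """Keep only samples the LLM judge rates above threshold on every
    dimension (e.g., willingness, relevance, safety)."""
    return [
        s for s in samples
        if all(judge(s)[dim] >= t for dim, t in thresholds.items())
    ]

# toy usage with a stand-in judge
kept = filter_training_samples(
    samples=["sample text"],
    judge=lambda s: {"willingness": 0.9, "relevance": 0.8, "safety": 1.0},
    thresholds={"willingness": 0.7, "relevance": 0.7, "safety": 0.9},
)
```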
RecGPT-7B-Instruct (Ngo et al., 21 May 2024) is fine-tuned on over 100K prompt-response pairs targeting rating prediction and sequential recommendation, optimized with LION and mixed-precision training, with explicit attention to carbon footprint and training efficiency.
5. Inference and Recommendation Retrieval
RecGPT’s inference paradigms leverage both inner-product computation and autoregressive recall strategies (Zhang et al., 6 Apr 2024). The traditional score for item recommendation is computed as:
$$s_{u,i} = h_u^{\top} e_i,$$
where $h_u$ is the user's final hidden representation and $e_i$ the embedding of candidate item $i$.
The full RecGPT autoregressive mode generates multiple user interest vectors, each used to retrieve a top-$k$ item set. The process iterates by concatenating the top item back into the behavioral sequence and rerunning through the Transformer, recursively refining interest representations and recall.
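A sketch of this recall loop, assuming `model(seq)` returns the final hidden state used as the interest vector; the round count and names are illustrative:

```python
import torch

def recall_autoregressive(model, seq: list, item_embs: torch.Tensor,
                          rounds: int = 3, k: int = 10):
    """Iterative recall sketch: score all items by inner product with the
    current interest vector, feed the best item back into the sequence,
    and rerun the Transformer to refine the next interest vector."""
    recalled = []
    for _ in range(rounds):
        h_u = model(seq)                 # interest vector for current sequence
        scores = item_embs @ h_u         # s_{u,i} = <h_u, e_i> for every item
        top = torch.topk(scores, k).indices.tolist()
        recalled.append(top)
        seq = seq + [top[0]]             # concatenate top item, then rerun
    return recalled
```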
More recent foundation models (Jiang et al., 6 Jun 2025) implement a catalog-aware beam search decoder constrained by Trie-based prefix matching, ensuring real-time mapping from token sequences to valid catalog items. Beam search facilitates simultaneous prediction of all item tokens, reducing latency; prefix matching restricts exploration to valid item representations.
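A minimal Trie sketch of the prefix constraint: at each beam step, expansion is limited to tokens that still lead to some valid catalog item. The token values are toy placeholders:

```python
class Trie:
    """Prefix tree over catalog items represented as token sequences."""

    def __init__(self, sequences):
        self.root = {}
        for seq in sequences:
            node = self.root
            for tok in seq:
                node = node.setdefault(tok, {})

    def allowed_next(self, prefix):
        """Tokens that extend `prefix` toward a valid catalog item."""
        node = self.root
        for tok in prefix:
            node = node.get(tok)
            if node is None:
                return []
        return list(node.keys())

# catalog items as quantized token sequences (toy values)
trie = Trie([(3, 1, 4), (3, 1, 5), (2, 7, 1)])
trie.allowed_next((3, 1))   # -> [4, 5]; beam expansion is restricted to these
```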
6. Evaluation, Deployment, and Empirical Impact
RecGPT models demonstrate consistent performance advantages in both offline and industrial online settings. Experimental results in (Zhang et al., 6 Apr 2024) show RecGPT outperforming baselines such as SASRec and BERT4Rec on Amazon and Yelp datasets using Hit Ratio and NDCG metrics. Ablations attribute gains to prompt-tuning and autoregressive recall.
RecGPT-7B-Instruct (Ngo et al., 21 May 2024) sets new standards in rating prediction (RMSE=0.5316, MAE=0.2436 for selected datasets) and sequential recommendation, outperforming MF, MLP, P5, and ChatGPT (in few-shot mode) across benchmarks.
Real-world deployments (Yi et al., 30 Jul 2025) on Taobao’s homepage validate RecGPT’s impact:
| Metric | Improvement | Description |
|---|---|---|
| Dwell Time (DT) | +4.82% | User engagement duration |
| Category Diversity | +6.96% | Diversity of clicked items |
| CTR | +6.33% | Click-through rate |
| IPV | +9.47% | Item page views |
| DCAU | +3.72% | Daily active click users |
Additionally, RecGPT distributes exposure more evenly across popular and long-tail items, mitigating the “Matthew effect.”
7. Innovations and Field Implications
RecGPT introduces several key innovations:
- Generative personalized prompt paradigm: Models generate in-sequence prompts capturing dynamic user interests.
- Domain-invariant text-centric item encoding: Overcomes cold-start and cross-domain limitations by eschewing ID-based representations.
- Hybrid tri-tower retrieval and semantic tag fusion (Editor’s term): Combines collaborative and semantic matching, enhancing retrieval fidelity.
- Human-LLM cooperative judge control: Scales training quality assessment beyond manual annotation bottlenecks.
- Sustainability and ecosystem health: Intent-centric modeling delivers performance gains while increasing diversity, fairness, and long-term ecosystem benefits.
A plausible implication is that RecGPT’s approaches—generative prompt mining, unified tokenization, and intent-centric frameworks—are likely to inform future research in both academic and industrial recommender systems by aligning recommendation quality with semantic understanding and counteracting systemic biases inherent in log-fitting paradigms.