OneRec Foundation: Unified Recommender Models
- OneRec Foundation is a unified family of transformer-based, instruction-following recommender models that employ itemic token representations integrated with LLM pretraining.
- It leverages a multi-stage pretraining and co-training strategy using domain-specific corpora and reinforcement learning to jointly learn retrieval, ranking, and reasoning.
- Benchmark results on RecIF-Bench and Amazon datasets demonstrate significant gains over cascaded systems, underlining the benefits of aggressive data scaling and unified architecture.
OneRec Foundation is a family of large-scale, instruction-following, end-to-end generative recommender models that unify LLM pretraining, domain-specific recommendation corpora, reinforcement learning, and rigorous scaling laws. Its design and deployment respond to the longstanding need to move beyond fragmented retrieve–rank recommendation cascades, toward architectures in which retrieval, ranking, and reasoning are learned in a single transformer-based framework. OneRec Foundation models set state-of-the-art (SOTA) performance across a comprehensive recommendation benchmark (RecIF-Bench) and exhibit strong transfer to diverse industrial and academic datasets, most notably the Amazon review corpus (Zhou et al., 31 Dec 2025).
1. Architectural Foundations
OneRec Foundation is built upon the Qwen3 transformer architecture, adopting a pre-layernorm backbone with minor modifications to accommodate recommendation-specific “itemic” tokens. Two primary model sizes are introduced:
- 1.7B parameters (Qwen3-1.7B): 24 transformer blocks, 32 attention heads, 32k-token context window.
- 8B parameters (Qwen3-8B): 32 transformer blocks, 32 attention heads, 32k-token context window.
All base Qwen3 parameters are retained and supplemented with new itemic token embeddings. The output projection is untied from the input embeddings in the 8B variant but tied in the 1.7B variant, consistent with the Qwen3 protocol.
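As a concrete illustration of how itemic tokens can be grafted onto a pretrained backbone, the sketch below extends a Hugging Face causal LM's vocabulary with codebook tokens and resizes the embedding (and, where untied, output-projection) matrices. The token naming scheme (`<item_l{layer}_{index}>`) and the checkpoint name are illustrative assumptions, not the released pipeline.

```python
# Sketch: extending a pretrained causal LM with itemic codebook tokens.
# Assumes the Hugging Face `transformers` library; the token naming scheme
# and checkpoint name are illustrative, not the released OneRec pipeline.
from transformers import AutoModelForCausalLM, AutoTokenizer

BASE_MODEL = "Qwen/Qwen3-1.7B"           # assumption: any causal LM checkpoint works
NUM_LAYERS, CODEBOOK_SIZE = 3, 8192      # three RQ layers, 8,192 codes per layer

tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL)
model = AutoModelForCausalLM.from_pretrained(BASE_MODEL)

# One special token per (layer, codebook index) pair, e.g. "<item_l0_4711>".
itemic_tokens = [
    f"<item_l{layer}_{idx}>"
    for layer in range(NUM_LAYERS)
    for idx in range(CODEBOOK_SIZE)
]
num_added = tokenizer.add_tokens(itemic_tokens, special_tokens=True)

# Grow the input embedding (and the untied LM head, if present) to cover
# the enlarged vocabulary; the original rows keep their pretrained values.
model.resize_token_embeddings(len(tokenizer))
print(f"Added {num_added} itemic tokens; new vocab size = {len(tokenizer)}")
```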
The input sequence structure is based on “itemic tokens” generated by residual quantization (RQ-Kmeans): discrete item IDs (from, e.g., a short-video catalog or Amazon products) are expressed as a sequence of three codebook indices. This dense, hierarchical item representation bridges the gap between pure text pretraining and high-cardinality categorical data, a foundational challenge in recommendation.
2. Data Pipeline, Pretraining, and Co-training
The OneRec Foundation training pipeline leverages an open corpus of 119 million interactions across 202,000 users, preprocessed into itemic tokens via a three-layer RQ-Kmeans quantizer (codebook size 8,192 per layer).
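A minimal sketch of the residual-quantization idea behind the itemic tokenizer: each item embedding is quantized against a codebook, the residual passes to the next codebook, and the resulting indices form the item's token sequence. The codebooks here are fit with scikit-learn's KMeans on random vectors purely for illustration (and with a small codebook so the demo runs quickly); the actual RQ-Kmeans training procedure and embedding source are not specified by this sketch.

```python
# Sketch of residual quantization (RQ-KMeans) into 3 codebook indices per item.
# Codebooks are fit on random vectors purely for illustration; the real pipeline
# fits them on item embeddings derived from the recommendation corpus.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
item_embeddings = rng.normal(size=(10_000, 64))   # assumption: 64-d item embeddings
NUM_LAYERS, CODEBOOK_SIZE = 3, 256                # paper uses 8,192 codes; 256 keeps the demo fast

codebooks, residual = [], item_embeddings.copy()
for _ in range(NUM_LAYERS):
    km = KMeans(n_clusters=CODEBOOK_SIZE, n_init=4, random_state=0).fit(residual)
    codebooks.append(km.cluster_centers_)
    # Subtract the assigned centroid so the next layer quantizes what is left over.
    residual = residual - km.cluster_centers_[km.labels_]

def itemic_tokens(vec, codebooks):
    """Map one item embedding to its hierarchical (coarse-to-fine) code indices."""
    ids, residual = [], vec.copy()
    for centers in codebooks:
        idx = int(np.argmin(np.linalg.norm(centers - residual, axis=1)))
        ids.append(idx)
        residual = residual - centers[idx]
    return ids

print(itemic_tokens(item_embeddings[0], codebooks))   # e.g. [137, 42, 201]
```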
Three domain-specific corpora form the backbone of domain adaptation (a hypothetical serialization example follows the list):
- Itemic Dense Caption Data: Itemic tokens mapped to natural-language captions.
- Sequential User Behavior Data: Long item-token histories predicting next-item tokens.
- Interleaved Persona Grounding Data: User portraits mixing text and itemic tokens.
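The exact serialization format is not given here; the snippet below is a hypothetical illustration of how the three corpus types might interleave natural-language text with itemic tokens in a single training stream (token names and prompt wording are assumptions).

```python
# Hypothetical examples of the three corpus types serialized into one token stream.
# The "<item_l{k}_{i}>" token format and the prompt wording are illustrative only.
dense_caption = (
    "<item_l0_412><item_l1_7781><item_l2_93> "
    "Caption: a 30-second cooking clip showing a one-pan pasta recipe."
)

sequential_behavior = (
    "History: <item_l0_12><item_l1_904><item_l2_3310> "
    "<item_l0_412><item_l1_7781><item_l2_93> "
    "Next: <item_l0_2048><item_l1_55><item_l2_6111>"
)

persona_grounding = (
    "User portrait: enjoys short cooking and travel videos, skips sports. "
    "Recently watched <item_l0_412><item_l1_7781><item_l2_93>."
)

for sample in (dense_caption, sequential_behavior, persona_grounding):
    print(sample)
```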
To retain general reasoning capabilities, the model is co-trained with ≈29 billion tokens from math, code, and reasoning datasets (e.g., Nemotron-CC-Math, OpenMathReasoning), deduplicated to prevent label leakage in evaluation.
The pretraining follows two stages (a minimal stage-1 freezing sketch appears after the list):
- Itemic-Text Alignment: With all Qwen3 parameters frozen except for itemic embeddings (and 8B output-projection rows), 16B tokens are used to align item tokens and natural language.
- Full-Parameter Co-Pretraining: All parameters are unfrozen; training combines the text corpus with all three recommendation corpora, up to 33B tokens for OneRec-Foundation and 130B for the Pro variant.
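One way to realize the stage-1 recipe (updating only the newly added itemic rows while the pretrained vocabulary stays frozen) is to freeze every parameter except the embedding table and mask out gradients for the original rows. The sketch below assumes a PyTorch model exposing `get_input_embeddings()` (as Hugging Face causal LMs do); the 8B variant's unfrozen output-projection rows would be handled analogously and are not shown.

```python
# Sketch of stage-1 "itemic-text alignment" freezing in PyTorch: only the
# embedding rows belonging to newly added itemic tokens receive gradient updates.
# `model`, `orig_vocab_size`, and the embedding access path are assumptions.
import torch

def freeze_for_itemic_alignment(model: torch.nn.Module, orig_vocab_size: int):
    # Freeze everything first.
    for param in model.parameters():
        param.requires_grad = False

    # Re-enable the input embedding table, then zero out gradients that would
    # touch the original (pretrained) vocabulary rows.
    emb = model.get_input_embeddings()      # works for Hugging Face causal LMs
    emb.weight.requires_grad = True

    keep = torch.zeros_like(emb.weight)
    keep[orig_vocab_size:] = 1.0            # rows for itemic tokens only
    emb.weight.register_hook(lambda grad: grad * keep)
```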
Optimization uses AdamW with weight decay 0.1 and a cosine learning-rate decay schedule with 10% warmup; the peak learning rate differs between stage 1 and stage 2.
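A minimal sketch of the described schedule (linear warmup over the first 10% of steps, then cosine decay), implemented with PyTorch's `LambdaLR`; the peak learning rate, step count, and dummy parameter are placeholders, not the paper's values.

```python
# Sketch: 10% linear warmup followed by cosine decay, via LambdaLR.
# Peak LR, total steps, and the dummy parameter are placeholders.
import math
import torch

total_steps = 10_000
warmup_steps = int(0.10 * total_steps)

params = [torch.nn.Parameter(torch.zeros(1))]            # stand-in for model parameters
optimizer = torch.optim.AdamW(params, lr=2e-4, weight_decay=0.1)

def lr_lambda(step: int) -> float:
    if step < warmup_steps:
        return step / max(1, warmup_steps)               # linear warmup to the peak LR
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return 0.5 * (1.0 + math.cos(math.pi * progress))    # cosine decay toward zero

scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda)
```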
Post-training proceeds in three steps that restore and sharpen general LLM and reasoning skills: (a) multi-task supervised fine-tuning (SFT) on instruction and all RecIF tasks, (b) on-policy distillation from Qwen3 using per-token reverse KL, and (c) RL-based recommendation fine-tuning (Rec-RL) using Group Relative Policy Optimization (GRPO).
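The on-policy distillation step uses a per-token reverse KL between the student's and the teacher's next-token distributions; a minimal sketch of that loss term follows (the logits tensors are random placeholders and sequence masking is omitted).

```python
# Sketch: per-token reverse KL, KL(student || teacher), averaged over a sequence.
# Shapes and the absence of loss masking are simplifications.
import torch
import torch.nn.functional as F

def per_token_reverse_kl(student_logits: torch.Tensor,
                         teacher_logits: torch.Tensor) -> torch.Tensor:
    """student_logits, teacher_logits: (batch, seq_len, vocab)."""
    log_p_s = F.log_softmax(student_logits, dim=-1)
    log_p_t = F.log_softmax(teacher_logits, dim=-1)
    # KL(p_s || p_t) per position: sum_v p_s(v) * (log p_s(v) - log p_t(v)).
    kl = (log_p_s.exp() * (log_p_s - log_p_t)).sum(dim=-1)
    return kl.mean()

# Example usage with random logits:
s = torch.randn(2, 16, 1000)
t = torch.randn(2, 16, 1000)
print(per_token_reverse_kl(s, t))
```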
3. Recommendation Scaling Laws and Resource Optimization
Recommendation scaling is empirically characterized by a loss law analogous to those studied for LLMs. For model size $N$ and dataset size $D$, the pretraining loss is fit with a Chinchilla-style form
$$L(N, D) = E + \frac{A}{N^{\alpha}} + \frac{B}{D^{\beta}},$$
with best-fit values of $E$, $A$, $B$, $\alpha$, and $\beta$ reported for the studied regime.
The optimal allocation for a fixed compute budget $C$ then follows $N_{\mathrm{opt}}(C) \propto C^{a}$ and $D_{\mathrm{opt}}(C) \propto C^{b}$ with $a = \beta/(\alpha+\beta)$ and $b = \alpha/(\alpha+\beta)$. Since the fitted exponents give $b > a$, recommendation is more sensitive to data scaling than model scaling, in contrast to generative textual LLMs; recommendation domains are data-hungry.
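Under a Chinchilla-style fit with a fixed budget $C \approx 6ND$, the optimal split between $N$ and $D$ follows from the exponents; the short sketch below works the allocation out numerically for illustrative exponent values (placeholders, not the paper's fitted constants).

```python
# Sketch: compute-optimal split of a budget C ~ 6*N*D under L = E + A/N**alpha + B/D**beta.
# The constants below are illustrative placeholders, not the paper's fitted values.
import numpy as np

E, A, B = 1.0, 400.0, 900.0
alpha, beta = 0.30, 0.25          # placeholder exponents; the paper fits its own
C = 1e20                          # FLOPs budget

# Closed-form exponents from the constrained minimization (Chinchilla-style):
a = beta / (alpha + beta)         # N_opt grows as C**a
b = alpha / (alpha + beta)        # D_opt grows as C**b
print(f"N_opt scales as C^{a:.2f}, D_opt as C^{b:.2f}")

# Numerical check: sweep N along the constraint C = 6*N*D and find the minimum loss.
N = np.logspace(6, 12, 2000)
D = C / (6.0 * N)
loss = E + A / N**alpha + B / D**beta
i = int(np.argmin(loss))
print(f"budget C={C:.1e}: N_opt ~ {N[i]:.2e} params, D_opt ~ {D[i]:.2e} tokens")
```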
Model FLOPs Utilization (MFU), defined as the ratio of measured effective FLOPs to the hardware’s peak, reaches levels in both training and inference that exceed typical values in legacy recommenders and approach top LLM pipeline efficiencies (Zhou et al., 16 Jun 2025).
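MFU can be estimated from throughput with the common ≈6·N FLOPs-per-token approximation for dense transformer training; the sketch below uses placeholder hardware numbers and is not a reproduction of the reported figures.

```python
# Sketch: estimating training MFU from throughput, using the common ~6*N FLOPs/token
# approximation for dense transformers. All hardware numbers are placeholders.
def training_mfu(params: float, tokens_per_sec: float, peak_flops_per_sec: float) -> float:
    achieved_flops = 6.0 * params * tokens_per_sec     # forward + backward estimate
    return achieved_flops / peak_flops_per_sec

# Example: an 8B-parameter model on hardware with ~1e15 peak FLOP/s (placeholder).
print(f"MFU ~ {training_mfu(8e9, 9_000, 1e15):.2%}")
```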
4. Benchmarking: RecIF-Bench and Amazon Transfer
RecIF-Bench is introduced as a holistic suite for instruction-following recommender evaluation, spanning 8 tasks organized into four layers (a minimal Recall@K computation sketch follows the task list):
- Layer 0: Semantic Alignment (Item Understanding – LLM-F1)
- Layer 1: Recommendation – Short Video Rec, Ad Rec, Product Rec (Recall@32), and Label Prediction (AUC)
- Layer 2: Interactive Rec, Label-Conditional Rec (Recall@32)
- Layer 3: Recommendation Explanation (LLM-Judge), combining reasoning and natural language.
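As a reference point for the retrieval-style metrics above, Recall@K checks whether the ground-truth item appears among the top-K candidates, averaged over test interactions; a minimal sketch with toy data follows.

```python
# Sketch: Recall@K over a test set, where each example has one ground-truth item
# and a ranked candidate list (e.g. 32 generated itemic-token triples decoded to items).
def recall_at_k(ranked_candidates: list[list[str]],
                ground_truth: list[str],
                k: int = 32) -> float:
    hits = sum(gt in cands[:k] for cands, gt in zip(ranked_candidates, ground_truth))
    return hits / len(ground_truth)

# Toy example:
preds = [["item_9", "item_3", "item_7"], ["item_1", "item_2", "item_5"]]
truth = ["item_7", "item_8"]
print(recall_at_k(preds, truth, k=3))   # 0.5
```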
OneRec-Foundation 8B-Pro outperforms all prior baselines on every task. For example, in Short Video Rec (Recall@32), OneRec-8B-Pro achieves $0.0369$ (baseline: $0.0180$); in Ad Rec, $0.0964$ (baseline: $0.0581$). In reasoning-oriented tasks, LLM-Judge rating for explanations reaches $4.0381$.
On the Amazon sequential recommendation benchmark (10 categories), OneRec-8B-Pro yields Recall@10 of $0.0777$ (baseline: $0.0612$), a relative improvement of roughly 27%. With only 10% of the training data (“few-shot”), OneRec preserves a substantially larger fraction of its full-data Recall@10 than TIGER does.
Adaptation strategies were compared for Amazon: text-only (item keywords), extended RQ (added quantization depth), and text-augmented itemic (concatenating pre-trained itemic codes and keywords). The text-augmented itemic approach provided the best transfer.
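A hypothetical illustration of the three adaptation inputs: the token and keyword formats are assumptions, but the structural contrast (keywords only, deeper codes, or itemic codes concatenated with keywords) mirrors the comparison described above.

```python
# Hypothetical item representations for the three Amazon adaptation strategies.
# Token/keyword formats are illustrative; only the structural contrast matters.
keywords = "wireless noise-cancelling over-ear headphones"
itemic_codes = "<item_l0_512><item_l1_77><item_l2_4090>"
extra_depth = "<item_l3_1311>"                         # extended RQ: one added quantization layer

text_only = keywords
extended_rq = itemic_codes + extra_depth
text_augmented_itemic = f"{itemic_codes} {keywords}"   # best-transferring variant

for name, rep in [("text-only", text_only),
                  ("extended RQ", extended_rq),
                  ("text-augmented itemic", text_augmented_itemic)]:
    print(f"{name:>22}: {rep}")
```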
5. Unified Framework: Integration with LLM Capabilities
OneRec-Foundation is explicitly designed not just as a domain recommender, but as a model retaining broad LLM capabilities, including instruction following, mathematical reasoning, and code understanding. After co-pretraining and RL, performance on representative LLM benchmarks (MATH-500, GSM8K, AIME’24, MMLU-Pro, GPQA-Diamond, IFEVAL, LiveCodeBench) is retained with minor degradation, indicating effective mitigation of catastrophic forgetting.
Reinforcement learning is formulated via GRPO: for each prompt, a group of $G$ candidate sequences is sampled and each receives a group-normalized advantage
$$\hat{A}_i = \frac{r_i - \mathrm{mean}(\{r_1, \dots, r_G\})}{\mathrm{std}(\{r_1, \dots, r_G\})},$$
with the reward $r_i$ assigned to 1 if the correct itemic token appears in the $i$-th sampled sequence and 0 otherwise.
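A minimal sketch of the group-relative advantage computation with the binary itemic-hit reward described above; the sampled sequences, group size, and target token are placeholders.

```python
# Sketch: GRPO-style group-normalized advantages with a binary "itemic hit" reward.
# The sampled sequences are placeholder strings; group size G is illustrative.
import torch

def grpo_advantages(sampled_sequences: list[str], target_token: str) -> torch.Tensor:
    # Reward 1.0 if the correct itemic token appears anywhere in the sample, else 0.0.
    rewards = torch.tensor(
        [1.0 if target_token in seq else 0.0 for seq in sampled_sequences]
    )
    # Normalize within the group so advantages are relative to the group baseline.
    return (rewards - rewards.mean()) / (rewards.std() + 1e-6)

group = [
    "<item_l0_412><item_l1_7781><item_l2_93>",
    "<item_l0_17><item_l1_5><item_l2_4000>",
    "<item_l0_412><item_l1_7781><item_l2_94>",
    "<item_l0_412><item_l1_7781><item_l2_93>",
]
print(grpo_advantages(group, target_token="<item_l2_93>"))
```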
This framework enables multitask fine-tuning, on-policy distillation, and multi-corpus recommendation RL within a single architecture.
6. Open Source Contributions, Data, and Benchmarking Ecosystem
OneRec-Foundation is released in both 1.7B and 8B variants, supported by open-source training code and a reproducible pipeline. The RecIF-Bench benchmark and a 96M interaction dataset are provided for community use. Released codebases cover data processing (itemic quantization), co-pretraining with LLM corpora, multi-task SFT, on-policy distillation, and RL fine-tuning. This infrastructure is intended to facilitate rigorous, reproducible progress across both recommender system and LLM research communities (Zhou et al., 31 Dec 2025).
7. Significance and Limitations
OneRec-Foundation demonstrates that large transformer models with unified itemic–text representation, multi-domain pretraining, and post-training can outperform traditional cascaded recommenders in both specialized and transfer settings. The RecIF-Bench results show SOTA in instruction-following recommendation. Relative improvements in Recall@10 on Amazon span all ten domains (+26.8% vs. best prior). The scaling law analysis suggests that recommendation models benefit more from aggressive data scaling than parameter expansion.
Despite the advances, fundamental limitations remain: OneRec-Foundation models—while exhibiting improved instruction-following and reasoning—are still constrained by domain data and lack comprehensive world knowledge. Building truly generalist, human-level recommender systems remains an open challenge, and further integration between LLM and recommender architectures is suggested as a promising research direction (Zhou et al., 31 Dec 2025).