OneRec: Unifying Retrieve and Rank with Generative Recommender and Iterative Preference Alignment

Published 26 Feb 2025 in cs.IR | (2502.18965v1)

Abstract: Recently, generative retrieval-based recommendation systems have emerged as a promising paradigm. However, most modern recommender systems adopt a retrieve-and-rank strategy, where the generative model functions only as a selector during the retrieval stage. In this paper, we propose OneRec, which replaces the cascaded learning framework with a unified generative model. To the best of our knowledge, this is the first end-to-end generative model that significantly surpasses current complex and well-designed recommender systems in real-world scenarios. Specifically, OneRec includes: 1) an encoder-decoder structure, which encodes the user's historical behavior sequences and gradually decodes the videos that the user may be interested in. We adopt sparse Mixture-of-Experts (MoE) to scale model capacity without proportionally increasing computational FLOPs. 2) a session-wise generation approach. In contrast to traditional next-item prediction, we propose a session-wise generation, which is more elegant and contextually coherent than point-by-point generation that relies on hand-crafted rules to properly combine the generated results. 3) an Iterative Preference Alignment module combined with Direct Preference Optimization (DPO) to enhance the quality of the generated results. Unlike DPO in NLP, a recommendation system typically has only one opportunity to display results for each user's browsing request, making it impossible to obtain positive and negative samples simultaneously. To address this limitation, we design a reward model to simulate user generation and customize the sampling strategy. Extensive experiments have demonstrated that a limited number of DPO samples can align user interest preferences and significantly improve the quality of generated results. We deployed OneRec in the main scene of Kuaishou, achieving a 1.6% increase in watch-time, which is a substantial improvement.


Authors (8)

Summary

The paper introduces OneRec, which unifies retrieval and ranking via an end-to-end encoder-decoder architecture to improve recommendation accuracy.
It employs a sparse Mixture-of-Experts design and iterative preference alignment to model user sessions effectively and boost engagement.
The system's deployment on Kuaishou increased watch-time by 1.6%, demonstrating its empirical advantages over traditional cascaded frameworks.

"OneRec: Unifying Retrieve and Rank with Generative Recommender and Iterative Preference Alignment"

Introduction

The paper introduces a generative approach to video recommendation that unifies retrieval and ranking, which recommendation pipelines traditionally handle with separate systems. The approach, called OneRec, replaces the cascaded learning framework with a single generative model designed to improve both system response time and sorting accuracy.

Unified Generative Architecture

OneRec is a single-stage generative recommender, in contrast to existing multi-stage cascade ranking systems that separate retrieval from ranking. The architecture (Figure 1) uses an encoder-decoder structure to generate candidate videos end to end:

Figure 1: (a) Our proposed unified architecture for end-to-end generation. (b) A typical cascade ranking system, which includes three stages from the bottom to the top: Retrieval, Pre-ranking, and Ranking.

Encoder-Decoder Structure: The encoder processes a user's historical behavior sequences, and the decoder autoregressively generates the videos the user may be interested in. Sparse Mixture-of-Experts (MoE) layers scale model capacity without a proportional increase in computational FLOPs.
Session-Wise Generation: Instead of predicting the next item point by point, OneRec generates a whole session at once, so items are produced in context with one another rather than combined afterwards by hand-crafted rules. This improves the coherence and diversity of the recommendations.
Iterative Preference Alignment: Direct Preference Optimization (DPO) is used to align generated sessions with user preferences. Because each browsing request is shown only one result, positive and negative samples cannot be observed for the same request; a reward model, trained to predict user engagement, stands in for that missing feedback and guides preference optimization.
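To make the preference-optimization step concrete, here is a minimal sketch of the standard DPO loss for a single (chosen, rejected) pair. This summary does not give OneRec's exact objective, so the function names, the `beta` value, and the use of summed session log-probabilities as inputs are all assumptions made for illustration.

```python
import math


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def dpo_loss(
    policy_chosen_logp: float,
    policy_rejected_logp: float,
    ref_chosen_logp: float,
    ref_rejected_logp: float,
    beta: float = 0.1,  # assumed value; not taken from the paper
) -> float:
    """Standard DPO loss for one preference pair.

    Each argument is the log-probability of a whole generated session
    (hypothetically, the sum of its token log-probabilities) under either
    the policy being trained or a frozen reference model.
    """
    # How much more the policy likes each session than the reference does.
    chosen_margin = policy_chosen_logp - ref_chosen_logp
    rejected_margin = policy_rejected_logp - ref_rejected_logp
    # The loss shrinks as the chosen margin grows relative to the rejected one.
    return -log_sigmoid(beta * (chosen_margin - rejected_margin))
```

When the policy equals the reference model, both margins are zero and the loss is log 2; it falls below that as the policy shifts probability toward the chosen session.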

Comprehensive Framework and Online Deployment

OneRec has been deployed in the main scene of Kuaishou, a large-scale short-video platform, where it increased watch-time by 1.6%.

Figure 2: The overall framework of OneRec, which consists of two stages: (i) the session training stage, which trains OneRec on session-wise data; (ii) the IPA stage, which uses iterative direct preference optimization with self-hard negatives.
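One way to picture how the IPA stage could obtain preference pairs without real side-by-side user feedback is sketched below: the model's own candidate sessions are scored by a reward function, and the best and worst become the chosen and rejected samples. The selection rule (highest versus lowest reward), the `Session` type, and the toy reward are assumptions for illustration; the summary only states that a reward model and self-generated hard negatives are used.

```python
from typing import Callable, Sequence, Tuple

# Hypothetical representation: a session is an ordered tuple of item IDs.
Session = Tuple[int, ...]


def build_preference_pair(
    candidates: Sequence[Session],
    reward_fn: Callable[[Session], float],
) -> Tuple[Session, Session]:
    """Return (chosen, rejected) from sessions generated by the model itself.

    Assumption: the highest-reward candidate is treated as the positive and
    the lowest-reward candidate as the self-generated hard negative.
    """
    if len(candidates) < 2:
        raise ValueError("need at least two candidate sessions")
    ranked = sorted(candidates, key=reward_fn)  # ascending by reward
    return ranked[-1], ranked[0]


def toy_reward(session: Session) -> float:
    """Stand-in for a learned reward model: fraction of distinct items."""
    return len(set(session)) / len(session)
```

For example, `build_preference_pair([(1, 1, 1), (1, 2, 3), (1, 2, 2)], toy_reward)` selects `(1, 2, 3)` as chosen and `(1, 1, 1)` as rejected. In a real system the candidates would come from the generative model's decoding procedure and the reward from a trained engagement predictor.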

Experimental Evaluation

OneRec demonstrates empirical superiority over existing generative and non-generative recommendation frameworks through both offline evaluations and online A/B testing.

Performance Metrics: The system showed a 1.6% increase in watch-time on the Kuaishou platform. Offline experiments demonstrated the system's effectiveness in aligning recommendations with user preferences, reducing redundancies inherent in traditional recommendation stages.
Ablation Studies: These studies affirmed the efficacy of session-wise training and of MoE scaling; Figure 3 shows the ablation on the DPO sample ratio.
Figure 3: The ablation study on the DPO sample ratio $r_{\rm DPO}$. The results indicate that a 1% ratio of DPO training leads to significant gains, but further increasing the sample ratio yields limited improvements.

Implementation Considerations

Important implementation details include CUDA acceleration for real-time deployment and model pruning to maintain efficiency in massive production environments. Sparse model structures further improve scalability by keeping inference efficient, even at industrial scale.
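The efficiency argument for sparse structures can be illustrated with a minimal top-k Mixture-of-Experts routing sketch: only `k` experts run per input, so compute grows with `k` rather than with the total number of experts. The gating scheme (softmax over the selected logits), the value of `k`, and all names here are assumptions, not details confirmed by the paper.

```python
import math
from typing import Callable, List, Sequence, Tuple

# Hypothetical expert signature: maps a vector to a vector of the same size.
Expert = Callable[[Sequence[float]], List[float]]


def top_k_moe(
    x: Sequence[float],
    gate_logits: Sequence[float],
    experts: Sequence[Expert],
    k: int = 2,
) -> Tuple[List[float], List[int]]:
    """Route x to the k experts with the largest gate logits.

    Returns the weighted sum of the selected experts' outputs and the
    indices of the experts that were actually executed.
    """
    if len(gate_logits) != len(experts):
        raise ValueError("one gate logit is required per expert")
    if not 1 <= k <= len(experts):
        raise ValueError("k must be between 1 and the number of experts")

    # Indices of the k highest-scoring experts (ties keep original order).
    top = sorted(range(len(experts)), key=lambda i: gate_logits[i], reverse=True)[:k]

    # Softmax restricted to the selected experts, shifted for stability.
    peak = max(gate_logits[i] for i in top)
    exp_scores = [math.exp(gate_logits[i] - peak) for i in top]
    total = sum(exp_scores)

    out = [0.0] * len(x)
    for score, idx in zip(exp_scores, top):
        y = experts[idx](x)  # only the selected experts are evaluated
        for d, value in enumerate(y):
            out[d] += (score / total) * value
    return out, top
```

With four experts and `k=2`, half of the expert parameters are untouched for any given input, which is the sense in which capacity can grow without a proportional increase in FLOPs.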

Conclusion

OneRec represents a significant methodological advance in recommender systems by unifying retrieval and ranking in an end-to-end generative framework. MoE scaling, session-wise context modeling, and IPA are the core innovations that together yield a more robust recommendation system. Future directions involve enhancing multi-target modeling to benefit user experience beyond watch-time and interaction metrics.
