
OneSearch: A Preliminary Exploration of the Unified End-to-End Generative Framework for E-commerce Search (2509.03236v1)

Published 3 Sep 2025 in cs.IR

Abstract: Traditional e-commerce search systems employ multi-stage cascading architectures (MCA) that progressively filter items through recall, pre-ranking, and ranking stages. While effective at balancing computational efficiency with business conversion, these systems suffer from fragmented computation and optimization objective collisions across stages, which ultimately limit their performance ceiling. To address these, we propose OneSearch, the first industrial-deployed end-to-end generative framework for e-commerce search. This framework introduces three key innovations: (1) a Keyword-enhanced Hierarchical Quantization Encoding (KHQE) module, to preserve both hierarchical semantics and distinctive item attributes while maintaining strong query-item relevance constraints; (2) a multi-view user behavior sequence injection strategy that constructs behavior-driven user IDs and incorporates both explicit short-term and implicit long-term sequences to model user preferences comprehensively; and (3) a Preference-Aware Reward System (PARS) featuring multi-stage supervised fine-tuning and adaptive reward-weighted ranking to capture fine-grained user preferences. Extensive offline evaluations on large-scale industry datasets demonstrate OneSearch's superior performance for high-quality recall and ranking. The rigorous online A/B tests confirm its ability to enhance relevance in the same exposure position, achieving statistically significant improvements: +1.67% item CTR, +2.40% buyer, and +3.22% order volume. Furthermore, OneSearch reduces operational expenditure by 75.40% and improves Model FLOPs Utilization from 3.26% to 27.32%. The system has been successfully deployed across multiple search scenarios in Kuaishou, serving millions of users, generating tens of millions of PVs daily.

Summary

  • The paper presents an innovative unified generative framework that replaces multi-stage architectures with a single end-to-end model for e-commerce search.
  • It details a novel methodology combining keyword-enhanced hierarchical quantization encoding and multi-view behavior sequence injection to effectively capture user intent.
  • Empirical results demonstrate significant improvements in recall, CTR, and resource efficiency, underscoring the framework's industrial viability and potential to streamline search systems.

Introduction and Motivation

The paper introduces OneSearch, an industrial-scale, end-to-end generative retrieval framework for e-commerce search, designed to address the inherent limitations of traditional multi-stage cascading architectures (MCA). In conventional e-commerce search systems, the MCA paradigm segments retrieval into recall, pre-ranking, and ranking stages, each optimized for different objectives and computational constraints. This fragmentation leads to suboptimal global performance due to objective collisions, inefficient resource utilization, and limited ability to model user intent holistically.

OneSearch proposes a unified generative approach that directly maps user queries and behavioral context to item candidates, eliminating the need for multi-stage filtering and enabling joint optimization of relevance and personalization. The framework is deployed at scale on the Kuaishou platform, serving millions of users and demonstrating significant improvements in both offline and online metrics.

Figure 1: (a) The proposed end-to-end generative retrieval framework (OneSearch), (b) the traditional multi-stage cascading architecture in e-commerce search.

System Architecture and Key Innovations

OneSearch is architected around four principal components:

  1. Keyword-Enhanced Hierarchical Quantization Encoding (KHQE): This module encodes items and queries into semantic IDs (SIDs) using a hierarchical quantization schema, augmented with core keyword extraction to preserve essential attributes and suppress irrelevant noise. The encoding pipeline combines RQ-Kmeans for hierarchical clustering and OPQ for fine-grained residual quantization, ensuring high codebook utilization and independent coding rates.
  2. Multi-view Behavior Sequence Injection: User modeling is achieved by integrating explicit short-term and implicit long-term behavioral sequences. User IDs are constructed from weighted aggregations of recent and historical clicked items, and both short and long behavior sequences are injected into the model via prompt engineering and embedding aggregation, respectively. This multi-view approach enables comprehensive personalization.
  3. Unified Encoder-Decoder Generative Model: The system employs a transformer-based encoder-decoder architecture (e.g., BART, mT5, or Qwen3) to jointly model user, query, and behavioral context, generating item SIDs as output. The model is trained with a combination of supervised fine-tuning and preference-aware reinforcement learning.
  4. Preference-Aware Reward System (PARS): A multi-stage supervised fine-tuning process aligns semantic and collaborative representations, followed by an adaptive reward system that leverages hierarchical user behavior signals and list-wise preference optimization. The reward model is trained on real user interactions, incorporating CTR, CVR, and relevance signals.

Figure 2: The OneSearch framework: (1) KHQE for semantic encoding, (2) multi-view behavior sequence injection, (3) unified encoder-decoder generative retrieval, (4) preference-aware reward system.

Hierarchical Quantization Encoding and Tokenization

The KHQE module addresses the challenge of representing items with long, noisy, and weakly ordered textual descriptions. By extracting core keywords using NER and domain-specific heuristics, the encoding process emphasizes essential attributes (e.g., brand, category) and suppresses irrelevant tokens. The hierarchical quantization pipeline operates as follows:

  • RQ-Kmeans: Hierarchically clusters item embeddings, maximizing codebook utilization and independent coding rates.
  • OPQ: Quantizes residual embeddings to capture fine-grained, item-specific features.
  • Core Keyword Enhancement: Core keywords are embedded and averaged with item representations, further improving the discriminative power of SIDs.

Empirical results demonstrate that this approach yields higher recall and ranking performance compared to standard RQ-VAE or balanced k-means tokenization, with significant improvements in codebook utilization and independent coding rates.

Figure 3: Different hierarchical quantization encodings of items, illustrating the impact of KHQE and OPQ on SID assignment.
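The hierarchical part of the pipeline above can be illustrated with a minimal residual k-means sketch in pure Python. It is not the paper's implementation: the OPQ stage is omitted, the equal-weight keyword averaging in `keyword_enhanced` is an assumption (the summary does not state the weighting), and `independent_coding_rate` assumes the metric means the share of items whose SID is unique. All function names and parameters are hypothetical.

```python
import random

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest(vec, centroids):
    # Index of the centroid closest to vec.
    return min(range(len(centroids)), key=lambda j: sq_dist(vec, centroids[j]))

def subtract(a, b):
    return [x - y for x, y in zip(a, b)]

def mean(vectors):
    return [sum(col) / len(vectors) for col in zip(*vectors)]

def kmeans(points, k, iters=20, seed=0):
    # Plain Lloyd's algorithm; empty clusters keep their previous centroid.
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        buckets = [[] for _ in range(k)]
        for p in points:
            buckets[nearest(p, centroids)].append(p)
        centroids = [mean(b) if b else c for b, c in zip(buckets, centroids)]
    return centroids

def fit_codebooks(embeddings, levels=3, k=4):
    # Each level clusters the residuals left over by the previous level.
    codebooks, residuals = [], [list(e) for e in embeddings]
    for level in range(levels):
        codebook = kmeans(residuals, k, seed=level)
        codebooks.append(codebook)
        residuals = [subtract(r, codebook[nearest(r, codebook)]) for r in residuals]
    return codebooks

def encode(embedding, codebooks):
    # The SID is the tuple of nearest-centroid indices, coarse to fine.
    sid, residual = [], list(embedding)
    for codebook in codebooks:
        code = nearest(residual, codebook)
        sid.append(code)
        residual = subtract(residual, codebook[code])
    return tuple(sid)

def keyword_enhanced(item_emb, keyword_embs):
    # Assumption: equal-weight average of the item vector and the mean keyword vector.
    if not keyword_embs:
        return list(item_emb)
    return mean([list(item_emb), mean(keyword_embs)])

def independent_coding_rate(sids):
    # Assumption: fraction of items whose SID is not shared with any other item.
    counts = {}
    for sid in sids:
        counts[sid] = counts.get(sid, 0) + 1
    return sum(1 for sid in sids if counts[sid] == 1) / len(sids)

if __name__ == "__main__":
    rng = random.Random(42)
    items = [[rng.gauss(0, 1) for _ in range(8)] for _ in range(200)]
    books = fit_codebooks(items, levels=3, k=8)
    sids = [encode(e, books) for e in items]
    print(sids[:3], independent_coding_rate(sids))
```

In this sketch, adding levels or enlarging `k` generally raises the share of unique SIDs at the cost of longer or larger codes, which is the trade-off the codebook utilization and independent coding rate figures describe.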

Multi-view User Behavior Modeling

OneSearch's user modeling strategy integrates three perspectives:

  • Behavior Sequence-Constructed User IDs: User IDs are computed as weighted sums of SIDs from recent and long-term clicked items, providing a semantically meaningful and behaviorally grounded identifier.
  • Explicit Short Behavior Sequences: Recent queries and clicked items are explicitly included in the model prompt, enabling the model to capture short-term intent shifts.
  • Implicit Long Behavior Sequences: Long-term behavioral patterns are aggregated via centroid embeddings at multiple quantization levels, efficiently encoding user profiles without excessive prompt length.

Ablation studies confirm that sequence-constructed user IDs and explicit/implicit behavior sequence injection yield substantial gains in both recall and ranking metrics, outperforming random or hashed user ID baselines.
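As a rough illustration of the first and third perspectives, the sketch below builds a user vector from recency-weighted clicked-item embeddings and pools long-history centroids per quantization level. The exponential decay, the `short_share` mixing weight, and all function names are assumptions made for the example; the summary does not specify how the weights are chosen.

```python
def weighted_mean(vectors, weights):
    # Weighted average of equal-length vectors.
    total = sum(weights)
    dim = len(vectors[0])
    return [sum(w * v[d] for v, w in zip(vectors, weights)) / total
            for d in range(dim)]

def recency_weights(n, decay=0.8):
    # Assumption: exponential decay, the last (most recent) item gets weight 1.0.
    return [decay ** (n - 1 - i) for i in range(n)]

def user_vector(short_embs, long_embs, short_share=0.7, decay=0.8):
    # Blend a recency-weighted short-term view with a uniform long-term view.
    short = weighted_mean(short_embs, recency_weights(len(short_embs), decay))
    long_term = weighted_mean(long_embs, [1.0] * len(long_embs))
    return [short_share * s + (1.0 - short_share) * l
            for s, l in zip(short, long_term)]

def pooled_level_centroids(history_sids, codebooks):
    # For each quantization level, average the centroids selected by the
    # SIDs of the long-term clicked items: one compact vector per level.
    pooled = []
    for level, codebook in enumerate(codebooks):
        centroids = [codebook[sid[level]] for sid in history_sids]
        pooled.append(weighted_mean(centroids, [1.0] * len(centroids)))
    return pooled

if __name__ == "__main__":
    codebooks = [[[0.0, 0.0], [2.0, 2.0]], [[1.0, 0.0], [0.0, 1.0]]]
    print(user_vector([[1.0, 0.0], [0.0, 1.0]], [[1.0, 1.0]]))
    print(pooled_level_centroids([(0, 1), (1, 1)], codebooks))
```

The resulting user vector could then be quantized with the same codebooks as items to obtain a behavior-driven user ID, while the per-level pooled centroids stand in for a long history without lengthening the prompt.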

Unified Generative Retrieval and Training Paradigm

The encoder-decoder model ingests the full user context and outputs item SIDs via constrained or unconstrained beam search. Training proceeds in three supervised fine-tuning stages:

  1. Semantic Content Alignment: Aligns SIDs with textual descriptions and category information.
  2. Co-occurrence Synchronization: Models collaborative relationships between queries and items.
  3. User Personalization Modeling: Incorporates user IDs and behavior sequences for personalized generation.

A sliding window data augmentation strategy is applied to short behavior sequences, enhancing the model's ability to generalize to users with limited history.
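One plausible reading of the sliding window strategy is sketched below: every position in a behavior sequence becomes a training target, with the preceding items (capped at a window size) as context. The function name, the `max_context` and `min_context` parameters, and the sample layout are hypothetical; the summary does not describe the exact windowing scheme.

```python
def sliding_window_samples(sequence, max_context=3, min_context=1):
    """Turn one behavior sequence into several (context, target) pairs.

    Early targets get short contexts, which mimics users with little history.
    """
    samples = []
    for t in range(min_context, len(sequence)):
        context = sequence[max(0, t - max_context):t]
        samples.append((context, sequence[t]))
    return samples

if __name__ == "__main__":
    clicks = ["item_a", "item_b", "item_c", "item_d"]
    for context, target in sliding_window_samples(clicks, max_context=2):
        print(context, "->", target)
```

Because the shortest windows contain a single interaction, the model sees many examples that resemble cold-start or low-activity users during training.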

Preference-Aware Reward System and Hybrid Ranking

The reward system is designed to optimize both relevance and conversion objectives:

  • Adaptive Reward Signal: User interactions are categorized into six levels, with adaptive weights derived from calibrated CTR and CVR metrics. The reward model is a three-tower architecture predicting CTR, CVR, and CTCVR, with an additional relevance score.
  • Hybrid Ranking Framework: List-wise DPO training is used to align the generative model's output with the reward model's ranking, followed by further fine-tuning on pure user interaction data to overcome the limitations of reward model distillation.

This hybrid approach enables OneSearch to achieve a Pareto-optimal balance between relevance and personalization, surpassing the performance ceiling of traditional MCA systems.
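To make the list-wise preference step more concrete, the sketch below implements one common generalization of pairwise DPO to a single preferred item and several dispreferred items, using a softmax over implicit reward gaps. Whether OneSearch uses exactly this objective is not stated in the summary, so the formula, the `beta` value, and the function names should be read as assumptions. Inputs are sequence log-probabilities under the trained policy and a frozen reference model.

```python
import math

def implicit_reward(logp_policy, logp_ref, beta):
    # DPO's implicit reward: scaled log-ratio between policy and reference.
    return beta * (logp_policy - logp_ref)

def softplus(x):
    # Numerically stable log(1 + exp(x)).
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))

def listwise_dpo_loss(pos, negs, beta=0.1):
    """pos: (logp_policy, logp_ref) of the preferred item.
    negs: list of (logp_policy, logp_ref) for dispreferred items.

    loss = -log sigmoid(-log sum_j exp(r_neg_j - r_pos)),
    which reduces to the standard pairwise DPO loss when len(negs) == 1.
    """
    r_pos = implicit_reward(pos[0], pos[1], beta)
    gaps = [implicit_reward(lp, lr, beta) - r_pos for lp, lr in negs]
    top = max(gaps)
    log_sum_exp = top + math.log(sum(math.exp(g - top) for g in gaps))
    # -log sigmoid(-z) equals softplus(z).
    return softplus(log_sum_exp)

if __name__ == "__main__":
    preferred = (-1.0, -1.5)
    rejected = [(-2.0, -1.8), (-2.5, -2.0)]
    print(listwise_dpo_loss(preferred, rejected, beta=0.5))
```

In a setup like the one described above, the preferred and dispreferred items would come from the reward model's ranking of generated candidates, and the loss falls as the policy raises the preferred item's likelihood relative to the reference more than it does for the rest of the list.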

Experimental Results

Offline Evaluation

OneSearch is evaluated on a large-scale industry dataset from Kuaishou's mall search platform. Key findings include:

  • Recall and Ranking: OneSearch achieves higher recall (HR@350) and comparable or superior ranking quality (MRR@350) relative to the online MCA baseline.
  • Ablation Studies: KHQE, OPQ, and multi-view behavior sequence injection each contribute significant performance gains. The system is robust to item pool changes, maintaining high codebook utilization and independent coding rates over time.

Figure 4: Independent coding rate (ICR) and SID ratio indicators of RQ-Kmeans over time, demonstrating stability under dynamic item pool conditions.
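For readers unfamiliar with the offline metrics, HR@K and MRR@K can be computed as in the sketch below. It assumes the usual definitions: a query counts as a hit if at least one relevant (for example, clicked) item appears in the top K, and MRR uses the rank of the first relevant item, scoring zero when none appears. The paper may define them slightly differently, and the function names are hypothetical.

```python
def hit_rate_at_k(ranked_lists, relevant_sets, k):
    # Fraction of queries with at least one relevant item in the top k.
    hits = sum(
        1
        for ranked, relevant in zip(ranked_lists, relevant_sets)
        if any(item in relevant for item in ranked[:k])
    )
    return hits / len(ranked_lists)

def mrr_at_k(ranked_lists, relevant_sets, k):
    # Mean reciprocal rank of the first relevant item within the top k.
    total = 0.0
    for ranked, relevant in zip(ranked_lists, relevant_sets):
        for rank, item in enumerate(ranked[:k], start=1):
            if item in relevant:
                total += 1.0 / rank
                break
    return total / len(ranked_lists)

if __name__ == "__main__":
    ranked = [["a", "b", "c"], ["x", "y", "z"]]
    relevant = [{"b"}, {"q"}]
    print(hit_rate_at_k(ranked, relevant, k=3), mrr_at_k(ranked, relevant, k=3))
```

With K = 350, as in the evaluation above, HR reflects whether the generated candidate set covers what the user interacted with, while MRR reflects how early in that set the interacted item appears.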

Online A/B Testing

Deployed on the Kuaishou platform, OneSearch demonstrates:

  • CTR and Conversion Gains: Statistically significant improvements of +1.67% item CTR, +2.40% buyers, and +3.22% order volume.
  • Resource Efficiency: Model FLOPs Utilization increases from 3.26% (MCA) to 27.32% (OneSearch), and operational expenditure is reduced by 75.40%.

Figure 5: Comparison of MFU and OPEX for the online MCA and OneSearch, highlighting substantial resource efficiency improvements.

  • Industry and Query Coverage: Gains are observed across 28 of the top 30 industries and for queries of all popularity levels, including long-tail queries.

Figure 6: Online CTR relative gains for the top 30 industries, showing broad applicability of OneSearch.

  • Manual Evaluation: Manual review shows increases in page good rate, item quality, and query-item relevance, confirming improvements in user experience.

Implications and Future Directions

OneSearch demonstrates that unified, end-to-end generative retrieval can replace complex, fragmented MCA pipelines in industrial e-commerce search, yielding improvements in both user engagement and system efficiency. The framework's modular design, which combines advanced quantization, multi-view user modeling, and preference-aware optimization, enables robust adaptation to dynamic item pools and evolving user behavior.

Key implications:

  • Practical: OneSearch reduces operational complexity, improves hardware utilization, and enhances user experience at scale.
  • Theoretical: The results challenge the necessity of multi-stage architectures for large-scale retrieval, suggesting that joint modeling of relevance and personalization is feasible and beneficial.

Future work should focus on real-time tokenization for streaming data, further reinforcement learning for preference alignment, and integration of multi-modal item features (e.g., images, video) to enhance semantic understanding and reasoning.

Conclusion

OneSearch establishes a new paradigm for e-commerce search by unifying retrieval and ranking in a single generative model, leveraging hierarchical quantization, multi-view user modeling, and preference-aware optimization. Extensive offline and online evaluations confirm its superiority over traditional MCA systems in both effectiveness and efficiency. The deployment at scale on Kuaishou demonstrates its industrial viability and sets a benchmark for future research in generative retrieval for search and recommendation.
