OneSearch Framework: Unified E-Commerce Search
- OneSearch is a unified generative framework that consolidates recall, ranking, and personalization using transformer-based models.
- It leverages innovations like Keyword-Enhanced Hierarchical Quantization Encoding and multi-view user behavior injection to optimize query-item relevance and personalization.
- Offline evaluations and online A/B tests report gains in CTR (+1.67% item CTR), order volume (+3.22%), and cost efficiency (75.40% lower OPEX) over a traditional multi-stage architecture.
OneSearch is a unified, end-to-end generative framework for e-commerce search that eschews the traditional multi-stage cascaded architecture in favor of a direct input-to-output mapping learned by transformer-based models. Described as the first industrially deployed generative search system of its kind, OneSearch consolidates recall, pre-ranking, and ranking into a single, optimized inference pathway. It introduces several architectural and algorithmic innovations that improve query-item relevance, user-preference modeling, and efficiency, and these gains are demonstrated through large-scale offline evaluations and online A/B testing in production (Chen et al., 3 Sep 2025).
1. Framework Structure and Motivation
Traditional e-commerce search systems employ Multi-stage Cascading Architectures (MCAs) consisting of sequential recall, pre-ranking, and final ranking stages. Each stage is optimized for a different objective and relies on its own computation and storage resources, which fragments processing and can leave the stages' optimization objectives in conflict. OneSearch abandons this separation: it feeds the user query, together with rich behavioral context, directly into a single generative model that outputs an ordered list of semantic item identifiers (SIDs) in one generation pass. This model-centric design removes computation fragmentation, reduces latency, and avoids two MCA limitations: the candidate set shrinks at every stage, so items dropped early can never be recovered, and the stages optimize incompatible targets.
OneSearch's generative architecture leverages transformer-based models (e.g., encoder–decoder models such as BART, or decoder-only models such as Qwen3) to handle heterogeneous, context-enriched inputs, including user behavior, profile data, and query text.
2. Keyword-Enhanced Hierarchical Quantization Encoding (KHQE)
The KHQE module is designed to extract structured and discriminative representations from noisy and redundant item descriptions. It operates in two primary phases:
- Keyword Enhancement: Domain knowledge and Named Entity Recognition (NER) are used to extract “core keywords” (e.g., brands and other salient attributes) from item content, so that distinctive attributes outweigh irrelevant or noisy text during semantic encoding.
- Hierarchical Quantization: KHQE employs Residual Quantization via K-means (RQ-Kmeans) to encode hierarchical, shared features for similar item clusters across multiple codebook levels, and Optimized Product Quantization (OPQ) to tokenize residual, unique item features. This dual quantization captures both generalizable and idiosyncratic item attributes.
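The two-phase quantization can be sketched with NumPy. This is a toy illustration under stated assumptions, not the production encoder: k-means is a plain Lloyd's loop, the codebook sizes (32-16-8 for RQ and two 16-entry sub-codebooks for the residual) are shrunk so the example runs on random data, and the residual stage is ordinary product quantization because OPQ's learned rotation is omitted. All function names are hypothetical.

```python
import numpy as np

def kmeans(x, k, iters=20, seed=0):
    """Plain Lloyd's k-means. Returns (centroids, cluster index of each row)."""
    rng = np.random.default_rng(seed)
    centroids = x[rng.choice(len(x), size=k, replace=False)].astype(float)
    for _ in range(iters):
        # Squared distance from every point to every centroid, shape (n, k).
        dist = ((x[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)
        assign = dist.argmin(axis=1)
        for j in range(k):
            members = x[assign == j]
            if len(members):  # keep the old centroid if a cluster empties out
                centroids[j] = members.mean(axis=0)
    dist = ((x[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)
    return centroids, dist.argmin(axis=1)

def rq_kmeans(x, level_sizes, iters=20, seed=0):
    """Residual quantization: level l clusters whatever levels 0..l-1 left unexplained."""
    residual = x.astype(float)
    codes, codebooks = [], []
    for level, k in enumerate(level_sizes):
        centroids, assign = kmeans(residual, k, iters, seed + level)
        codebooks.append(centroids)
        codes.append(assign)
        residual = residual - centroids[assign]
    return np.stack(codes, axis=1), codebooks, residual

def pq_encode(residual, n_sub, k, iters=20, seed=0):
    """Plain product quantization of the leftover residual (OPQ would first learn a rotation)."""
    chunks = np.array_split(residual, n_sub, axis=1)
    return np.stack([kmeans(c, k, iters, seed + i)[1] for i, c in enumerate(chunks)], axis=1)

# Toy usage: 500 random 16-d "item embeddings".
items = np.random.default_rng(42).normal(size=(500, 16))
rq_codes, codebooks, residual = rq_kmeans(items, level_sizes=[32, 16, 8])
pq_codes = pq_encode(residual, n_sub=2, k=16)
sids = np.hstack([rq_codes, pq_codes])  # 3 shared (RQ) tokens + 2 item-specific (PQ) tokens per item
```

The RQ levels give coarse-to-fine tokens shared by similar items, while the PQ tokens split the remaining residual into independent sub-vectors, which is what lets the final SID distinguish near-duplicate items.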
The final query and item representations are formed as weighted averages of the raw semantic embeddings and the aggregated keyword embeddings.
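Written out, one plausible form of this weighted average is shown below. The single mixing weight α and the mean pooling over the K extracted keyword embeddings are assumptions for illustration; the exact weighting is not specified here.

```latex
\mathbf{e}_{\text{final}} = \alpha\,\mathbf{e}_{\text{raw}} + (1-\alpha)\,\frac{1}{K}\sum_{k=1}^{K}\mathbf{e}_{\text{kw},k},
\qquad \alpha \in [0, 1]
```

A smaller α pulls the representation toward the core keywords, which is the intended effect of keyword enhancement.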
Multi-level codebook configurations such as "4096-1024-512" (three RQ levels with 4096, 1024, and 512 entries) achieve high codebook utilization and coding independence, and the OPQ stage appends further token IDs (e.g., "256-256", two codebooks of 256 entries each).
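The codebook sizes translate directly into ID-space capacity, which the short sketch below computes. The `utilization` helper uses one common definition (the fraction of a codebook's entries assigned to at least one item); treating that as the paper's metric is an assumption.

```python
import math
from typing import Sequence

def id_space(levels: Sequence[int]) -> int:
    """Number of distinct code tuples a sequence of codebooks can express."""
    return math.prod(levels)

def utilization(codes: Sequence[int], codebook_size: int) -> float:
    """Fraction of a codebook's entries that at least one item is assigned to."""
    return len(set(codes)) / codebook_size

rq_levels = [4096, 1024, 512]   # 2**12 * 2**10 * 2**9
opq_levels = [256, 256]         # 2**8 * 2**8
rq_space = id_space(rq_levels)                   # 2**31 code tuples from RQ alone
full_space = id_space(rq_levels + opq_levels)    # 2**47 once the OPQ tokens are appended
print(math.log2(rq_space), math.log2(full_space))
```

The extra OPQ tokens enlarge the ID space by a factor of 2**16, which leaves ample room to give near-identical items distinct SIDs.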
3. Multi-View User Behavior Sequence Injection
OneSearch models user preferences through a multi-view behavioral injection strategy:
- Behavioral User ID Construction: The user ID is derived from a weighted aggregation of the SIDs of items in the user's click history, with larger weights on recent activity.
- Short-Term Sequence Injection: Recent behavior sequences (queries/clicks) are explicitly inserted into the model prompt to focus attention on immediate user intent.
- Long-Term Preference Integration: RSU and purchase histories are implicitly encoded via RQ centroid mapping, aggregated across hierarchical layers. This produces a holistic, multi-scale user preference profile for ranking.
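A minimal sketch of one way to turn a click history into a behavioral user ID, assuming SIDs are tuples of per-level token IDs. The exponential recency decay, the prefix-consistent per-level vote, and the name `behavioral_user_id` are hypothetical choices for illustration, not the published formula.

```python
from collections import defaultdict
from typing import List, Sequence, Tuple

SID = Tuple[int, ...]

def behavioral_user_id(click_sids: Sequence[SID], decay: float = 0.8) -> SID:
    """Hypothetical user ID: recency-weighted vote over clicked-item SIDs, level by level.

    click_sids is ordered oldest -> newest; a click that is `age` steps old gets weight decay**age.
    At each level only SIDs matching the prefix chosen so far vote, so the result stays a
    coherent path through the codebook hierarchy.
    """
    if not click_sids:
        raise ValueError("empty click history")
    prefix: List[int] = []
    for level in range(len(click_sids[0])):
        scores = defaultdict(float)
        for age, sid in enumerate(reversed(click_sids)):
            if list(sid[:level]) == prefix:
                scores[sid[level]] += decay ** age
        # Highest weighted score wins; ties go to the smaller token ID for determinism.
        prefix.append(min(scores, key=lambda tok: (-scores[tok], tok)))
    return tuple(prefix)

clicks = [(3, 7, 1), (3, 7, 1), (5, 2, 0), (5, 2, 4)]  # oldest -> newest
recent_biased = behavioral_user_id(clicks)              # the two newest clicks dominate
unweighted = behavioral_user_id(clicks, decay=1.0)      # every click counts equally
```

With `decay=1.0` the two branches tie at the first level and the smaller token (3) wins the tie-break, so the older, repeated click determines the ID; the default decay instead lets the newest clicks decide.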
These input features are formatted with structured separator tokens ([BOS], [EOS], [SEP]) and integrated into the transformer input.
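The prompt layout can be sketched as plain string assembly. The segment order, the `field:` labels, and the `<L{level}_{token}>` spelling of SID tokens are hypothetical; only the use of [BOS], [EOS], and [SEP] comes from the text.

```python
from typing import Sequence, Tuple

SID = Tuple[int, ...]

def sid_to_tokens(sid: SID) -> str:
    """Spell one SID as level-tagged tokens, e.g. (5, 2, 4) -> '<L0_5><L1_2><L2_4>' (hypothetical format)."""
    return "".join(f"<L{level}_{tok}>" for level, tok in enumerate(sid))

def build_prompt(
    user_id: SID,
    query: str,
    recent_queries: Sequence[str],
    recent_click_sids: Sequence[SID],
    long_term_sids: Sequence[SID],
) -> str:
    """Join the input views with [SEP] and wrap the whole sequence in [BOS] ... [EOS]."""
    segments = [
        "user:" + sid_to_tokens(user_id),
        "query:" + query,
        "recent_queries:" + " ".join(recent_queries),                  # explicit short-term intent
        "recent_clicks:" + " ".join(map(sid_to_tokens, recent_click_sids)),
        "long_term:" + " ".join(map(sid_to_tokens, long_term_sids)),  # coarse, centroid-level codes
    ]
    return "[BOS]" + "[SEP]".join(segments) + "[EOS]"

prompt = build_prompt(
    user_id=(5, 2, 4),
    query="running shoes",
    recent_queries=["trail shoes", "sports socks"],
    recent_click_sids=[(5, 2, 0), (5, 2, 4)],
    long_term_sids=[(3, 7)],
)
```

Keeping each view in its own separator-delimited segment lets the model attend to short-term and long-term signals as distinct parts of the input.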
4. Preference-Aware Reward System (PARS)
OneSearch incorporates a multi-stage Preference-Aware Reward System to balance relevance and fine-grained personalization in ranking:
- Multi-Stage Supervised Fine-Tuning (SFT): The generative model is trained for semantic alignment and query/item text-to-SID mapping, with user personalization learned jointly.
- Adaptive Reward Model: User feedback signals (purchase, click, exposure) are weighted hierarchically when computing rewards, with stronger signals such as purchases counting more than clicks or exposures.
- Hybrid Ranking Objective: The ranking loss combines a log-likelihood term with reward-based guidance.
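A sketch of hierarchical reward weighting, assuming each item's reward is the weight of the strongest signal it received and that rewards are normalized over the result list to give list-wise training targets. The ordering purchase > click > exposure follows the text; the numeric weights and function names are hypothetical.

```python
from typing import Dict, Iterable, List

# Hypothetical weights; only their ordering (purchase > click > exposure) reflects the text.
SIGNAL_WEIGHTS: Dict[str, float] = {"purchase": 1.0, "click": 0.4, "exposure": 0.05}

def item_reward(signals: Iterable[str]) -> float:
    """Reward of one item = weight of the strongest feedback signal observed (0.0 if none)."""
    return max((SIGNAL_WEIGHTS[s] for s in signals), default=0.0)

def listwise_targets(rewards: List[float]) -> List[float]:
    """Normalize rewards over a result list into a target distribution for list-wise training."""
    total = sum(rewards)
    if total == 0:
        return [1.0 / len(rewards)] * len(rewards)  # no feedback at all: uniform targets
    return [r / total for r in rewards]

# One result list: purchased, clicked, only shown, and not shown.
session = [{"exposure", "click", "purchase"}, {"exposure", "click"}, {"exposure"}, set()]
rewards = [item_reward(s) for s in session]
targets = listwise_targets(rewards)
```

Taking the maximum rather than the sum keeps a purchased item from being double-counted for the click and exposure that necessarily preceded it.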
Through multi-stage SFT and list-wise ranking, the model is supervised to optimize both semantic and personalized reward criteria.
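One representative way to write such a hybrid objective, offered as an illustration rather than the paper's exact loss, is a negative log-likelihood term over the target SID tokens plus a reward-weighted list-wise term. Here x is the model input (query and behavior context), s_1 to s_T are the tokens of the target SID, y_1 to y_N are the items in a result list, ℓ_θ(y | x) is the sequence log-likelihood the model assigns to item y's SID, r̃_i are rewards normalized to sum to 1 over the list, and λ is an assumed mixing weight.

```latex
\mathcal{L}(\theta) = -\sum_{t=1}^{T} \log p_\theta\left(s_t \mid s_{<t}, x\right)
- \lambda \sum_{i=1}^{N} \tilde{r}_i \log
\frac{\exp\big(\ell_\theta(y_i \mid x)\big)}{\sum_{j=1}^{N} \exp\big(\ell_\theta(y_j \mid x)\big)}
```

The first term keeps generation anchored to valid, relevant SIDs; the second shifts probability mass within a list toward items that earned higher rewards.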
5. Empirical Evaluation and Efficiency Gains
Extensive offline experiments on industry-scale datasets show consistent gains in recall and ranking metrics as each OneSearch component (KHQE, multi-view behavior injection, PARS) is added. In particular, hierarchical quantization and long behavior sequences improve HitRate@350 and MRR@350, and RQ-OPQ tokenization further improves Recall@10 and MRR@10 over RQ-VAE baselines.
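For reference, common per-query definitions of these metrics are sketched below; averaging them over all test queries gives the aggregate figures. These are standard definitions, and the paper's exact conventions (for example, how multiple relevant items are counted) may differ.

```python
from typing import Hashable, Sequence, Set

def hit_rate_at_k(ranked: Sequence[Hashable], relevant: Set[Hashable], k: int) -> float:
    """1.0 if any relevant item appears in the top k, else 0.0."""
    return float(any(item in relevant for item in ranked[:k]))

def mrr_at_k(ranked: Sequence[Hashable], relevant: Set[Hashable], k: int) -> float:
    """Reciprocal rank of the first relevant item within the top k (0.0 if none)."""
    for rank, item in enumerate(ranked[:k], start=1):
        if item in relevant:
            return 1.0 / rank
    return 0.0

def recall_at_k(ranked: Sequence[Hashable], relevant: Set[Hashable], k: int) -> float:
    """Fraction of the relevant items that appear in the top k."""
    if not relevant:
        return 0.0
    return len(set(ranked[:k]) & relevant) / len(relevant)

ranked = ["a", "b", "c", "d"]  # model output, best first
relevant = {"c", "e"}          # ground-truth items for this query
```

For this toy query the first hit is at rank 3, so HitRate@2 is 0, HitRate@3 is 1, MRR@3 is 1/3, and Recall@4 is 0.5 because "e" is never retrieved.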
In large-scale online A/B tests deployed at Kuaishou:
- CTR: +1.67% item CTR; +3.14% PV CTR.
- Buyer volume: +2.40%.
- Order volume: +3.22%.
- Model FLOPs Utilization (MFU): Improved from 3.26% (MCA baseline) to 27.32%.
- Operational Expenditure (OPEX): Reduced by 75.40%.
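The efficiency figures can be restated as ratios. This is simple arithmetic on the reported percentages, not an additional result.

```python
mfu_baseline = 3.26    # % MFU of the MCA baseline (reported)
mfu_onesearch = 27.32  # % MFU of OneSearch (reported)
opex_cut = 75.40       # % OPEX reduction (reported)

mfu_ratio = mfu_onesearch / mfu_baseline  # roughly 8.4x better hardware utilization
opex_remaining = 1 - opex_cut / 100       # fraction of the baseline OPEX still spent
cost_factor = 1 / opex_remaining          # how many times more the baseline costs

print(f"MFU: {mfu_ratio:.2f}x; OPEX: {opex_remaining:.1%} of baseline "
      f"(baseline costs {cost_factor:.2f}x as much)")
```

In other words, OneSearch spends about a quarter of the baseline's operating budget while using the hardware roughly eight times more effectively.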
Testing a non-ranking MCA variant caused a sharp drop in order volume (-39.14%), underscoring the value of unified ranking.
6. Production Deployment and Usage Context
OneSearch is fully deployed at Kuaishou, serving all detail-page search traffic, 50% of mall search traffic, and 20% of homepage search traffic, for millions of users and tens of millions of daily pageviews. Its deployment benefits from scalable inference, fewer communication bottlenecks, and higher throughput, because the computation of the separate cascade stages collapses into a single model.
The unified generative design facilitates rapid updating and fine-tuning cycles, enhanced model utilization, and streamlined operation, addressing typical industrial search challenges of scalability, data freshness, and personalized experience.
7. Framework Implications and Broader Significance
OneSearch demonstrates that tightly integrated, generative, and behavior-aware search architectures can overcome the inherent drawbacks of MCAs—namely fragmented computation and conflicting optimization objectives. By unifying recall, ranking, and personalization, the framework establishes a robust model pipeline for industrial-scale e-commerce retrieval.
The architecture, quantization scheme, and reward formulation presented in this system should carry over to other unified search frameworks that require efficient, personalized, multi-modal ranking. The result points toward a new paradigm in commercial search engineering, with generative modeling and behavior-driven personalization at the center of the design.
Summary Table: OneSearch Key Innovations and Impact
| Component/Innovation | Technical Description | Measured Impact |
|---|---|---|
| KHQE (Hierarchical Quantization) | NER keyword extraction + RQ/OPQ quantization | +HitRate@350, +MRR@350, semantic relevance |
| Multi-view Behavior Injection | Aggregated short- and long-term user sequences | Enhanced personalization, user-centric ranking |
| PARS (Reward System) | SFT + adaptive reward-weighted list-wise loss | +CTR, +buyer/order volume, +MFU, -OPEX |
| Production Deployment | Real-time handling of full and partial traffic | Millions of users, tens of millions of daily PVs |
This summary reflects all core quantitative and technical claims as stated in Chen et al. (3 Sep 2025).