
Shareable Representations for Search Query Understanding (2001.04345v1)

Published 20 Dec 2019 in cs.IR, cs.CL, cs.LG, and stat.ML

Abstract: Understanding search queries is critical for shopping search engines to deliver a satisfying customer experience. Popular shopping search engines receive billions of unique queries yearly, each of which can depict any of hundreds of user preferences or intents. In order to get the right results to customers, it must be known that queries like "inexpensive prom dresses" are intended not only to surface results of a certain product type but also products with a low price. Referred to as query intents, examples also include preferences for author, brand, age group, or simply a need for customer service. Recent works such as BERT have demonstrated the success of a large transformer encoder architecture with language model pre-training on a variety of NLP tasks. We adapt such an architecture to learn intents for search queries and describe methods to account for the noisiness and sparseness of search query data. We also describe cost-effective ways of hosting transformer encoder models in contexts with low latency requirements. With the right domain-specific training we can build a shareable deep learning model whose internal representation can be reused for a variety of query understanding tasks, including query intent identification. Model sharing reduces the number of large models that need to be served at inference time and provides a platform to quickly build and roll out new search query classifiers.

The paper "Shareable Representations for Search Query Understanding" addresses an essential challenge in the domain of search engines, especially pertinent to shopping platforms. The core issue tackled by the authors is the effective interpretation of diverse and numerous search queries. These queries, numbering in the billions annually, encapsulate a wide range of user preferences and intents, such as searching for a specific price range, brand, or product category.

To solve this complex problem, the authors leverage advancements in NLP, specifically utilizing large transformer encoder architectures like BERT. Such models have shown exceptional performance across various NLP tasks by learning context-based representations of words and phrases through extensive pre-training on large text corpora.

The key contributions of the paper can be summarized as follows:

  1. Adaptation of BERT for Search Query Understanding: The authors describe how they adapt the BERT architecture to understand and classify the intents behind search queries. The use of transformer models enables the system to interpret the contextual meaning of queries effectively.
  2. Handling Noisy and Sparse Data: The paper explores methods to mitigate the challenges posed by noisy and sparse search query data. This is a common issue in search engines where the vast majority of queries are unique and may not have sufficient historical data to train traditional machine learning models effectively.
  3. Cost-effective Hosting and Low Latency Inference: An essential aspect of deploying deep learning models in production is ensuring they can operate within acceptable latency limits. The authors propose cost-effective methods for hosting these transformer models, ensuring that they maintain the low latency requirements necessary for real-time search applications.
  4. Shareable Deep Learning Models: A significant innovation presented in the paper is the concept of shareable model representations. By developing a deep learning model with representations that can be reused, the system reduces the need to serve multiple large models at inference time. This not only conserves computational resources but also accelerates the deployment of new query classifiers.
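The shareable-representation idea in contribution 4 can be sketched in a few lines: a single encoder maps a query to a fixed vector, and each new intent classifier is just a lightweight head consuming that same vector, so only one large model is served. The sketch below is illustrative only; the class and head names are hypothetical, and the "encoder" is stubbed as a bag-of-hashed-tokens projection rather than the BERT-style transformer the paper actually uses.

```python
import numpy as np

rng = np.random.default_rng(0)

class SharedEncoder:
    """Stand-in for a shared BERT-style encoder: maps a query string
    to a fixed-size vector. A real deployment would run a transformer;
    here a mean of hashed-token embeddings illustrates the idea."""
    def __init__(self, dim=64, vocab=1000):
        self.vocab = vocab
        self.emb = rng.normal(scale=0.1, size=(vocab, dim))

    def encode(self, query):
        ids = [hash(tok) % self.vocab for tok in query.lower().split()]
        return self.emb[ids].mean(axis=0)  # shape: (dim,)

class IntentHead:
    """Lightweight per-task classifier head reusing the shared encoding.
    Adding a new query classifier means training only this small layer."""
    def __init__(self, dim=64, n_classes=2):
        self.w = rng.normal(scale=0.1, size=(dim, n_classes))

    def predict(self, vec):
        logits = vec @ self.w
        e = np.exp(logits - logits.max())  # stable softmax
        return e / e.sum()

encoder = SharedEncoder()
price_head = IntentHead()  # hypothetical "low price" intent head
brand_head = IntentHead()  # hypothetical brand-intent head

vec = encoder.encode("inexpensive prom dresses")
p_price = price_head.predict(vec)
p_brand = brand_head.predict(vec)  # same shared vector, different head
```

Because both heads read the same cached encoding, inference cost for the expensive encoder is paid once per query regardless of how many classifiers are deployed, which is the operational benefit the paper highlights.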

The practical implications of these contributions mean that shopping search engines can deliver more accurate and satisfying customer experiences by correctly interpreting the nuanced intents behind user queries. The reusable and shareable nature of the trained models enhances scalability and efficiency, making it easier to update and expand query understanding capabilities.

In summary, this paper provides a compelling approach to enhancing search query understanding using state-of-the-art NLP techniques, addressing both technical and operational challenges associated with deploying such models in a real-world shopping search engine context.

Authors (6)
  1. Mukul Kumar (11 papers)
  2. Youna Hu (2 papers)
  3. Will Headden (1 paper)
  4. Rahul Goutam (6 papers)
  5. Heran Lin (1 paper)
  6. Bing Yin (56 papers)
Citations (4)