A Hierarchical Recurrent Encoder-Decoder For Generative Context-Aware Query Suggestion (1507.02221v1)

Published 8 Jul 2015 in cs.NE and cs.IR

Abstract: Users may strive to formulate an adequate textual query for their information need. Search engines assist the users by presenting query suggestions. To preserve the original search intent, suggestions should be context-aware and account for the previous queries issued by the user. Achieving context awareness is challenging due to data sparsity. We present a probabilistic suggestion model that is able to account for sequences of previous queries of arbitrary lengths. Our novel hierarchical recurrent encoder-decoder architecture allows the model to be sensitive to the order of queries in the context while avoiding data sparsity. Additionally, our model can suggest for rare, or long-tail, queries. The produced suggestions are synthetic and are sampled one word at a time, using computationally cheap decoding techniques. This is in contrast to current synthetic suggestion models relying upon machine learning pipelines and hand-engineered feature sets. Results show that it outperforms existing context-aware approaches in a next query prediction setting. In addition to query suggestion, our model is general enough to be used in a variety of other applications.

Hierarchical Recurrent Encoder-Decoder for Context-Aware Query Suggestion

The paper presents a probabilistic model employing a hierarchical recurrent encoder-decoder (HRED) for generative, context-aware query suggestion. The approach targets a core difficulty in automated query suggestion: to preserve the user's search intent, suggestions must account for the sequence of previous queries in the session, yet conditioning on that context typically runs into data sparsity.

Model Architecture

The core innovation is a hierarchy of recurrent neural networks (RNNs): a query-level RNN encodes each query into a fixed-length vector, a session-level RNN consumes these vectors in order and summarizes the session in a single state, and a decoder RNN conditioned on that state generates the next query one word at a time. This hierarchy captures the order-sensitive nature of query sequences while avoiding the data sparsity that affects count-based context models, and because suggestions are sampled word-by-word with computationally cheap decoding, the model can produce synthetic suggestions even for rare or previously unseen queries.
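To make the three-level structure concrete, the following is a minimal PyTorch sketch of the hierarchy described above. The use of GRU-style recurrent units follows the paper, but the class name, layer dimensions, and initialization details are illustrative assumptions rather than the authors' exact configuration.

```python
import torch
import torch.nn as nn

class HREDSketch(nn.Module):
    """Minimal hierarchical recurrent encoder-decoder sketch (illustrative only).

    A query-level GRU encodes each query into a fixed-length vector; a
    session-level GRU consumes those vectors in order; a decoder GRU,
    initialized from the session state, predicts the next query word by word.
    """

    def __init__(self, vocab_size, emb_dim=128, query_dim=256, session_dim=512):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.query_enc = nn.GRU(emb_dim, query_dim, batch_first=True)
        self.session_enc = nn.GRU(query_dim, session_dim, batch_first=True)
        self.dec_init = nn.Linear(session_dim, query_dim)   # session state -> decoder start state
        self.decoder = nn.GRU(emb_dim, query_dim, batch_first=True)
        self.out = nn.Linear(query_dim, vocab_size)

    def encode_session(self, queries):
        # queries: list of LongTensors, each of shape (batch, query_len)
        query_vecs = []
        for q in queries:
            _, h = self.query_enc(self.embed(q))        # h: (1, batch, query_dim)
            query_vecs.append(h.squeeze(0))
        session_input = torch.stack(query_vecs, dim=1)   # (batch, num_queries, query_dim)
        _, s = self.session_enc(session_input)           # s: (1, batch, session_dim)
        return s.squeeze(0)

    def forward(self, queries, target_prefix):
        # target_prefix: (batch, tgt_len) words of the next query, shifted right
        session_state = self.encode_session(queries)
        h0 = torch.tanh(self.dec_init(session_state)).unsqueeze(0)
        dec_out, _ = self.decoder(self.embed(target_prefix), h0)
        return self.out(dec_out)                          # (batch, tgt_len, vocab_size) logits
```

Training would maximize the likelihood of each observed next query given the session prefix; at suggestion time, the decoder is sampled or beam-searched word by word from the session state.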

Empirical Results

Empirical evaluation in a next-query-prediction setting shows the model outperforming existing context-aware baselines. The likelihood the model assigns to a candidate suggestion is used as an additional feature within a learning-to-rank framework, yielding consistent gains over the baseline feature set. The improvements are most pronounced for long sessions and for rare or previously unseen anchor queries, where count-based context models struggle.
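The likelihood feature can be illustrated with a short helper built on the HREDSketch class above; the function name and the start-of-query convention are assumptions for illustration, not the paper's implementation.

```python
import torch
import torch.nn.functional as F

def suggestion_log_likelihood(model, session_queries, candidate):
    """Score a candidate next query by its log-likelihood under the HRED sketch.

    candidate: LongTensor (batch, cand_len) of word ids for the candidate query.
    The returned score can serve as one feature in a learning-to-rank model,
    alongside conventional hand-crafted features.
    """
    # Shift right: predict word t from words < t (index 0 assumed to be a start-of-query token).
    bos = torch.zeros_like(candidate[:, :1])
    prefix = torch.cat([bos, candidate[:, :-1]], dim=1)
    logits = model(session_queries, prefix)                   # (batch, cand_len, vocab)
    log_probs = F.log_softmax(logits, dim=-1)
    token_ll = log_probs.gather(-1, candidate.unsqueeze(-1)).squeeze(-1)
    return token_ll.sum(dim=1)                                # one scalar score per candidate
```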

Implications and Future Directions

The implications of this research extend to practical and theoretical realms. Practically, the model’s compactness and effectiveness in context-aware query prediction present tangible benefits for enhancing user search experience. Theoretically, it offers insights into leveraging hierarchical RNNs for contextually rich language applications. Future directions could explore integrating user click data for improved suggestion relevance, diversifying synthetic generation to enhance query reformulation, and adapting the framework to related tasks such as query auto-completion or next-word prediction.

This research strengthens the utility of hierarchical recurrent architectures in processing and predicting complex sequences, setting a precedent for future innovations in AI-driven language tasks.

Authors (6)
  1. Alessandro Sordoni (53 papers)
  2. Yoshua Bengio (601 papers)
  3. Hossein Vahabi (13 papers)
  4. Christina Lioma (66 papers)
  5. Jakob G. Simonsen (1 paper)
  6. Jian-Yun Nie (70 papers)
Citations (537)