Entity-Centric Query Refinement (2204.00743v2)

Published 2 Apr 2022 in cs.CL and cs.IR

Abstract: We introduce the task of entity-centric query refinement. Given an input query whose answer is a (potentially large) collection of entities, the task output is a small set of query refinements meant to assist the user in efficient domain exploration and entity discovery. We propose a method to create a training dataset for this task. For a given input query, we use an existing knowledge base taxonomy as a source of candidate query refinements, and choose a final set of refinements from among these candidates using a search procedure designed to partition the set of entities answering the input query. We demonstrate that our approach identifies refinement sets which human annotators judge to be interesting, comprehensive, and non-redundant. In addition, we find that a text generation model trained on our newly-constructed dataset is able to offer refinements for novel queries not covered by an existing taxonomy. Our code and data are available at https://github.com/google-research/language/tree/master/language/qresp.

Summary

The paper introduces QRESP, a structured method that partitions entity sets into comprehensive and interesting subsets based on user queries.
It leverages YAGO3’s taxonomy and trains a T5 model to generate refined query sets, improving exploration of ambiguous, under-specified searches.
In human evaluation, QRESP refinement sets are preferred 86% of the time over randomly selected ones; domain adaptation remains an open problem for future work.

This paper introduces the task of entity-centric query refinement, which helps users efficiently explore a domain and discover entities through refined queries. The need arises from under-specified, ambiguous, or open-ended list-intent queries, which make up a substantial portion of web searches. Rather than listing every entity that answers such a query, the task is to generate a small set of query refinements that support domain exploration and entity discovery.

Methodology

The paper presents a structured approach to generate training data for this task by leveraging an existing knowledge base, YAGO3. By using YAGO3's taxonomy as a source for candidate query refinements, the authors propose a selection method called Query Refinement via Entity Space Partitioning (QRESP). This technique aims to partition the set of entities answering the input query into subsets that are comprehensive, interesting, and non-redundant. The primary contribution lies in selecting refinement sets that align with human judgments, measured by attributes such as comprehensiveness and interestingness.
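The partitioning idea can be illustrated with a small sketch. This is not the authors' search procedure: the greedy strategy, the gain function, the `select_refinements` name, and the toy category and entity names are all assumptions made for illustration. It only shows how candidate subcategories could be chosen so that they cover the answer set (comprehensive) while sharing few entities with each other (non-redundant).

```python
from typing import Dict, List, Set


def select_refinements(
    answer_entities: Set[str],
    candidates: Dict[str, Set[str]],
    k: int = 3,
) -> List[str]:
    """Greedily pick up to k candidate refinements that roughly partition the answer set.

    Hypothetical scoring (an assumption, not taken from the paper): at each step,
    pick the candidate with the most not-yet-covered entities, minus the number
    of entities it shares with refinements that were already chosen.
    """
    chosen: List[str] = []
    covered: Set[str] = set()
    # Restrict every candidate to entities that actually answer the input query.
    remaining = {name: ents & answer_entities for name, ents in candidates.items()}

    while remaining and len(chosen) < k:
        def gain(name: str) -> int:
            ents = remaining[name]
            return len(ents - covered) - len(ents & covered)

        # Iterate in sorted order so that ties are broken deterministically.
        best = max(sorted(remaining), key=gain)
        if gain(best) <= 0:
            break  # no candidate adds more new entities than it repeats
        chosen.append(best)
        covered |= remaining.pop(best)
    return chosen


# Toy data (made up): six placeholder entities answering the query "painters".
answer = {"e1", "e2", "e3", "e4", "e5", "e6"}
candidates = {
    "Italian painters": {"e1", "e2", "e3"},
    "French painters": {"e4", "e5"},
    "Renaissance painters": {"e1", "e2", "e4"},
    "Dutch painters": {"e6"},
}
selected = select_refinements(answer, candidates, k=3)
print(selected)  # ['Italian painters', 'French painters', 'Dutch painters']
```

In this toy run, "Renaissance painters" is skipped because it mostly repeats entities already covered by the nationality-based refinements, which is the kind of redundancy a partition-oriented selection is meant to avoid.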

Numerical Evaluations and Findings

In human evaluation, refinement sets selected with QRESP are preferred 86% of the time over randomly selected sets. Annotators consistently rated the QRESP sets as giving a better overview and as more interesting than the alternatives. Against a stronger baseline of randomly chosen filtered subcategories, QRESP sets are still preferred 73% of the time, which supports the effectiveness of the method.
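For readers unfamiliar with pairwise preference figures, the sketch below shows how such a rate is typically computed from side-by-side judgments. The `preference_rate` helper and the judgment counts are hypothetical: the counts are chosen only so that the arithmetic reproduces an 86% rate, and they are not the paper's annotation data.

```python
from collections import Counter
from typing import Iterable


def preference_rate(judgments: Iterable[str], system: str) -> float:
    """Return the fraction of pairwise judgments in which `system` was preferred."""
    counts = Counter(judgments)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no judgments to aggregate")
    return counts[system] / total


# Made-up judgments: each entry names the side an annotator preferred.
judgments = ["qresp"] * 43 + ["random"] * 7
rate = preference_rate(judgments, "qresp")
print(f"{rate:.0%}")  # 86%
```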

Model Training and Evaluation

A T5 model trained on the proposed dataset can generate refinement sets for novel queries not covered by existing taxonomies. The paper evaluates the model on held-out YAGO categories and on two external datasets, Natural Questions and the TREC 2009 Million Query Track, to test how well it adapts to queries outside the training domain.
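Training a text-to-text model such as T5 on this task requires turning each query and its refinement set into a pair of strings. The sketch below shows one plausible serialization. The task prefix, the `;` separator, and both function names are assumptions for illustration; the summary does not specify the format the authors used.

```python
from typing import List, Tuple

SEPARATOR = " ; "  # hypothetical delimiter between refinements in the target string


def to_seq2seq_example(query: str, refinements: List[str]) -> Tuple[str, str]:
    """Serialize a query and its refinement set as a (source, target) string pair."""
    source = f"refine query: {query.strip()}"  # the task prefix is an assumption
    target = SEPARATOR.join(r.strip() for r in refinements)
    return source, target


def parse_refinements(generated: str) -> List[str]:
    """Split generated text back into refinements, dropping blanks and duplicates."""
    seen = set()
    parsed: List[str] = []
    for part in generated.split(SEPARATOR.strip()):
        item = part.strip()
        if item and item.lower() not in seen:
            seen.add(item.lower())
            parsed.append(item)
    return parsed


source, target = to_seq2seq_example(" painters ", ["Italian painters", "French painters"])
print(source)  # refine query: painters
print(target)  # Italian painters ; French painters
print(parse_refinements("Italian painters ; French painters; italian painters ;"))
# ['Italian painters', 'French painters']
```

Deduplicating at parse time is a cheap guard against a generator that repeats itself, though it does not address refinements that are off-topic for the input query.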

Automated metrics suggest that the model trained on QRESP data consistently outperforms alternatives. However, domain adaptation remains a challenge: the model occasionally generates off-topic refinements for queries from unfamiliar or shifted domains.

Implications and Future Directions

This research has both practical and theoretical implications in query refinement strategies. Practically, the paper contributes to improving user search experiences in knowledge-intensive domains by enabling more focused entity discovery. Theoretically, it proposes an innovative measure of refinement set quality based on entity space partitioning, paving the way for enhanced refinement system design.

Future work includes addressing domain adaptation challenges and developing more sophisticated automated evaluation metrics. The proposed QRESP-QA scoring could support this work by providing a flexible automated benchmark for model testing.

Overall, the paper advances search refinement by using structured entity space partitioning to tailor refinement sets to user intent.
