
Knowledge-Aware Query Expansion with Large Language Models for Textual and Relational Retrieval (2410.13765v2)

Published 17 Oct 2024 in cs.CL and cs.IR

Abstract: LLMs have been used to generate query expansions that augment original queries for improving information search. Recent studies also explore providing LLMs with initial retrieval results to generate query expansions more grounded in the document corpus. However, these methods mostly focus on enhancing textual similarity between search queries and target documents, overlooking document relations. For queries like "Find me a highly rated camera for wildlife photography compatible with my Nikon F-Mount lenses", existing methods may generate expansions that are semantically similar but structurally unrelated to the user's intent. To handle such semi-structured queries with both textual and relational requirements, in this paper we propose a knowledge-aware query expansion framework, augmenting LLMs with structured document relations from a knowledge graph (KG). To further address the limitation of entity-based scoring in existing KG-based methods, we leverage document texts as rich KG node representations and use document-based relation filtering for our Knowledge-Aware Retrieval (KAR). Extensive experiments on three datasets of diverse domains show the advantages of our method compared against state-of-the-art baselines on textual and relational semi-structured retrieval.

Summary

  • The paper introduces a novel Knowledge-Aware Retrieval (KAR) framework that leverages KG relations to enhance query expansion.
  • It employs document-based relation filtering to align LLM-generated query expansions with both textual content and relational structure.
  • Experimental results on AMAZON, MAG, and PRIME datasets demonstrate significant improvements in hit rates and retrieval accuracy.

Knowledge-Aware Query Expansion with LLMs for Textual and Relational Retrieval

The paper addresses the limitations of current query expansion techniques in information retrieval, especially their focus on enhancing textual similarity while neglecting document relational structures. The authors propose a novel framework named Knowledge-Aware Retrieval (KAR), which utilizes LLMs augmented with knowledge graph (KG) relations to improve performance for queries with both textual and relational elements.

Main Contributions

  1. Knowledge-Aware Query Expansion Framework: The framework incorporates structured document relationships derived from KGs, addressing the gap in handling semi-structured queries with both textual and relational aspects.
  2. Document-Based Relation Filtering: By using document texts as KG node representations, the method filters candidate relations by their textual similarity to the query, enabling more targeted query expansion and retrieval (see the sketch after this list).
  3. Extensive Evaluation: Experiments on three diverse datasets (AMAZON, MAG, and PRIME) showcase the superiority of KAR over state-of-the-art baselines in terms of textual and relational retrieval.
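
To make the relation-filtering idea concrete, here is a minimal sketch assuming an embedding-based scorer: candidate KG triples are verbalized with document texts standing in for the nodes and ranked by similarity to the query. The `embed` helper and the triple format are illustrative placeholders, not the paper's actual implementation.

```python
from typing import List, Tuple
import numpy as np

Triple = Tuple[str, str, str]  # (head document text, relation name, tail document text)

def embed(texts: List[str], dim: int = 256) -> np.ndarray:
    """Toy hashing-based embedder standing in for a real dense text encoder."""
    vecs = np.zeros((len(texts), dim))
    for i, text in enumerate(texts):
        for tok in text.lower().split():
            vecs[i, hash(tok) % dim] += 1.0
    return vecs

def filter_triples(query: str, triples: List[Triple], top_k: int = 10) -> List[Triple]:
    """Keep the top-k triples whose verbalized form is most similar to the query."""
    # Verbalize each triple, using document texts as the KG node representations.
    verbalized = [f"{head} [{rel}] {tail}" for head, rel, tail in triples]
    q_vec = embed([query])[0]
    t_vecs = embed(verbalized)
    # Cosine similarity between the query and each verbalized triple.
    sims = t_vecs @ q_vec / (np.linalg.norm(t_vecs, axis=1) * np.linalg.norm(q_vec) + 1e-9)
    keep = np.argsort(-sims)[:top_k]
    return [triples[i] for i in keep]
```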

Methodology

KAR first uses an LLM to parse the entities mentioned in the query and retrieves the corresponding KG nodes together with their document texts. Document-based relation filtering then prunes the candidate KG relations to those relevant to the query, improving the precision of the subsequent retrieval. The filtered document triples are fed back to the LLM as input, so the generated query expansions are aligned with both user intent and the relational structure of the document corpus.
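
The following is a minimal, hypothetical end-to-end sketch of this kind of pipeline. The `llm`, `kg`, and `retriever` objects and their methods (`parse_entities`, `neighbor_triples`, `expand_query`, `search`) are assumed interfaces rather than the authors' actual prompts or APIs, and `filter_triples` refers to the filtering sketch shown earlier.

```python
def knowledge_aware_retrieval(query, llm, kg, retriever,
                              top_k_triples=10, top_k_docs=20):
    """Hypothetical KAR-style pipeline (not the authors' code)."""
    # 1) The LLM extracts the entities mentioned in the query.
    entities = llm.parse_entities(query)
    # 2) Pull each entity's KG neighborhood, with document texts as node content.
    triples = []
    for entity in entities:
        triples.extend(kg.neighbor_triples(entity))  # (head_doc, relation, tail_doc)
    # 3) Document-based relation filtering against the query (see filter_triples above).
    kept = filter_triples(query, triples, top_k=top_k_triples)
    # 4) The LLM writes a query expansion grounded in the filtered triples.
    expansion = llm.expand_query(query, kept)
    # 5) Run a standard retriever over the expanded query.
    return retriever.search(f"{query} {expansion}", k=top_k_docs)
```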

Results and Implications

The experiments show that KAR delivers significant improvements in hit rate and mean reciprocal rank (MRR) across all tested datasets, with particularly strong performance in domains with denser relational structure such as MAG and PRIME (a short sketch of these metrics follows the list below).

  • AMAZON: The corpus is rich in textual information, so competing methods remain competitive, yet KAR still achieves notable gains.
  • MAG and PRIME: KAR exploits the dense relational data effectively, outperforming the baselines by keeping its expansions structurally faithful to the query.
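
For reference, hit rate and MRR are standard ranking metrics; the short sketch below computes them from a ranked result list, with the document IDs purely illustrative.

```python
def hit_at_k(ranked_ids, relevant_ids, k):
    """1.0 if any relevant document appears in the top-k results, else 0.0."""
    return float(any(doc_id in relevant_ids for doc_id in ranked_ids[:k]))

def reciprocal_rank(ranked_ids, relevant_ids):
    """1 / rank of the first relevant document, or 0.0 if none is retrieved."""
    for rank, doc_id in enumerate(ranked_ids, start=1):
        if doc_id in relevant_ids:
            return 1.0 / rank
    return 0.0

# Example: a query whose first relevant document is ranked second.
ranked = ["d7", "d3", "d9"]
relevant = {"d3"}
print(hit_at_k(ranked, relevant, k=1))    # 0.0
print(reciprocal_rank(ranked, relevant))  # 0.5
```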

The paper highlights the potential of KG-augmented query expansion frameworks in improving retrieval systems, particularly in scenarios demanding nuanced understanding of both textual content and relational context.

Future Directions

The authors suggest further exploration into optimizing KG-enhanced LLMs for different query types and corpora, aiming for broader applicability across diverse domains. Additionally, investigating efficient integration methods to reduce retrieval latency remains an area for potential improvement.

Conclusion

This paper provides a compelling approach to overcoming existing challenges in retrieval systems by integrating relational structures into LLM-based query expansions. The KAR framework not only demonstrates its efficacy in controlled settings but also suggests broader applications for multi-faceted search tasks, making a noteworthy contribution to the field of information retrieval.
