Contextual History Enhancement for Improving Query Rewriting in Conversational Search
The paper "CHIQ: Contextual History Enhancement for Improving Query Rewriting in Conversational Search" by Fengran Mo et al. presents an approach to query rewriting in conversational search systems. The method leverages open-source LLMs to resolve ambiguities and improve the contextual relevance of conversation histories before queries are rewritten.
Overview
The paper's central contribution is a two-step method, CHIQ (Contextual History for Improving Query rewriting), which improves query rewriting by first improving the clarity and quality of the conversation history. This contrasts with prior work that predominantly relies on closed-source LLMs to generate search queries directly. Experiments across five benchmark datasets demonstrate that CHIQ achieves state-of-the-art performance, often surpassing systems built on commercial LLMs.
Methodology
Task Formulation
Conversational search systems engage users in multi-turn interactions to satisfy diverse information needs. Key to this interaction is accurately translating each user utterance into an effective search query. The paper emphasizes the inherent ambiguity in conversation histories and proposes refining these histories to substantially improve query generation.
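The kind of ambiguity the paper targets can be made concrete with a small illustrative example (the conversation below is hypothetical, not taken from the paper's datasets): a follow-up turn that cannot be interpreted without the preceding history.

```python
# Illustrative (hypothetical) example of the ambiguity CHIQ targets:
# the current turn is not self-contained when read in isolation.
history = [
    ("Who wrote The Old Man and the Sea?", "Ernest Hemingway."),
]
current_turn = "When did he win the Nobel Prize?"  # "he" is ambiguous alone

# A self-contained rewrite resolves the coreference using the history,
# producing a query a standard retriever can handle:
rewritten = "When did Ernest Hemingway win the Nobel Prize?"
```

A retriever given only `current_turn` would have no anchor for "he"; the rewritten form carries the needed context by itself.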
History Enhancement
To refine conversation history, CHIQ employs five specialized prompts:
- Question Disambiguation (QD): This prompt clarifies acronyms, ambiguous words, and coreferences in user questions, making them self-contained.
- Response Expansion (RE): This prompt enriches the previous responses, making them more informative by leveraging preceding conversational context.
- Pseudo Response (PR): Here, an LLM generates a speculative answer, adding potentially relevant terms that aid understanding of the conversation context.
- Topic Switch (TS): This prompt identifies shifts in conversation topics, omitting irrelevant past turns to maintain focus.
- History Summary (HS): It compresses lengthy history into concise, relevant summaries, maintaining essential contextual elements.
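The five prompts above can be sketched as a single enhancement step. This is a minimal sketch under stated assumptions: the `llm` callable is a generic text-in/text-out interface, and the prompt wordings are illustrative placeholders, not the paper's actual prompt templates.

```python
# Sketch of CHIQ's history-enhancement step. Assumes a generic llm(prompt)
# callable; the prompt wordings are illustrative, not the paper's templates.
from typing import Callable, List, Tuple

Turn = Tuple[str, str]  # (user question, system response)

def enhance_history(history: List[Turn], question: str,
                    llm: Callable[[str], str]) -> dict:
    """Apply the five CHIQ enhancement prompts to one conversation state."""
    ctx = "\n".join(f"Q: {q}\nA: {a}" for q, a in history)
    return {
        # QD: make the current question self-contained
        "question_disambiguation": llm(
            f"Rewrite the question so it is unambiguous given the history.\n"
            f"History:\n{ctx}\nQuestion: {question}"),
        # RE: enrich the previous response with surrounding context
        "response_expansion": llm(
            f"Expand the last response using the preceding context.\n{ctx}"),
        # PR: speculate an answer to surface possibly relevant terms
        "pseudo_response": llm(
            f"Give a plausible answer to the question.\n{ctx}\n"
            f"Question: {question}"),
        # TS: flag topic shifts so irrelevant past turns can be dropped
        "topic_switch": llm(
            f"Does the question start a new topic? If so, list the turns "
            f"to omit.\n{ctx}\nQuestion: {question}"),
        # HS: compress a long history into a concise summary
        "history_summary": llm(f"Summarize this conversation briefly.\n{ctx}"),
    }
```

Each value is then available to the downstream rewriting step, which can draw on whichever enhanced signals it needs.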
Ad-hoc Query Rewriting and Fine-Tuning
The enhanced history is utilized in query rewriting either directly (CHIQ-AD) or through fine-tuning a smaller LLM like T5-base (CHIQ-FT). Additionally, the CHIQ-Fusion method aggregates results from both approaches to further refine search outcomes.
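Aggregating two systems' outputs typically means merging their ranked result lists. The paper's exact fusion rule is not reproduced here; as an assumption, the sketch below uses reciprocal rank fusion (RRF), a standard technique for merging ranked lists such as those produced by CHIQ-AD and CHIQ-FT.

```python
# Minimal rank-fusion sketch. Reciprocal rank fusion (RRF) is used here as
# an illustrative assumption; CHIQ-Fusion's exact aggregation may differ.
from collections import defaultdict
from typing import List

def rrf_fuse(run_a: List[str], run_b: List[str], k: int = 60) -> List[str]:
    """Merge two ranked document-id lists with reciprocal rank fusion."""
    scores = defaultdict(float)
    for run in (run_a, run_b):
        for rank, doc_id in enumerate(run, start=1):
            # Documents ranked high in either list accumulate larger scores;
            # k damps the influence of any single run's top ranks.
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)
```

A document retrieved near the top by both rewrites outranks one favored by only a single rewrite, which is the intuition behind fusing the two approaches.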
Experimental Results
The efficacy of CHIQ was validated on five benchmarks: TopiOCQA, QReCC, and three CAsT datasets. The following observations were made:
- Performance Gains: CHIQ-AD and CHIQ-FT significantly improved upon baselines using the original history, showcasing the importance of enhanced history for better query rewriting.
- Zero-shot Performance: CHIQ approaches exhibited strong generalization capabilities, outperforming many existing methods in zero-shot settings.
- Open-source vs. Closed-source: Systems built on closed-source LLMs such as ChatGPT-3.5 retained an edge in some settings, but CHIQ narrowed the gap significantly, indicating the potential of open-source models when enhanced history is employed.
Implications and Future Work
The practical implications of this research are substantial. By refining conversation history, the proposed methodologies enable open-source LLMs to deliver competitive or superior performance in query rewriting tasks, reducing dependency on commercial models. This democratization of high-performance conversational search systems has potential benefits in terms of accessibility and cost.
Future developments could explore larger open-source models like Mixtral or Gemma, additional closed-source models like GPT-4 or Claude, or integrating more sophisticated filtering strategies to handle noisy LLM outputs. Improving query rewriting and retrieval alignment through hybrid approaches, which interpolate between human and pseudo-supervised signals, could further enhance the system’s robustness.
In conclusion, the CHIQ methodology represents a systematic, highly effective approach to enhancing query rewriting in conversational search, leveraging the strengths of open-source LLMs. This foundational work sets the stage for substantial advancements in the deployment and performance of conversational search systems.