Overview of AGENT-CQ: Automatic Generation and Evaluation of Clarifying Questions for Conversational Search with LLMs
The paper presents AGENT-CQ, an end-to-end framework that uses LLMs to generate and evaluate clarifying questions in Conversational Search (CS) systems. Recognizing the pivotal role clarifying questions play in improving query interpretation and retrieval effectiveness, the authors design the framework to overcome the limitations of traditional methods that rely on manual curation or question templates.
Core Framework Components
AGENT-CQ consists of two stages, generation and evaluation; a minimal pipeline sketch follows the component list below.
- Generation Stage:
  - Phase 1 (Question Generation): Uses two LLM-based strategies to generate clarifying questions:
    - Facet-based Approach: First prompts the LLM to extract facets of the query, then generates questions targeting each facet.
    - Temperature-variation-based Approach: Generates multiple questions from the same prompt by sampling at different temperature settings, thereby controlling diversity.
  - Phase 2 (Question Filtering): Scores candidate questions on relevance and clarification potential, retaining only the top-ranked ones.
  - Phase 3 (User Response Simulation): Uses a parameterized LLM simulation to generate realistic user responses to the filtered questions, yielding diverse, context-tailored interactions.
- Evaluation Stage (CrowdLLM):
  - Emulates a crowd of evaluators using multiple LLM instances to simulate varied human assessments, evaluating the questions and responses across defined metrics.
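To make the two-stage design concrete, the sketch below walks through temperature-variation generation, LLM-based filtering, user-response simulation, and a CrowdLLM-style judging panel. It is not the authors' implementation: the OpenAI client, model name, prompts, panel size, and thresholds are all illustrative assumptions.

```python
# Illustrative AGENT-CQ-style pipeline (not the paper's code).
# Assumes the OpenAI Python client; model name, prompts, and parameters are hypothetical.
from statistics import mean
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def chat(prompt: str, temperature: float = 0.7) -> str:
    """One chat-completion call, shared by every stage below."""
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # hypothetical model choice
        messages=[{"role": "user", "content": prompt}],
        temperature=temperature,
    )
    return resp.choices[0].message.content.strip()

# Phase 1 (temperature-variation): same prompt, sampled at several temperatures.
def generate_questions(query: str, temps=(0.2, 0.7, 1.0)) -> list[str]:
    prompt = f"A user searched for: '{query}'. Ask one clarifying question."
    return [chat(prompt, temperature=t) for t in temps]

# Phase 2: an LLM scores each candidate; keep the top-k.
def filter_questions(query: str, questions: list[str], k: int = 2) -> list[str]:
    def score(q: str) -> float:
        prompt = (f"Query: '{query}'\nClarifying question: '{q}'\n"
                  "Rate its relevance and clarification potential from 1 to 5. "
                  "Reply with a single number.")
        try:
            return float(chat(prompt, temperature=0.0))
        except ValueError:
            return 0.0
    return sorted(questions, key=score, reverse=True)[:k]

# Phase 3: simulate a user's answer to a filtered question.
def simulate_user(query: str, question: str, persona: str = "brief, task-focused searcher") -> str:
    prompt = (f"You are a {persona} who issued the query '{query}'. "
              f"Answer this clarifying question in one or two sentences: '{question}'")
    return chat(prompt, temperature=0.8)

# Evaluation stage: a panel of n independent LLM judges, averaged (CrowdLLM-style).
def crowd_llm_score(query: str, question: str, answer: str, n_judges: int = 5) -> float:
    prompt = (f"Query: '{query}'\nClarifying question: '{question}'\nUser answer: '{answer}'\n"
              "Rate the question's clarity and usefulness from 1 to 5. Reply with a single number.")
    scores = []
    for _ in range(n_judges):  # sampling variance stands in for distinct annotators
        try:
            scores.append(float(chat(prompt, temperature=1.0)))
        except ValueError:
            continue
    return mean(scores) if scores else 0.0
```

In the actual framework the filtering and judging prompts target specific quality dimensions, and the simulator and judges can be parameterized differently; the sketch only captures the control flow between phases.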
Experimental Outcomes
The experiments use the ClariQ dataset to validate CrowdLLM's efficacy, showing strong correlation with human evaluators on most quality dimensions (a sketch of such an agreement check follows the lists below). Key findings include:
- LLM-Generated vs. Human Questions: LLM-generated questions significantly improved retrieval performance over human-generated ones under both BM25 and BERT-based retrieval.
- Superior Strategy: The temperature-variation approach consistently outperformed the other generation methods, producing clearer and more relevant clarifying questions.
- AGENT-CQ Contributions:
  - Provides a scalable methodology for generating and evaluating clarifying questions.
  - Introduces CrowdLLM as an effective alternative to labor-intensive human evaluation.
  - Compares various LLM-driven approaches for generating clarifying questions and modeling user interactions.
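The agreement between CrowdLLM and human assessors reported above can be checked with a standard rank-correlation test. The snippet below is not from the paper; the score arrays are placeholders standing in for per-question quality ratings from the LLM panel and from human annotators.

```python
# Hypothetical agreement check between CrowdLLM and human quality scores.
import numpy as np
from scipy.stats import spearmanr

# Placeholder 1-5 ratings for the same set of clarifying questions;
# in practice these would come from the CrowdLLM panel and human annotators.
crowd_llm_scores = np.array([4.2, 3.0, 4.8, 2.4, 3.6, 4.0])
human_scores = np.array([4.0, 3.5, 5.0, 2.0, 3.0, 4.5])

rho, p_value = spearmanr(crowd_llm_scores, human_scores)
print(f"Spearman rho = {rho:.2f} (p = {p_value:.3f})")
```

Kendall's tau (scipy.stats.kendalltau) is a common alternative when many ratings are tied.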
Implications and Future Directions
The innovative approach of using LLMs for generating and evaluating clarifying questions reveals broader possibilities for AI in conversational settings. It underscores the potential for LLMs to enhance IR systems by delivering dynamic and contextually rich clarifications, thus improving user experience and system performance.
Future research may focus on refining user simulation methods, integrating domain-specific knowledge into LLMs for greater specificity, and examining LLM evaluation biases alongside human feedback. Additionally, optimizing LLM integration with diverse retrieval models could yield further benefits in complex search environments.
The paper serves as a crucial reference in advancing the capabilities of conversational AI, establishing a foundation for robust, scalable, and efficient question generation and evaluation frameworks within CS systems.