- The paper introduces CouldAsk, a benchmark for evaluating and reformulating unanswerable questions in document-grounded QA.
- It finds that even leading models such as GPT-4 achieve only a 26% success rate at producing meaningful reformulations, underscoring current limitations.
- The study explores strategies to correct presuppositions and reduce ambiguity, emphasizing the need for advanced compositional reasoning in LLMs.
The paper "I Could've Asked That: Reformulating Unanswerable Questions" analyzes a persistent hurdle in document-grounded question answering with LLMs: unanswerable questions and the challenge of reformulating them. Although current LLMs are reasonably effective at recognizing such questions, they rarely help users revise their queries into answerable ones. This limitation is a substantial impediment to extracting relevant information from complex documents.
To tackle this, the authors propose "CouldAsk," an evaluation benchmark designed to assess the reformulation of unanswerable questions in document-grounded settings. The benchmark combines existing datasets with newly synthesized data to cover a range of domains and contexts, and it evaluates two primary tasks: detecting unanswerable questions and reformulating them.
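The two-stage evaluation described above can be sketched as follows. This is a minimal illustration, not the benchmark's actual harness: `model` is a hypothetical stand-in for any LLM call, the prompts are invented, and the judge is simplified to exact matching against gold reformulations.

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Example:
    document: str
    question: str
    answerable: bool                 # gold label for the detection task
    gold_reformulations: List[str]   # acceptable answerable rewrites (may be empty)

def evaluate(examples: List[Example], model: Callable[[str], str]) -> dict:
    """Score a model on two CouldAsk-style tasks:
    (1) detect whether the question is answerable from the document,
    (2) if not, reformulate it into an answerable one."""
    detect_correct = 0
    reform_success = 0
    reform_total = 0
    for ex in examples:
        verdict = model(f"Document: {ex.document}\nQuestion: {ex.question}\n"
                        "Is this answerable? yes/no")
        if (verdict.strip().lower() == "yes") == ex.answerable:
            detect_correct += 1
        if not ex.answerable:
            reform_total += 1
            rewrite = model(f"Document: {ex.document}\n"
                            f"Rewrite this question so it is answerable: {ex.question}")
            # Simplified judge: success only if the rewrite matches a gold reformulation.
            if rewrite.strip() in ex.gold_reformulations:
                reform_success += 1
    return {
        "detection_acc": detect_correct / len(examples),
        "reformulation_rate": reform_success / max(reform_total, 1),
    }
```

A real evaluation would replace the exact-match judge with a more permissive equivalence check, since many distinct rewrites can be valid.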
The research highlights that leading LLMs, including GPT-4 and Llama2-7B, perform poorly at question reformulation, with success rates of only 26% and 12%, respectively. Error analysis shows that 62% of failed reformulations consist of the model merely rephrasing the question without changing its substance, or repeating it verbatim. This points to the models' difficulty in identifying and revising the faulty assumptions underlying a query.
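One way to flag such failures automatically is sketched below. This is an assumed heuristic, not the paper's actual error-analysis method: identical rewrites are caught by exact comparison, and near-rephrases by token-set (Jaccard) overlap, with an illustrative threshold.

```python
def is_mere_rephrase(original: str, rewrite: str, threshold: float = 0.9) -> bool:
    """Flag rewrites that repeat the original verbatim, or that share almost
    all of its tokens (a crude proxy for 'rephrased without changing the
    substance'). The Jaccard threshold of 0.9 is an illustrative choice."""
    if original.strip() == rewrite.strip():
        return True
    a = set(original.lower().split())
    b = set(rewrite.lower().split())
    jaccard = len(a & b) / len(a | b)
    return jaccard >= threshold

# A verbatim repetition is trivially flagged:
assert is_mere_rephrase("Who wrote the book?", "Who wrote the book?")
# A rewrite that actually corrects the false presupposition is not:
assert not is_mere_rephrase(
    "When did Einstein win the Nobel Prize in Chemistry?",
    "When did Einstein win the Nobel Prize in Physics?")
```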
The paper classifies presupposition errors as either contradicted by or unverifiable from the provided document, and explores several strategies for reformulating the affected questions: correcting the false presupposition, generalizing the question, finding the nearest answerable match, or specifying the question to reduce ambiguity.
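The four strategies can be made concrete with a single failing question. The document, question, and rewrites below are illustrative inventions, not examples drawn from the benchmark itself:

```python
# Document: "Marie Curie won Nobel Prizes in Physics (1903) and Chemistry (1911)."
# Unanswerable question (false presupposition: a Peace Prize win):
question = "In what year did Marie Curie win the Nobel Peace Prize?"

strategies = {
    # Replace the false presupposition with one the document supports.
    "correct_presupposition": "In what year did Marie Curie win the Nobel Prize in Chemistry?",
    # Drop the over-specific constraint entirely.
    "generalize": "Which Nobel Prizes did Marie Curie win?",
    # Ask about the closest content the document does cover.
    "nearest_match": "In what years did Marie Curie win her Nobel Prizes?",
    # Narrow the question to a single answer the document provides.
    "specify": "In what year did Marie Curie win her first Nobel Prize?",
}
```

Each rewrite remains faithful to the user's apparent intent while becoming answerable from the document, which is the property a successful reformulation must preserve.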
Comprehensive experiments cover GPT-3.5 and several open-source alternatives, in both zero-shot and few-shot settings, on unanswerable-question detection and reformulation. Results show varied efficacy across models, with GPT-4 generally outperforming the others at recognizing unanswerable questions.
The implications of this research are manifold. Practically, improving reformulation capabilities can significantly enhance user interactions with AI in document comprehension applications, such as legal and medical records analysis. Theoretically, this work underscores the need for models that understand the deep semantics of a question rather than merely its surface syntax.
Future developments in AI, as suggested by this study, could focus on strengthening the compositional reasoning abilities of LLMs. This will require more sophisticated mechanisms for understanding the context and intent behind questions, so that models not only recognize a query's unanswerability but also offer meaningful, contextually appropriate alternatives.
In conclusion, addressing the reformulation of unanswerable questions is imperative for the continued development of effective document-grounded question answering systems. This research lays a necessary groundwork, inviting further innovations in AI that can adeptly understand and interact with complex linguistic structures inherent in human queries.