Selective Question Answering under Domain Shift (2006.09462v1)

Published 16 Jun 2020 in cs.CL and cs.LG

Abstract: To avoid giving wrong answers, question answering (QA) models need to know when to abstain from answering. Moreover, users often ask questions that diverge from the model's training data, making errors more likely and thus abstention more critical. In this work, we propose the setting of selective question answering under domain shift, in which a QA model is tested on a mixture of in-domain and out-of-domain data, and must answer (i.e., not abstain on) as many questions as possible while maintaining high accuracy. Abstention policies based solely on the model's softmax probabilities fare poorly, since models are overconfident on out-of-domain inputs. Instead, we train a calibrator to identify inputs on which the QA model errs, and abstain when it predicts an error is likely. Crucially, the calibrator benefits from observing the model's behavior on out-of-domain data, even if from a different domain than the test data. We combine this method with a SQuAD-trained QA model and evaluate on mixtures of SQuAD and five other QA datasets. Our method answers 56% of questions while maintaining 80% accuracy; in contrast, directly using the model's probabilities only answers 48% at 80% accuracy.

Authors (3)
  1. Amita Kamath (8 papers)
  2. Robin Jia (59 papers)
  3. Percy Liang (239 papers)
Citations (194)

Summary

Selective Question Answering under Domain Shift

The paper "Selective Question Answering under Domain Shift" by Amita Kamath, Robin Jia, and Percy Liang, addresses a critical challenge in natural language processing: the performance degradation of question answering (QA) systems when applied in domains not represented in their training data. This work is a comprehensive examination of strategies to selectively answer questions, optimizing for scenarios where domain shifts occur.

The authors begin by defining the problem of domain shift in the context of QA systems. Domain shift occurs when a model trained on one distribution of data encounters a different distribution at test time, often resulting in degraded performance. The paper distinguishes between standard QA, where systems attempt to answer every posed question, and selective QA, where a system answers only the questions it is confident about and abstains on the rest.
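As a concrete illustration of the setting, the sketch below defines the two quantities a selective QA system trades off: coverage (the fraction of questions answered) and risk (the error rate on the answered questions). The function and data layout are illustrative, not taken from the paper's code.

```python
# Sketch of the selective QA setting: a system may abstain (None), and we
# measure coverage (fraction of questions answered) and risk (error rate on
# the answered questions). Data layout here is illustrative.
def coverage_and_risk(predictions, gold_answers):
    """predictions: answer string or None (abstain); gold_answers: reference answers."""
    answered = [(pred, gold) for pred, gold in zip(predictions, gold_answers)
                if pred is not None]
    coverage = len(answered) / len(predictions)
    risk = (sum(pred != gold for pred, gold in answered) / len(answered)
            if answered else 0.0)
    return coverage, risk


# Example: answer 3 of 4 questions, one of them incorrectly.
print(coverage_and_risk(["Paris", None, "1912", "blue"],
                        ["Paris", "Rome", "1912", "green"]))  # (0.75, 0.333...)
```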

To tackle this problem, the authors propose training a calibrator that helps the QA system distinguish questions it is likely to answer correctly from those it is likely to get wrong in an out-of-domain setting. Rather than relying on the QA model's softmax probabilities, which tend to be overconfident on out-of-domain inputs, the calibrator is trained to predict when the model errs; the system answers only when the predicted probability of being correct exceeds a chosen threshold, and abstains otherwise. Crucially, the calibrator benefits from observing the model's behavior on out-of-domain data, even when that data comes from a different domain than the test distribution. This design preserves accuracy by withholding answers when the system is likely to be wrong.
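A minimal sketch of this calibrator-and-threshold pipeline is shown below, assuming a scikit-learn random forest as the calibrator and a small, illustrative feature set (top softmax probability, margin over the runner-up answer, and input lengths); the paper's exact features and calibrator choice may differ.

```python
# Minimal sketch of calibrator-based selective QA. Assumptions (not from the
# paper's released code): a scikit-learn random forest as the calibrator and a
# small illustrative feature set built from the QA model's outputs.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def make_features(example):
    """Feature vector for one question (illustrative features only)."""
    probs = np.sort(np.asarray(example["softmax_probs"]))[::-1]  # QA model's answer probabilities
    margin = probs[0] - probs[1] if len(probs) > 1 else probs[0]
    return [
        probs[0],                              # top answer probability
        margin,                                # gap to the runner-up answer
        len(example["question"].split()),      # question length
        len(example["context"].split()),       # passage length
    ]

def train_calibrator(calibration_examples):
    """Fit a classifier predicting whether the QA model's answer is correct.
    The calibration set should include some out-of-domain examples."""
    X = np.array([make_features(ex) for ex in calibration_examples])
    y = np.array([ex["model_answer_is_correct"] for ex in calibration_examples])
    calibrator = RandomForestClassifier(n_estimators=100, random_state=0)
    calibrator.fit(X, y)
    return calibrator

def selective_answer(calibrator, example, threshold=0.5):
    """Return the QA model's answer only if the calibrator is confident enough."""
    p_correct = calibrator.predict_proba([make_features(example)])[0, 1]
    return example["model_answer"] if p_correct >= threshold else None  # None = abstain
```

In practice, the threshold would be swept on held-out data to trade off how many questions are answered against the accuracy of those answers.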

The authors conducted extensive experiments to validate their approach, pairing a SQuAD-trained QA model with mixtures of SQuAD and five other QA datasets to simulate domain shift. The calibrator-based approach clearly outperformed abstention policies based solely on the model's softmax probabilities: it answered 56% of questions while maintaining 80% accuracy, whereas thresholding the model's probabilities directly answered only 48% of questions at the same accuracy level.
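These headline numbers correspond to measuring how many questions can be answered while keeping accuracy at or above a target; a small sketch of that computation from per-question confidence scores is given below (illustrative code, not the paper's evaluation script).

```python
# Sketch of computing the largest fraction of questions that can be answered
# while keeping accuracy at or above a target (e.g., 80%), given per-question
# confidence scores. Illustrative code, not the paper's evaluation script.
import numpy as np

def coverage_at_accuracy(confidences, is_correct, target_accuracy=0.80):
    """confidences: calibrator scores; is_correct: 1 if the QA answer is right, else 0."""
    order = np.argsort(-np.asarray(confidences))             # most confident first
    correct_sorted = np.asarray(is_correct)[order]
    cumulative_acc = np.cumsum(correct_sorted) / np.arange(1, len(correct_sorted) + 1)
    qualifying = np.where(cumulative_acc >= target_accuracy)[0]
    return 0.0 if len(qualifying) == 0 else (qualifying[-1] + 1) / len(correct_sorted)
```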

Moreover, this research not only presents a practical methodology for increasing QA reliability under domain shift but also carries broader implications for the development of robust AI systems. By adopting selective approaches, practitioners can deploy QA systems across diverse applications while reducing the risk of inaccurate outputs in unfamiliar domains.

The paper concludes by considering potential future research directions, such as refining the calibration of model confidence to fine-tune the selective answering mechanism. As AI systems increasingly operate in uncertain and dynamic environments, solutions like the one proposed in this paper will be central to sustaining and expanding the applicability of AI technologies.

In summary, "Selective Question Answering under Domain Shift" provides a pertinent examination of domain adaptation in QA systems, offering both methodological advances and insights for future investigation. The work underlines the importance of adaptive techniques in the quest to develop more versatile and error-tolerant AI applications.