Latent Retrieval for Weakly Supervised Open Domain Question Answering
This paper, by Kenton Lee, Ming-Wei Chang, and Kristina Toutanova of Google Research, addresses open domain question answering (QA) and introduces the Open-Retrieval Question Answering (ORQA) model. Its key innovation is learning the evidence retriever and the reader jointly from question-answer pairs alone, without strong supervision and without relying on a black-box information retrieval (IR) system. This departs from prior approaches, which either assume gold-standard evidence is available or delegate evidence retrieval to an IR system that cannot be fine-tuned for the QA task.
Methodology
The paper's core contribution is a model in which evidence retrieval is treated as a latent variable. Because training such a model from scratch is impractical, the retriever is pre-trained with an Inverse Cloze Task (ICT), an unsupervised proxy for the retrieval objective. In ICT, a sentence serves as a pseudo-question and its surrounding context acts as pseudo-evidence, so the retriever learns to match questions to relevant passages without any labeled pairs. This pre-training provides an initialization strong enough for effective end-to-end fine-tuning on question-answer pairs.
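As a rough illustration of ICT, the pre-training can be framed as an in-batch contrastive objective: each pseudo-question (a sentence) is scored against the pseudo-evidence contexts of every example in the batch, and the model learns to pick out the context the sentence came from. The sketch below is a minimal approximation of that idea; `encode_query` and `encode_block` are hypothetical stand-ins for the two BERT encoders, not the authors' implementation.

```python
import torch
import torch.nn.functional as F

def ict_loss(pseudo_questions, pseudo_evidence, encode_query, encode_block):
    """In-batch Inverse Cloze Task loss (illustrative sketch).

    pseudo_questions: sentences drawn from evidence blocks.
    pseudo_evidence:  the surrounding contexts, aligned by index with the
                      sentences they came from.
    encode_query / encode_block: assumed stand-ins for the two BERT encoders,
                      each mapping text to a fixed-size vector.
    """
    q = torch.stack([encode_query(s) for s in pseudo_questions])  # [B, d]
    b = torch.stack([encode_block(c) for c in pseudo_evidence])   # [B, d]
    scores = q @ b.t()                                            # [B, B] inner-product scores
    targets = torch.arange(scores.size(0))                        # i-th context is correct for i-th sentence
    return F.cross_entropy(scores, targets)
```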
To formally define the components:
- Retriever Component: Scores each evidence block against the question as the inner product of their dense vector representations, each produced by a BERT encoder (see the sketch after this list).
- Reader Component: Scores candidate answer spans within a retrieved evidence block using BERT span representations, identifying the answer text.
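To make these scoring functions concrete, the sketch below follows the paper's description: the retrieval score of a block is the inner product of the question and block encodings, the reader assigns a score to each candidate span within a retrieved block, and learning marginalizes over all retrieved (block, span) pairs whose text matches the answer string. The tensor layout and helper names are illustrative assumptions rather than the authors' code.

```python
import torch
import torch.nn.functional as F

def retrieval_score(question_vec, block_vec):
    """S_retr(b, q): inner product of the dense question and block encodings."""
    return question_vec @ block_vec

def marginal_answer_log_prob(retr_scores, span_scores, answer_mask):
    """Illustrative weakly supervised objective with retrieval as a latent variable.

    retr_scores: [num_blocks] retrieval scores for the top retrieved blocks.
    span_scores: [num_blocks, num_spans] reader scores for candidate spans.
    answer_mask: boolean [num_blocks, num_spans], True where a span's text
                 matches the gold answer string.

    The joint score of a (block, span) pair is S_retr + S_read; the answer
    probability sums over all matching pairs (it is -inf if no span matches).
    """
    joint = span_scores + retr_scores.unsqueeze(-1)               # [num_blocks, num_spans]
    log_probs = F.log_softmax(joint.view(-1), dim=0).view_as(joint)
    matching = log_probs.masked_fill(~answer_mask, float("-inf"))
    return torch.logsumexp(matching.view(-1), dim=0)              # log P(answer | question)
```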
Experimental Setup and Results
The evaluation was performed on five publicly available QA datasets adapted for open-domain use: Natural Questions, WebQuestions, CuratedTrec, TriviaQA, and SQuAD. The approach is strongest on datasets whose questions reflect genuine information-seeking behavior (Natural Questions, WebQuestions, CuratedTrec), where ORQA outperforms a BM25 baseline by 6 to 19 points in exact match. On TriviaQA and SQuAD, whose question writers already knew the answers, the questions tend to share vocabulary with the evidence and thus resemble classical IR queries, so BM25 retrieval remains competitive.
Implications and Future Directions
The implications of this research are manifold. Practically, ORQA sets a precedent for building more adaptable, fully fine-tunable QA systems using only weak supervision from question-answer pairs, which could streamline the creation of QA systems in domains where gold-standard evidence is sparse or unavailable. Theoretically, the work demonstrates the utility of unsupervised pre-training tasks like ICT for initializing latent variable models in complex downstream tasks such as QA.
Future research could explore hybrid approaches that combine BM25 with learned retrievers, balancing the precision of sparse word matching against the semantic generalization of dense representations. The retrieval component could also be strengthened with more sophisticated multi-step retrieval mechanisms or additional pre-training tasks that yield even better initializations.
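As a purely illustrative sketch of such a hybrid (not something evaluated in the paper), the scores of a sparse and a dense retriever could be interpolated with a tunable weight; `bm25_score`, `encode_query`, and `encode_block` below are hypothetical stand-ins.

```python
def hybrid_score(question, block, bm25_score, encode_query, encode_block, alpha=0.5):
    """Interpolate a sparse lexical score with a dense inner-product score.

    bm25_score and the two encoders are assumed helpers (a BM25 index and the
    dense question/block encoders); alpha trades lexical precision against
    semantic matching.
    """
    sparse = bm25_score(question, block)
    dense = float(encode_query(question) @ encode_block(block))
    return alpha * sparse + (1 - alpha) * dense
```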
In conclusion, the authors' efforts in developing ORQA mark a significant step toward more flexible and intelligent open domain QA systems, driven by latent retrieval and unsupervised pre-training methodologies. The performance gains on information-seeking datasets underscore the value of learned retrieval mechanisms tailored for QA, paving the way for future advancements in the field.
This essay encapsulates the primary contributions, methodology, and implications detailed in the "Latent Retrieval for Weakly Supervised Open Domain Question Answering" paper and is intended for an audience of experienced researchers in computer science and artificial intelligence.