Latent Retrieval for Weakly Supervised Open Domain Question Answering
This paper, by Kenton Lee, Ming-Wei Chang, and Kristina Toutanova of Google Research, addresses open domain question answering (QA) and introduces the Open-Retrieval Question Answering (ORQA) model. Its key innovation is learning the evidence retriever and the reader jointly from question-answer pairs alone, without strong supervision and without relying on a black-box information retrieval (IR) system. This departs from prior approaches, which either assume gold-standard evidence is available or delegate evidence retrieval to an IR system that cannot be fine-tuned for the QA task.
Methodology
The paper's core contribution is a model in which evidence retrieval is treated as a latent variable. Because training such a model from scratch is impractical, the retriever is pre-trained with an Inverse Cloze Task (ICT), an unsupervised proxy for the retrieval objective. In ICT, a sentence serves as a pseudo-question and its surrounding context acts as pseudo-evidence, so the retriever learns to match questions to relevant passages without any labeled pairs. This pre-training provides an initialization strong enough for effective end-to-end fine-tuning on question-answer pairs.
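As a rough illustration of ICT, the pre-training can be framed as an in-batch contrastive objective: each pseudo-question (a sentence) is scored against the pseudo-evidence contexts of every example in the batch, and the model learns to pick out the context the sentence came from. The sketch below is a minimal approximation of that idea; `encode_query` and `encode_block` are hypothetical stand-ins for the two BERT encoders, not the authors' implementation.

```python
import torch
import torch.nn.functional as F

def ict_loss(pseudo_questions, pseudo_evidence, encode_query, encode_block):
    """In-batch Inverse Cloze Task loss (illustrative sketch).

    pseudo_questions: sentences drawn from evidence blocks.
    pseudo_evidence:  the surrounding contexts, aligned by index with the
                      sentences they came from.
    encode_query / encode_block: assumed stand-ins for the two BERT encoders,
                      each mapping text to a fixed-size vector.
    """
    q = torch.stack([encode_query(s) for s in pseudo_questions])  # [B, d]
    b = torch.stack([encode_block(c) for c in pseudo_evidence])   # [B, d]
    scores = q @ b.t()                                            # [B, B] inner-product scores
    targets = torch.arange(scores.size(0))                        # i-th context is correct for i-th sentence
    return F.cross_entropy(scores, targets)
```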
To formally define the components:
- Retriever Component: Scores each evidence block against the question as the inner product of their dense vector representations, each produced by a BERT encoder (see the sketch after this list).
- Reader Component: Scores candidate answer spans within a retrieved evidence block using BERT span representations, identifying the answer text.
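To make these scoring functions concrete, the sketch below follows the paper's description: the retrieval score of a block is the inner product of the question and block encodings, the reader assigns a score to each candidate span within a retrieved block, and learning marginalizes over all retrieved (block, span) pairs whose text matches the answer string. The tensor layout and helper names are illustrative assumptions rather than the authors' code.

```python
import torch
import torch.nn.functional as F

def retrieval_score(question_vec, block_vec):
    """S_retr(b, q): inner product of the dense question and block encodings."""
    return question_vec @ block_vec

def marginal_answer_log_prob(retr_scores, span_scores, answer_mask):
    """Illustrative weakly supervised objective with retrieval as a latent variable.

    retr_scores: [num_blocks] retrieval scores for the top retrieved blocks.
    span_scores: [num_blocks, num_spans] reader scores for candidate spans.
    answer_mask: boolean [num_blocks, num_spans], True where a span's text
                 matches the gold answer string.

    The joint score of a (block, span) pair is S_retr + S_read; the answer
    probability sums over all matching pairs (it is -inf if no span matches).
    """
    joint = span_scores + retr_scores.unsqueeze(-1)               # [num_blocks, num_spans]
    log_probs = F.log_softmax(joint.view(-1), dim=0).view_as(joint)
    matching = log_probs.masked_fill(~answer_mask, float("-inf"))
    return torch.logsumexp(matching.view(-1), dim=0)              # log P(answer | question)
```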
Experimental Setup and Results
The evaluation was performed on five publicly available QA datasets adapted for open-domain use: Natural Questions, WebQuestions, CuratedTrec, TriviaQA, and SQuAD. The approach is strongest on datasets whose questions reflect genuine information-seeking behavior (Natural Questions, WebQuestions, CuratedTrec), where ORQA outperforms a BM25 baseline by 6 to 19 points in exact match. On TriviaQA and SQuAD, whose question writers already knew the answers, the questions tend to share vocabulary with the evidence and thus resemble classical IR queries, so BM25 retrieval remains competitive.
Implications and Future Directions
The implications of this research are manifold. Practically, ORQA sets a precedent for building more adaptable, fully fine-tunable QA systems using only weak supervision from question-answer pairs, which could streamline the creation of QA systems in domains where gold-standard evidence is sparse or unavailable. Theoretically, the work demonstrates the utility of unsupervised pre-training tasks like ICT for initializing latent variable models in complex downstream tasks such as QA.
Future research could explore hybrid approaches that combine BM25 with learned retrievers, balancing the precision of sparse word matching against the semantic generalization of dense representations. The retrieval component could also be strengthened with more sophisticated multi-step retrieval mechanisms or additional pre-training tasks that yield even better initializations.
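As a purely illustrative sketch of such a hybrid (not something evaluated in the paper), the scores of a sparse and a dense retriever could be interpolated with a tunable weight; `bm25_score`, `encode_query`, and `encode_block` below are hypothetical stand-ins.

```python
def hybrid_score(question, block, bm25_score, encode_query, encode_block, alpha=0.5):
    """Interpolate a sparse lexical score with a dense inner-product score.

    bm25_score and the two encoders are assumed helpers (a BM25 index and the
    dense question/block encoders); alpha trades lexical precision against
    semantic matching.
    """
    sparse = bm25_score(question, block)
    dense = float(encode_query(question) @ encode_block(block))
    return alpha * sparse + (1 - alpha) * dense
```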
In conclusion, the authors' efforts in developing ORQA mark a significant step toward more flexible and intelligent open domain QA systems, driven by latent retrieval and unsupervised pre-training methodologies. The performance gains on information-seeking datasets underscore the value of learned retrieval mechanisms tailored for QA, paving the way for future advancements in the field.
This essay encapsulates the primary contributions, methodology, and implications detailed in the "Latent Retrieval for Weakly Supervised Open Domain Question Answering" paper and is intended for an audience of experienced researchers in computer science and artificial intelligence.