Leveraging Passage Retrieval with Generative Models for Open Domain Question Answering
The paper by Gautier Izacard and Edouard Grave investigates how much open domain question answering (QA) benefits from combining generative models with passage retrieval. Generative models in this field have traditionally not leveraged external knowledge sources, relying instead on large parametric models that store knowledge in their weights. While competitive, this approach incurs high computational costs, since storing extensive knowledge internally requires a very large parameter count. The authors propose an efficient and effective alternative that integrates retrieval of external text passages into the generative framework.
Methodology
The proposed method, termed Fusion-in-Decoder, is executed in two primary stages:
- Retrieval: Relevant passages are retrieved from an external knowledge source, here Wikipedia. Two retrieval methods are compared: BM25, a sparse method that scores passages using term frequencies and inverse document frequencies, and Dense Passage Retrieval (DPR), which ranks passages by the similarity of dense vector representations produced by BERT encoders (a minimal BM25 scoring sketch follows this list).
- Generative Modeling: The retrieved passages, each concatenated with the question, are fed to a sequence-to-sequence model, specifically T5, a pretrained model known for its efficacy across NLP tasks. The encoder processes each (question, passage) pair independently, while the decoder performs cross-attention over the concatenation of all the resulting representations, enabling the model to aggregate evidence from many passages (see the second sketch below).
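To make the sparse retrieval baseline concrete, here is a minimal, self-contained sketch of Okapi BM25 scoring. The pre-tokenized inputs and the `k1`/`b` defaults are illustrative assumptions, not the paper's exact retrieval configuration:

```python
import math
from collections import Counter

def bm25_scores(query_terms, docs, k1=1.2, b=0.75):
    """Score each tokenized document against a query with Okapi BM25.

    query_terms: list of query tokens; docs: list of token lists.
    k1 and b are common defaults, not the paper's stated settings.
    """
    N = len(docs)
    avgdl = sum(len(d) for d in docs) / N
    # Document frequency: in how many passages each query term appears.
    df = {t: sum(1 for d in docs if t in d) for t in set(query_terms)}
    scores = []
    for doc in docs:
        tf = Counter(doc)
        score = 0.0
        for t in query_terms:
            if df[t] == 0:
                continue
            idf = math.log(1 + (N - df[t] + 0.5) / (df[t] + 0.5))
            num = tf[t] * (k1 + 1)
            den = tf[t] + k1 * (1 - b + b * len(doc) / avgdl)
            score += idf * num / den
        scores.append(score)
    return scores
```

In practice the top-k passages under this score (or under DPR's dense inner-product score) are handed to the generative model.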
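The second sketch illustrates the Fusion-in-Decoder mechanism itself, built on an off-the-shelf Hugging Face T5. This is a sketch of the encode-independently, decode-jointly idea, not the authors' released implementation; the `question: ... context: ...` input template follows common FiD conventions, and an unfine-tuned `t5-base` checkpoint will not actually answer well (the paper fine-tunes on the QA datasets):

```python
import torch
from transformers import T5Tokenizer, T5ForConditionalGeneration
from transformers.modeling_outputs import BaseModelOutput

tokenizer = T5Tokenizer.from_pretrained("t5-base")
model = T5ForConditionalGeneration.from_pretrained("t5-base")

question = "Where was Alan Turing born?"
passages = [
    "Alan Turing was born in Maida Vale, London, in 1912.",
    "Turing studied at King's College, Cambridge.",
]

# 1) Pair each passage with the question and encode independently.
inputs = tokenizer(
    [f"question: {question} context: {p}" for p in passages],
    return_tensors="pt", padding=True, truncation=True,
)
encoder_states = model.encoder(
    input_ids=inputs.input_ids, attention_mask=inputs.attention_mask
).last_hidden_state                      # (n_passages, seq_len, hidden)

# 2) Concatenate encoder outputs along the sequence axis, so the decoder
#    cross-attends over all passages jointly (batch size 1).
fused = encoder_states.reshape(1, -1, encoder_states.size(-1))
fused_mask = inputs.attention_mask.reshape(1, -1)

# 3) Generate the answer conditioned on the fused evidence.
answer_ids = model.generate(
    encoder_outputs=BaseModelOutput(last_hidden_state=fused),
    attention_mask=fused_mask,
    max_length=20,
)
print(tokenizer.decode(answer_ids[0], skip_special_tokens=True))
```

The key design point is that encoding cost grows linearly in the number of passages (each is encoded alone), while only the decoder's cross-attention operates over the full fused sequence; this is what lets the method scale to many support passages.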
Evaluation and Results
The authors evaluate their method on NaturalQuestions (NQ) and TriviaQA, obtaining state-of-the-art results on both benchmarks. Notably, Fusion-in-Decoder's accuracy improves steadily as the number of retrieved passages increases, demonstrating that the model can effectively exploit large amounts of textual evidence. For instance, the large version of the model achieves an exact match (EM) score of 51.4% on NQ and 67.6% on TriviaQA, outperforming existing generative and extractive methods alike. (A sketch of the standard EM metric follows.)
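EM is the standard open-domain QA metric: a prediction counts as correct if, after light normalization, it exactly matches any reference answer. The sketch below follows the common SQuAD-style normalization; it illustrates usual practice rather than quoting the paper's evaluation script:

```python
import re
import string

def normalize(text: str) -> str:
    """Lowercase, strip punctuation and articles, collapse whitespace --
    the usual SQuAD-style normalization applied before comparison."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in set(string.punctuation))
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())

def exact_match(prediction: str, gold_answers: list[str]) -> bool:
    """True if the prediction matches any gold answer after normalization."""
    return any(normalize(prediction) == normalize(g) for g in gold_answers)

# e.g. exact_match("The Maida Vale", ["Maida Vale", "London"]) -> True
```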
Comparative Analysis
The efficacy of the proposed method is underscored by its performance relative to existing state-of-the-art techniques. Compared to T5 with 11 billion parameters, which obtains 36.6% EM on NQ, the proposed model achieves 44.1% with 770 million parameters when coupled with Wikipedia passages retrieved using BM25. This indicates that explicit text-based memory can be a competitive alternative to large parametric models.
Practical and Theoretical Implications
The practical implications of this work are significant. The model demonstrates that leveraging external knowledge can reduce the number of parameters, and correspondingly the computational resources, needed to achieve high performance on open domain QA tasks. The approach also scales gracefully: performance keeps improving as more passages are retrieved and processed by the generative model.
From a theoretical perspective, this work provides robust evidence that sequence-to-sequence models can effectively combine and aggregate evidence from multiple passages, a task where traditional extractive models often struggle. This insight opens avenues for further research into generative models' ability to handle extensive external knowledge bases efficiently.
Future Work
The authors propose several directions for future work, including improving efficiency when scaling to a large number of support passages and integrating retrieval into the model to enable end-to-end learning. These enhancements could further streamline the process and potentially boost the performance and applicability of generative models in open domain question answering tasks.
Conclusion
This paper introduces a compelling approach to augmenting generative models for open domain QA with passage retrieval, demonstrating notable improvements in both accuracy and efficiency. The findings highlight the flexibility and effectiveness of sequence-to-sequence models in aggregating evidence from multiple textual sources, and set a new state of the art on standard benchmarks.