Leveraging Passage Retrieval with Generative Models for Open Domain Question Answering
The paper by Gautier Izacard and Edouard Grave investigates how much open domain question answering (QA) benefits from combining generative models with passage retrieval. Generative models in this field have traditionally not leveraged external knowledge sources, relying instead on large parametric models that store knowledge in their weights. While competitive, this approach incurs high computational costs, since storing extensive knowledge internally requires a very large parameter count. The authors propose an efficient and effective alternative that integrates retrieval of external text passages into the generative framework.
Methodology
The proposed method, termed Fusion-in-Decoder, is executed in two primary stages:
- Retrieval: Relevant passages are retrieved from an external knowledge source, here Wikipedia. Two retrieval methods are compared: BM25, a sparse method that scores passages using term frequencies and inverse document frequencies, and Dense Passage Retrieval (DPR), which ranks passages by the similarity of dense vector representations produced by BERT encoders (a minimal BM25 scoring sketch follows this list).
- Generative Modeling: The retrieved passages, each concatenated with the question, are fed to a sequence-to-sequence model, specifically T5, a pretrained model known for its efficacy across NLP tasks. The encoder processes each (question, passage) pair independently, while the decoder performs cross-attention over the concatenation of all the resulting representations, enabling the model to aggregate evidence from many passages (see the second sketch below).
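To make the sparse retrieval baseline concrete, here is a minimal, self-contained sketch of Okapi BM25 scoring. The pre-tokenized inputs and the `k1`/`b` defaults are illustrative assumptions, not the paper's exact retrieval configuration:

```python
import math
from collections import Counter

def bm25_scores(query_terms, docs, k1=1.2, b=0.75):
    """Score each tokenized document against a query with Okapi BM25.

    query_terms: list of query tokens; docs: list of token lists.
    k1 and b are common defaults, not the paper's stated settings.
    """
    N = len(docs)
    avgdl = sum(len(d) for d in docs) / N
    # Document frequency: in how many passages each query term appears.
    df = {t: sum(1 for d in docs if t in d) for t in set(query_terms)}
    scores = []
    for doc in docs:
        tf = Counter(doc)
        score = 0.0
        for t in query_terms:
            if df[t] == 0:
                continue
            idf = math.log(1 + (N - df[t] + 0.5) / (df[t] + 0.5))
            num = tf[t] * (k1 + 1)
            den = tf[t] + k1 * (1 - b + b * len(doc) / avgdl)
            score += idf * num / den
        scores.append(score)
    return scores
```

In practice the top-k passages under this score (or under DPR's dense inner-product score) are handed to the generative model.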
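The second sketch illustrates the Fusion-in-Decoder mechanism itself, built on an off-the-shelf Hugging Face T5. This is a sketch of the encode-independently, decode-jointly idea, not the authors' released implementation; the `question: ... context: ...` input template follows common FiD conventions, and an unfine-tuned `t5-base` checkpoint will not actually answer well (the paper fine-tunes on the QA datasets):

```python
import torch
from transformers import T5Tokenizer, T5ForConditionalGeneration
from transformers.modeling_outputs import BaseModelOutput

tokenizer = T5Tokenizer.from_pretrained("t5-base")
model = T5ForConditionalGeneration.from_pretrained("t5-base")

question = "Where was Alan Turing born?"
passages = [
    "Alan Turing was born in Maida Vale, London, in 1912.",
    "Turing studied at King's College, Cambridge.",
]

# 1) Pair each passage with the question and encode independently.
inputs = tokenizer(
    [f"question: {question} context: {p}" for p in passages],
    return_tensors="pt", padding=True, truncation=True,
)
encoder_states = model.encoder(
    input_ids=inputs.input_ids, attention_mask=inputs.attention_mask
).last_hidden_state                      # (n_passages, seq_len, hidden)

# 2) Concatenate encoder outputs along the sequence axis, so the decoder
#    cross-attends over all passages jointly (batch size 1).
fused = encoder_states.reshape(1, -1, encoder_states.size(-1))
fused_mask = inputs.attention_mask.reshape(1, -1)

# 3) Generate the answer conditioned on the fused evidence.
answer_ids = model.generate(
    encoder_outputs=BaseModelOutput(last_hidden_state=fused),
    attention_mask=fused_mask,
    max_length=20,
)
print(tokenizer.decode(answer_ids[0], skip_special_tokens=True))
```

The key design point is that encoding cost grows linearly in the number of passages (each is encoded alone), while only the decoder's cross-attention operates over the full fused sequence; this is what lets the method scale to many support passages.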
Evaluation and Results
The authors evaluate their method on NaturalQuestions (NQ) and TriviaQA, obtaining state-of-the-art results on both benchmarks. Notably, Fusion-in-Decoder's accuracy improves steadily as the number of retrieved passages increases, demonstrating that the model can effectively exploit large amounts of textual evidence. For instance, the large version of the model achieves an exact match (EM) score of 51.4% on NQ and 67.6% on TriviaQA, outperforming existing generative and extractive methods alike. (A sketch of the standard EM metric follows.)
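EM is the standard open-domain QA metric: a prediction counts as correct if, after light normalization, it exactly matches any reference answer. The sketch below follows the common SQuAD-style normalization; it illustrates usual practice rather than quoting the paper's evaluation script:

```python
import re
import string

def normalize(text: str) -> str:
    """Lowercase, strip punctuation and articles, collapse whitespace --
    the usual SQuAD-style normalization applied before comparison."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in set(string.punctuation))
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())

def exact_match(prediction: str, gold_answers: list[str]) -> bool:
    """True if the prediction matches any gold answer after normalization."""
    return any(normalize(prediction) == normalize(g) for g in gold_answers)

# e.g. exact_match("The Maida Vale", ["Maida Vale", "London"]) -> True
```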
Comparative Analysis
The efficacy of the proposed method is underscored by its performance relative to existing state-of-the-art techniques. Compared to T5 with 11 billion parameters, which obtains 36.6% EM on NQ, the proposed model achieves 44.1% with 770 million parameters when coupled with Wikipedia passages retrieved using BM25. This indicates that explicit text-based memory can be a competitive alternative to large parametric models.
Practical and Theoretical Implications
The practical implications of this work are significant. The model demonstrates that leveraging external knowledge can reduce the number of parameters, and correspondingly the computational resources, needed to achieve high performance on open domain QA tasks. The approach also scales gracefully: performance keeps improving as more passages are retrieved and processed by the generative model.
From a theoretical perspective, this work provides robust evidence that sequence-to-sequence models can effectively combine and aggregate evidence from multiple passages, a task where traditional extractive models often struggle. This insight opens avenues for further research into generative models' ability to handle extensive external knowledge bases efficiently.
Future Work
The authors propose several directions for future work, including improving efficiency when scaling to a large number of support passages and integrating retrieval into the model to enable end-to-end learning. These enhancements could further streamline the process and potentially boost the performance and applicability of generative models in open domain question answering tasks.
Conclusion
This paper introduces a compelling approach to augmenting generative models for open domain QA with passage retrieval, demonstrating notable improvements in both accuracy and efficiency. The findings highlight the flexibility and effectiveness of sequence-to-sequence models in aggregating evidence from multiple textual sources, and set a new state of the art on standard benchmarks.