- The paper introduces a method that combines TF-IDF-based paragraph selection with a shared-normalization training objective to improve document-level reading comprehension.
- It reports a 15-point F1 gain on TriviaQA web, outperforming earlier models in both the verified and unfiltered settings.
- The approach scales efficiently by normalizing answer-candidate probabilities jointly across paragraphs, offering practical insights for large-scale neural question-answering systems.
Multi-Paragraph Reading Comprehension: A Study on Scalability and Efficiency
This paper addresses a significant challenge in natural language processing: adapting neural models from paragraph-level to document-level reading comprehension. The authors propose a method that produces well-calibrated confidence scores across multiple paragraphs, achieving strong results, particularly on the TriviaQA dataset.
Problem Statement
The transition from paragraph-level to document-level question answering (QA) imposes heavy computational demands. Traditional methods either select a single paragraph for detailed analysis or apply the model to many paragraphs independently and rely on confidence scores to pick the final answer. However, when each paragraph is trained and normalized in isolation, the resulting confidence scores are not comparable across paragraphs, so naive training undermines answer selection.
Methodology and Innovations
The authors introduce an approach combining TF-IDF-based paragraph selection with a shared-normalization training objective. This combination lets the model produce confidence scores that are globally comparable across paragraphs. By normalizing answer-candidate probabilities jointly over paragraphs sampled from the same document, the method encourages well-calibrated confidence scores without requiring any direct interaction between paragraphs during processing.
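To make the idea concrete, here is a minimal numpy sketch of the difference between per-paragraph and shared normalization; the scores are invented for illustration and the variable names are not from the paper.

```python
import numpy as np

def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()

# Hypothetical start-of-answer scores for two paragraphs retrieved for
# the same question (values invented for illustration).
scores_p1 = np.array([2.0, 0.5, 1.0])   # weakly relevant paragraph
scores_p2 = np.array([4.0, 3.5, 1.0])   # strongly relevant paragraph

# Independent normalization: each paragraph's probabilities sum to 1,
# so the best span in the weak paragraph looks deceptively confident.
independent = [softmax(scores_p1), softmax(scores_p2)]

# Shared normalization: one softmax over every candidate in every
# paragraph, so probabilities are directly comparable across paragraphs.
shared = softmax(np.concatenate([scores_p1, scores_p2]))
```

Under independent normalization, the top span in an irrelevant paragraph can look just as confident as the top span in the correct one; the shared softmax removes that artifact.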
Key Model Features
- TF-IDF Paragraph Selection: Ranks paragraphs by the cosine similarity between their TF-IDF vectors and the question's, improving the likelihood of including relevant content (see the first sketch after this list).
- Summed Objective Function: Handles distantly supervised data by marginalizing over all spans that match an answer string, mitigating the impact of noisy labels (see the second sketch after this list).
- Self-Attention and Bi-Directional Attention: Integrates recent advances in reading comprehension to improve context representation.
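A minimal sketch of the TF-IDF selection step, written with scikit-learn rather than the authors' implementation; the function name and the cutoff k are illustrative.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def select_paragraphs(question, paragraphs, k=4):
    """Keep the k paragraphs whose TF-IDF vectors are most similar to the question."""
    vectorizer = TfidfVectorizer(stop_words="english")
    # Fit one vocabulary over the question and all candidate paragraphs.
    matrix = vectorizer.fit_transform([question] + paragraphs)
    # Cosine similarity between the question (row 0) and each paragraph.
    sims = cosine_similarity(matrix[0:1], matrix[1:]).ravel()
    top = sims.argsort()[::-1][:k]
    return [paragraphs[i] for i in top]
```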
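And a sketch of the summed (marginalized) objective for distant supervision, assuming each candidate span has already been scored by the model; summed_span_loss and its inputs are hypothetical names, not the paper's API.

```python
import numpy as np
from scipy.special import logsumexp

def summed_span_loss(span_scores, matches_answer):
    """Negative log of the total probability assigned to spans whose text
    matches a distantly supervised answer string.

    span_scores:    1-D array, one model score per candidate span.
    matches_answer: boolean array marking spans that match an answer string.
    """
    # log p(any matching span) = logsumexp(matching) - logsumexp(all).
    # Summing over every match lets the model decide which occurrence is
    # the true answer instead of being forced onto a possibly spurious one.
    return logsumexp(span_scores) - logsumexp(span_scores[matches_answer])
```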
Results and Evaluation
The paper reports substantial improvements on standard QA benchmarks:
- TriviaQA Web: Achieves 71.3 F1, a roughly 15-point gain over prior models.
- Generalization: Remains robust in both the verified and unfiltered TriviaQA settings, outperforming existing methods by a substantial margin.
Shared normalization is particularly notable and excels when several paragraphs of a document are relevant. While the model trained without this adaptation degrades as more paragraphs are considered, the shared-normalization model maintains its accuracy even when processing large volumes of text.
Theoretical and Practical Implications
The proposed approach provides theoretical insights into scalable methods for extending paragraph-level models to document-level tasks. Practically, this allows for more efficient deployment of neural QA systems in real-world applications where large volumes of text must be processed without substantial computational overhead.
Future Directions
The research opens the door to deploying reading comprehension models in open-domain question answering. Future work could integrate the method with more advanced machine reading models or assess its efficacy across more diverse data sources.
This work sets a strong baseline for multi-paragraph reading comprehension, highlighting the value of well-calibrated confidence modeling when processing complex textual inputs. It represents a meaningful step toward scalable, effective AI systems for extracting information from long documents.