Simple and Effective Multi-Paragraph Reading Comprehension (1710.10723v2)

Published 29 Oct 2017 in cs.CL

Abstract: We consider the problem of adapting neural paragraph-level question answering models to the case where entire documents are given as input. Our proposed solution trains models to produce well calibrated confidence scores for their results on individual paragraphs. We sample multiple paragraphs from the documents during training, and use a shared-normalization training objective that encourages the model to produce globally correct output. We combine this method with a state-of-the-art pipeline for training models on document QA data. Experiments demonstrate strong performance on several document QA datasets. Overall, we are able to achieve a score of 71.3 F1 on the web portion of TriviaQA, a large improvement from the 56.7 F1 of the previous best system.

Citations (447)

Summary

  • The paper introduces a method combining TF-IDF-based paragraph selection with a shared-normalization training objective for improved document-level comprehension.
  • It achieves a 14.6-point F1 improvement on the web portion of TriviaQA (71.3 vs. 56.7 for the previous best system), outperforming earlier models across both verified and unfiltered settings.
  • The approach scales to full documents by pairing paragraph selection with confidence scores that remain comparable across paragraphs, offering practical guidance for large-scale neural question-answering systems.

Multi-Paragraph Reading Comprehension: A Study on Scalability and Efficiency

This paper addresses a significant challenge in natural language processing: adapting neural models from paragraph-level to document-level reading comprehension. The authors propose a method that uses calibrated confidence scores across multiple paragraphs, achieving commendable results, particularly on the TriviaQA dataset.

Problem Statement

Moving from paragraph-level to document-level question answering (QA) is computationally demanding. Existing methods either select a single paragraph for detailed analysis or run the model over many paragraphs and rely on confidence scores to pick the answer. However, a model trained on one paragraph at a time normalizes its output softmax within each paragraph independently, so naive training yields confidence scores that are not comparable across paragraphs.
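
To see why per-paragraph normalization breaks comparability, consider the toy illustration below (hypothetical span scores, not from the paper): a paragraph with uniformly weak scores can still assign high probability to its best span, while a paragraph full of genuinely plausible spans spreads its probability mass.

```python
# Toy illustration (hypothetical scores) of non-comparable per-paragraph
# softmax confidences versus a jointly normalized distribution.
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

relevant   = np.array([5.0, 4.8, 4.5])    # spans in a paragraph that answers the question
irrelevant = np.array([1.0, -3.0, -3.0])  # spans in an off-topic paragraph

print(softmax(relevant).max())    # ~0.41: strong spans compete with each other
print(softmax(irrelevant).max())  # ~0.96: weak paragraph looks "confident"

# Normalizing jointly over both paragraphs restores comparability:
joint = softmax(np.concatenate([relevant, irrelevant]))
print(joint[:3].max(), joint[3:].max())  # relevant spans now dominate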

Methodology and Innovations

The authors combine TF-IDF-based paragraph selection with a shared-normalization training objective. During training, several paragraphs are sampled from the same document, and their answer-span scores are normalized by a single softmax, so the model learns confidence scores that are directly comparable across paragraphs. Because only the normalizer is shared, paragraphs can still be processed independently at inference time, with no direct paragraph-to-paragraph interaction required.
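
To make this concrete, here is a minimal sketch of a shared-normalization loss, assuming PyTorch and hypothetical tensor shapes; this is not the authors' code, but it follows the same idea: one softmax normalizer spans all paragraphs sampled for a question, and probability mass is summed over every span matching a gold answer.

```python
# A minimal sketch of the shared-normalization objective (hypothetical
# tensor shapes; not the authors' implementation).
import torch

def shared_norm_loss(start_scores, end_scores, answer_mask):
    """start_scores, end_scores: [num_paragraphs, seq_len] raw scores
    from the QA model for each sampled paragraph of one question.
    answer_mask: [num_paragraphs, seq_len, seq_len] bool tensor, True
    for every (start, end) span matching a gold answer string."""
    # Score of span (i, j) = start_score[i] + end_score[j].
    span_scores = start_scores.unsqueeze(2) + end_scores.unsqueeze(1)
    # Flatten all spans from all paragraphs into one distribution.
    # (A real implementation would also mask spans with end < start.)
    flat_scores = span_scores.reshape(-1)
    log_z = torch.logsumexp(flat_scores, dim=0)  # shared normalizer
    # Marginalize (sum probabilities) over all correct spans; assumes
    # at least one gold span exists among the sampled paragraphs.
    correct = flat_scores[answer_mask.reshape(-1)]
    log_p_correct = torch.logsumexp(correct, dim=0) - log_z
    return -log_p_correct
```

Summing over all matching spans inside the `logsumexp` also realizes the paper's summed objective for distantly supervised data; sharing the normalizer across paragraphs is what makes the resulting confidence scores comparable.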

Key Model Features

  • TF-IDF Paragraph Selection: Ranks paragraphs by cosine similarity between question and paragraph TF-IDF vectors, improving the likelihood of including relevant content (see the sketch after this list).
  • Summed Objective Function: Handles distantly supervised data by marginalizing over all spans that match the gold answer text, mitigating the impact of noisy labels.
  • Self-Attention and Bi-Directional Attention: Integrates recent advances in reading comprehension to improve context representation.
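
As a rough illustration of the first feature, the following paragraph-selection sketch uses scikit-learn; this is an assumed implementation choice, and the paper's exact tokenization and preprocessing may differ.

```python
# A minimal sketch of TF-IDF paragraph ranking (assumed scikit-learn
# usage; not the authors' pipeline). Paragraphs are ranked by cosine
# similarity between their TF-IDF vectors and the question's, and the
# top-k are passed to the reading model.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def select_paragraphs(question, paragraphs, k=4):
    vectorizer = TfidfVectorizer(stop_words="english")
    # Fit on the document's paragraphs, then project the question
    # into the same vector space.
    para_vecs = vectorizer.fit_transform(paragraphs)
    question_vec = vectorizer.transform([question])
    scores = cosine_similarity(question_vec, para_vecs).ravel()
    ranked = scores.argsort()[::-1][:k]
    return [paragraphs[i] for i in ranked]
```

Keeping only the top-ranked paragraphs bounds the amount of text the reading model must process, which is what lets the approach scale to full documents.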

Results and Evaluation

The paper reports impressive improvements in QA performance benchmarks:

  • TriviaQA Web: Achieves 71.3 F1, a 14.6-point improvement over the previous best system (56.7 F1).
  • Generalization Across Datasets: Demonstrates robustness on both the verified and unfiltered TriviaQA datasets, outperforming existing methods by a substantial margin.

The shared-normalization objective is particularly notable and excels when documents contain many relevant paragraphs. While the model trained without these adaptations degrades as more paragraphs are processed, the shared-normalization model remains accurate and efficient even on large volumes of text.

Theoretical and Practical Implications

The proposed approach provides theoretical insights into scalable methods for extending paragraph-level models to document-level tasks. Practically, this allows for more efficient deployment of neural QA systems in real-world applications where large volumes of text must be processed without substantial computational overhead.

Future Directions

The research broadens the horizon for deploying reading comprehension models in open-domain question answering settings. Future work could explore integrating this method with more advanced machine reading models or assessing its efficacy across diverse data sources.

This work sets a strong baseline for multi-paragraph reading comprehension, highlighting the benefits of well-calibrated confidence modeling when processing complex textual inputs. It represents a meaningful step toward scalable and effective AI systems for extracting information from extensive documents.
