- The paper introduces a hierarchical, coarse-to-fine framework for question answering that first selects relevant sentences with a fast model trained via reinforcement learning and then generates answers from the selected sentences with an RNN.
- The framework achieves 3.5x-6.7x speedups over previous methods on the WikiReading and WikiSuggest datasets while matching or exceeding their accuracy.
- This modular architecture efficiently processes long documents or multiple documents by leveraging limited document structures and offers potential for future advancements in natural language understanding.
Overview of Coarse-to-Fine Question Answering for Long Documents
The paper presents a novel framework for question answering (QA) designed to handle long documents efficiently, aiming to match or exceed the accuracy of state-of-the-art models while substantially reducing computation time. The authors address a core inefficiency of recurrent neural networks (RNNs) in reading comprehension: their sequential computation scales poorly with document length and limits parallelization.
Framework Description
The proposed method mimics how humans read: skim the document to identify relevant sections, then examine only those portions in detail to derive an answer. The model follows a hierarchical, "coarse-to-fine" strategy:
- Sentence Selection: A fast, coarse model scores each sentence for relevance to the query, treating sentence selection as a latent variable. This component is trained with reinforcement learning, using the downstream answer as the signal for optimizing which sentences to select.
- Answer Generation: A more expensive RNN model then processes only the selected sentences to generate the final answer. Because this component handles a fixed budget of tokens regardless of the document's overall length, computation is much faster (a sketch of the pipeline follows below).
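The following is a minimal, self-contained sketch of this coarse-to-fine pipeline. It is illustrative only: the toy hashing encoder, the answer-model stand-in, and all function names are assumptions made for this summary; the paper itself uses learned neural encoders for sentence scoring and an attention-based RNN for answer generation.

```python
# Illustrative coarse-to-fine QA pipeline (toy components, hypothetical names).
import zlib
import numpy as np

rng = np.random.default_rng(0)

def embed(text, dim=32):
    """Toy 'encoder': hash words into a fixed-size bag-of-words vector."""
    v = np.zeros(dim)
    for w in text.lower().split():
        v[zlib.crc32(w.encode()) % dim] += 1.0
    return v / (np.linalg.norm(v) + 1e-8)

def sentence_scores(question, sentences):
    """Coarse model: cheap relevance score for every sentence; this step can
    be computed for all sentences independently, unlike a full RNN read."""
    q = embed(question)
    return np.array([q @ embed(s) for s in sentences])

def select_sentence(scores, greedy=False):
    """Treat selection as a latent variable: sample from a softmax over
    sentence scores during training, take the argmax at test time."""
    probs = np.exp(scores - scores.max())
    probs /= probs.sum()
    idx = int(np.argmax(probs)) if greedy else int(rng.choice(len(probs), p=probs))
    return idx, probs

def answer_model(question, sentence):
    """Fine model placeholder: the paper uses an attention-based RNN here.
    This stand-in just returns the selected sentence as the 'answer' text."""
    return sentence

# --- Tiny end-to-end example --------------------------------------------
document = [
    "The Eiffel Tower was completed in 1889.",
    "It is located on the Champ de Mars in Paris.",
    "Millions of tourists visit it every year.",
]
question = "Where is the Eiffel Tower located"
gold_answer = "Paris"

scores = sentence_scores(question, document)
idx, probs = select_sentence(scores, greedy=True)
reward = 1.0 if gold_answer in answer_model(question, document[idx]) else 0.0

# REINFORCE-style signal for the coarse model: raise the log-probability of
# the selected sentence in proportion to the downstream reward. For the chosen
# index, d(log softmax)/d(score) = 1 - p[idx], so the update scale is:
grad_logp_scale = reward * (1.0 - probs[idx])
print(f"selected: {document[idx]!r}  reward={reward}  grad scale={grad_logp_scale:.3f}")
```

In the paper's setting the reward comes from the answer generation model rather than a string match, but the control flow is the same: score cheaply, select, answer expensively, and feed the outcome back to the selector.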
Experimental Evaluation and Results
The hierarchical model is evaluated on a subset of the WikiReading dataset and on a newly introduced dataset, WikiSuggest. It delivers a 3.5x-6.7x speedup over previous models while equaling or surpassing the state-of-the-art performance metrics of the time.
Significance and Implications
A critical aspect of the paper is the modular nature of the architecture, which can accommodate long documents or multiple documents by exploiting limited document structure such as sentence boundaries. The results show that sentence selection improves performance most when the main difficulty lies in locating the sentences that contain the answer.
The approach offers practical gains in computational efficiency for QA systems and also sheds light on how document processing can be decomposed into selection and comprehension stages. By training sentence selection as a latent variable jointly with answer generation, the paper introduces a potentially influential direction for machine learning in natural language processing.
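As a rough formalization of this latent-variable view (the notation and the specific reward definition below are my own sketch of a standard REINFORCE setup, not a transcription of the paper's equations): the coarse model defines a distribution over sentences, the fine model answers from the chosen sentence, and the selector's parameters are updated to maximize the expected downstream reward.

```latex
% x: question, d: document, s: selected sentence, y*: gold answer (notation illustrative)
\begin{align}
  p(y \mid x, d) &= \sum_{s \in d}
      \underbrace{p_{\theta}(s \mid x, d)}_{\text{coarse sentence selection}}\,
      \underbrace{p_{\phi}(y \mid x, s)}_{\text{fine answer generation}}, \\
  J(\theta) &= \mathbb{E}_{s \sim p_{\theta}(s \mid x, d)}\!\big[R(s)\big],
  \qquad R(s) = \log p_{\phi}(y^{*} \mid x, s), \\
  \nabla_{\theta} J(\theta) &= \mathbb{E}_{s \sim p_{\theta}(s \mid x, d)}
      \big[R(s)\,\nabla_{\theta}\log p_{\theta}(s \mid x, d)\big].
\end{align}
```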
Future Directions
The research opens avenues for further work on QA over unstructured text, such as integrating richer document-level structure and extending the model to process multiple documents concurrently. Such advances are needed for robust systems that can answer sophisticated queries over large volumes of data.
Overall, this paper contributes to ongoing work in artificial intelligence on improving natural language understanding through efficient document encoding and precise information retrieval.