- The paper introduces a novel QA framework that decomposes complex questions into simpler queries, leveraging web search and symbolic operations.
- It employs a sequence-to-sequence model with a pointer network to predict split points, using automatic alignment to machine-generated questions as weak supervision.
- Results demonstrate an increase in precision@1 from 20.8% to 27.5%, underlining the benefits of question decomposition across multiple models.
The Web as a Knowledge-base for Answering Complex Questions
This paper addresses the significant challenge of answering complex questions that require reasoning and integration of information from multiple sources. The authors propose a novel framework utilizing the web as a knowledge-base, decomposing complex questions into sequences of simple questions which can be addressed using a search engine and reading comprehension models.
Introduction and Framework
Complex question answering (QA) demands reasoning beyond the capabilities of most current reading comprehension (RC) models, which excel at matching questions to local contexts but struggle with broader, multi-step reasoning. Traditional approaches based on semantic parsing depend heavily on pre-constructed knowledge-bases (KBs), limiting their coverage. This paper introduces a broad, compositional QA framework that bypasses these limitations.
The framework decomposes complex questions into simpler parts, processes them individually, and integrates the results using web data. This compositional approach is depicted as transforming the initial complex question into a sequence of simpler questions. These are then utilized with a search engine to identify answers, which are subsequently synthesized using symbolic operations such as union and intersection. The model eschews dependency on a specific KB or documents retrieved in advance, thereby increasing flexibility.
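The combine step described above can be sketched as set operations over the answer sets of sub-questions. Below is a minimal illustration: `web_answers` is a hypothetical callable standing in for the search-engine-plus-RC pipeline, and the `[ANSWER]` placeholder is an assumed convention for this sketch, not the paper's notation.

```python
# Sketch of the "combine" step: answers to sub-questions are merged with
# symbolic set operations. `web_answers` is a hypothetical stand-in for
# the search-engine + reading-comprehension pipeline.

def answer_conjunction(part1, part2, web_answers):
    """Conjunction: intersect the answer sets of the two sub-questions."""
    return web_answers(part1) & web_answers(part2)

def answer_composition(outer_template, inner, web_answers):
    """Composition: answer the inner question first, then substitute each
    answer into the outer question and union the results."""
    results = set()
    for entity in web_answers(inner):
        results |= web_answers(outer_template.replace("[ANSWER]", entity))
    return results
```

For example, "What 1999 films star Brad Pitt?" would decompose into "What films did Brad Pitt star in?" and "What films were released in 1999?", whose answer sets are intersected.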
Dataset Creation: ComplexWebQuestions
The authors develop a new dataset, ComplexWebQuestions, designed to test their framework. It is built by extending SPARQL queries associated with existing question-answer pairs to generate complex questions involving composition, conjunction, superlatives, and comparatives. Natural-language versions of these machine-generated questions are then collected via Amazon Mechanical Turk, yielding 34,689 question-answer pairs.
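The query-extension idea can be illustrated at the string level. The sketch below assumes Freebase-style SPARQL with an answer variable `?x`; it is an illustrative stand-in, not the authors' actual generation code, which would operate on parsed queries.

```python
def make_conjunction_query(seed_sparql, extra_constraint):
    """Extend a seed SPARQL query into a conjunction question's query by
    inserting one more constraint on the answer variable ?x before the
    closing brace. String-level sketch only."""
    closing = seed_sparql.rfind("}")
    return seed_sparql[:closing] + extra_constraint + " . " + seed_sparql[closing:]
```

Executing the extended query against the KB yields the gold answers for the new, more complex question, which workers then paraphrase into natural language.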
Model: Sequence-to-sequence Architecture
The authors present a sequence-to-sequence architecture that decomposes a complex question according to a computation tree, tackling its components in sequence. A pointer network identifies split points within the question, dividing it into simpler parts that are then answered against web search results.
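In spirit, the pointer mechanism reduces to scoring every token position as a candidate split and choosing the argmax. The sketch below uses placeholder scores in place of the learned encoder-pointer scores; it is not the paper's exact architecture.

```python
# Minimal sketch of split-point selection. In the real model, `scores`
# would come from a pointer network over encoder states; here they are
# placeholders passed in by the caller.

def split_question(tokens, scores):
    """Split a tokenized question at the highest-scoring interior position."""
    split = max(range(1, len(tokens)), key=lambda i: scores[i])
    return tokens[:split], tokens[split:]
```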
Supervision is automated by aligning machine-generated questions to human paraphrased ones, allowing the model to learn from noisy, yet useful, alignments without requiring extensive manual annotations.
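One simple way to obtain such noisy supervision is to pick the split point whose prefix best matches the first machine-generated sub-question by word overlap. This Jaccard-based heuristic is an illustrative stand-in for the paper's alignment procedure, not a reproduction of it.

```python
def align_split_point(paraphrase_tokens, machine_part1_tokens):
    """Return the split index in the human paraphrase whose prefix best
    matches (Jaccard similarity over word sets) the first machine-generated
    sub-question -- a rough stand-in for the paper's alignment step."""
    target = set(machine_part1_tokens)

    def score(i):
        prefix = set(paraphrase_tokens[:i])
        return len(prefix & target) / len(prefix | target)

    return max(range(1, len(paraphrase_tokens)), key=score)
```

The resulting split indices are noisy labels, but they are cheap to produce at scale and avoid manual annotation of decompositions.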
Empirical Results and Analysis
The proposed method raises precision@1 from 20.8% to 27.5%, demonstrating the efficacy of question decomposition. Importantly, decomposition improves performance across different RC models, showing that its utility is not tied to any specific implementation.
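The precision@1 metric reported here simply measures how often the model's top-ranked answer falls in the gold answer set:

```python
def precision_at_1(top_predictions, gold_answer_sets):
    """precision@1: fraction of questions whose top-ranked prediction
    appears in the corresponding gold answer set."""
    hits = sum(pred in gold for pred, gold in zip(top_predictions, gold_answer_sets))
    return hits / len(top_predictions)
```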
Human evaluators reach 63.0% precision@1 under time constraints, indicating that current models have substantial room for improvement. Notably, 22% of the complex questions require substantial paraphrasing beyond what is typically seen in simpler datasets.
Implications and Future Directions
This research posits that effectively harnessing the massive, albeit unstructured, information available on the web can augment traditional QA models, particularly for complex questions. The decomposition approach's success suggests further exploration into models that dynamically interact with web-based information sources and potentially integrate other forms of structured data, like tables and KBs.
Future work includes learning directly from weak supervision and extending the framework to a broader range of logical functions. By introducing both a novel dataset and a decomposition-based approach, this paper lays groundwork that could significantly influence future QA systems; the interplay between decomposition and information retrieval shown here points to a promising avenue for enhancing machine reasoning.