DuoRC: Towards Complex Language Understanding with Paraphrased Reading Comprehension (1804.07927v4)

Published 21 Apr 2018 in cs.CL

Abstract: We propose DuoRC, a novel dataset for Reading Comprehension (RC) that motivates several new challenges for neural approaches in language understanding beyond those offered by existing RC datasets. DuoRC contains 186,089 unique question-answer pairs created from a collection of 7680 pairs of movie plots where each pair in the collection reflects two versions of the same movie - one from Wikipedia and the other from IMDb - written by two different authors. We asked crowdsourced workers to create questions from one version of the plot and a different set of workers to extract or synthesize answers from the other version. This unique characteristic of DuoRC where questions and answers are created from different versions of a document narrating the same underlying story, ensures by design, that there is very little lexical overlap between the questions created from one version and the segments containing the answer in the other version. Further, since the two versions have different levels of plot detail, narration style, vocabulary, etc., answering questions from the second version requires deeper language understanding and incorporating external background knowledge. Additionally, the narrative style of passages arising from movie plots (as opposed to typical descriptive passages in existing datasets) exhibits the need to perform complex reasoning over events across multiple sentences. Indeed, we observe that state-of-the-art neural RC models which have achieved near human performance on the SQuAD dataset, even when coupled with traditional NLP techniques to address the challenges presented in DuoRC exhibit very poor performance (F1 score of 37.42% on DuoRC v/s 86% on SQuAD dataset). This opens up several interesting research avenues wherein DuoRC could complement other RC datasets to explore novel neural approaches for studying language understanding.

Summary

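The paper presents the DuoRC dataset, built from paired Wikipedia and IMDb versions of the same movie plots, which keeps lexical overlap between questions and answer passages low so that models need semantic understanding rather than word matching.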
The experimental evaluation reveals a stark performance drop in RC models, exemplified by BiDAF’s F1 score falling from 86% on SQuAD to 37.42% on DuoRC.
The study underscores the need for integrating external knowledge and sophisticated reasoning techniques to enhance machine narrative understanding.

An Overview of DuoRC: A Novel Dataset for Enhanced Language Comprehension

The paper entitled "DuoRC: Towards Complex Language Understanding with Paraphrased Reading Comprehension" introduces a novel dataset designed to advance the capabilities of reading comprehension (RC) models. It addresses some of the key limitations of existing datasets by focusing on deeper language understanding, paraphrasing challenges, and the necessity of using external knowledge for comprehension tasks. The DuoRC dataset is constructed from 7680 pairs of movie plots, each pair consisting of a Wikipedia version and an IMDb version of the same movie written by different authors, and it contains 186,089 unique question-answer pairs. Crowdsourced workers wrote questions from one version of the plot, and a different set of workers extracted or synthesized answers from the other version. This keeps lexical overlap between questions and answer passages low and forces models to reason beyond word matching.

Key Contributions and Challenges of DuoRC

Low Lexical Overlap: Unlike most existing RC datasets, DuoRC ensures minimal lexical overlap between questions and the passage segments that contain their answers. This design decision requires models to perform semantic understanding rather than rely on superficial word matching.
Utilization of Background and Common-Sense Knowledge: Answering the questions posed in the DuoRC dataset often requires models to employ external knowledge and reasoning capabilities, challenging them to go beyond the given textual context.
Narrative Passage Complexity: Narrative passages from movie plots introduce complexities not present in the descriptive passages typical of existing datasets. These include events with causal links across sentences, which demand reasoning over multiple sentences and discourse understanding from the models.
Handling Unanswerable Questions: The dataset includes questions that appear relevant but cannot be answered with the provided passage, pushing models to recognize when information is insufficient and reinforcing the importance of identifying unanswerability in practical applications.
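
The low-lexical-overlap property in the list above can be made concrete with a rough measurement. The following is a minimal sketch, not the paper's own analysis: the function names, the tiny stopword list, and the example sentences are all made up for illustration and do not come from DuoRC. It computes the fraction of a question's content words that also appear in a passage.

```python
import re
from typing import Set

# Illustrative stopword list (an assumption; a real analysis would use a fuller one).
STOPWORDS = {
    "the", "a", "an", "of", "to", "in", "at", "is", "was",
    "does", "did", "who", "what", "where", "when", "why", "how", "and",
}

def content_tokens(text: str) -> Set[str]:
    """Lowercase the text, split on non-alphanumerics, and drop stopwords."""
    return {t for t in re.findall(r"[a-z0-9]+", text.lower()) if t not in STOPWORDS}

def question_overlap(question: str, passage: str) -> float:
    """Fraction of the question's content words that also occur in the passage."""
    q = content_tokens(question)
    if not q:
        return 0.0
    return len(q & content_tokens(passage)) / len(q)

# Made-up example: one question, two differently worded versions of the same event.
question = "Who does the detective arrest at the harbor?"
same_version = "The detective arrests the smuggler at the harbor."
other_version = "Down by the docks, the inspector takes the smuggler into custody."

print(round(question_overlap(question, same_version), 2))   # 0.67
print(round(question_overlap(question, other_version), 2))  # 0.0
```

In this toy case the version the question was written from shares two of three content words with it, while the paraphrased version shares none, even though both contain the answer. Exact string matching also misses "arrest" versus "arrests"; stemming would change the numbers but not the overall picture.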

Experimental Evaluation

The paper evaluates state-of-the-art RC models on the DuoRC dataset and reports much lower performance than on established datasets like SQuAD. In particular, the BiDAF model reached an F1 score of 37.42% on DuoRC, compared to 86% on SQuAD, even when coupled with traditional NLP techniques intended to address DuoRC's challenges. This gap highlights the added difficulty introduced by DuoRC, shows where current neural RC models fall short, and emphasizes the need for new approaches to processing narrative language and reasoning over events across multiple sentences.
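
For readers unfamiliar with the F1 metric quoted above, here is a minimal sketch of the token-level F1 commonly used in SQuAD-style evaluation. It is an illustration under stated assumptions, not the paper's evaluation script: the function names are hypothetical, and treating an empty string as the "no answer" label for unanswerable questions is an assumption made for this example.

```python
import re
import string
from collections import Counter
from typing import List

def normalize(text: str) -> List[str]:
    """Lowercase, strip punctuation and English articles, then split on whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return text.split()

def token_f1(prediction: str, gold: str) -> float:
    """Harmonic mean of token precision and recall between prediction and gold answer."""
    pred_tokens = normalize(prediction)
    gold_tokens = normalize(gold)
    # Assumption: an empty answer stands for "unanswerable"; score 1 only if both are empty.
    if not pred_tokens or not gold_tokens:
        return float(pred_tokens == gold_tokens)
    # Multiset intersection counts each shared token at most as often as it appears in both.
    overlap = sum((Counter(pred_tokens) & Counter(gold_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)

print(token_f1("the smuggler", "smuggler"))                  # 1.0
print(round(token_f1("a young smuggler", "the smuggler"), 2))  # 0.67
print(token_f1("inspector", "smuggler"))                     # 0.0
```

A dataset-level score such as the 37.42% figure is typically the average of this per-question value over all questions, usually taking the maximum over multiple gold answers where available.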

Implications for Future Research

DuoRC opens several avenues for future work on machine reading comprehension. While existing models struggle, DuoRC holds promise as a benchmark for developing novel approaches that incorporate deeper semantic processing. The unique characteristics of the dataset offer a testbed for models that integrate knowledge graphs, coreference resolution, and more sophisticated language reasoning. By extending beyond simple factual data, DuoRC sets the stage for next-generation language models capable of understanding narrative text across varied contexts.

Conclusion

DuoRC is a valuable contribution to the field of natural language understanding, designed to challenge current capabilities and drive research toward more advanced comprehension models. By posing questions written from one version of a story and answered from another, it reflects real-world challenges in narrative understanding and paraphrase, and it complements existing datasets rather than replacing them. Researchers are encouraged to use DuoRC alongside traditional datasets to evaluate language comprehension more broadly.
