
Cosmos QA: Machine Reading Comprehension with Contextual Commonsense Reasoning (1909.00277v2)

Published 31 Aug 2019 in cs.CL and cs.AI

Abstract: Understanding narratives requires reading between the lines, which in turn, requires interpreting the likely causes and effects of events, even when they are not mentioned explicitly. In this paper, we introduce Cosmos QA, a large-scale dataset of 35,600 problems that require commonsense-based reading comprehension, formulated as multiple-choice questions. In stark contrast to most existing reading comprehension datasets where the questions focus on factual and literal understanding of the context paragraph, our dataset focuses on reading between the lines over a diverse collection of people's everyday narratives, asking such questions as "what might be the possible reason of ...?", or "what would have happened if ..." that require reasoning beyond the exact text spans in the context. To establish baseline performances on Cosmos QA, we experiment with several state-of-the-art neural architectures for reading comprehension, and also propose a new architecture that improves over the competitive baselines. Experimental results demonstrate a significant gap between machine (68.4%) and human performance (94%), pointing to avenues for future research on commonsense machine comprehension. Dataset, code and leaderboard is publicly available at https://wilburone.github.io/cosmos.

Analyzing Cosmos QA for Contextual Commonsense Machine Reading Comprehension

The paper introduces Cosmos QA, a dataset designed explicitly for machine reading comprehension that requires contextual commonsense reasoning. The dataset consists of 35,600 problems, posed as multiple-choice questions, where the comprehension demands go beyond mere textual understanding to include deductions about implicit information concerning everyday narratives.
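Each Cosmos QA problem pairs a context paragraph with a question and four candidate answers, exactly one of which is labeled correct. A minimal sketch of that record layout (the class, field names, and the sample item below are illustrative, not the dataset's official schema):

```python
from dataclasses import dataclass

@dataclass
class CosmosQAExample:
    """One multiple-choice problem: a context paragraph, a question,
    four candidate answers, and the index of the correct answer."""
    context: str
    question: str
    options: tuple[str, str, str, str]
    label: int  # index 0-3 of the correct option

    def correct_answer(self) -> str:
        return self.options[self.label]

# Hypothetical item in the spirit of the dataset (not an actual example).
ex = CosmosQAExample(
    context="I forgot my umbrella, so by the time I reached the office "
            "my jacket was soaked through.",
    question="What might be the possible reason the jacket was soaked?",
    options=("It was raining on the way to work.",
             "The office had no windows.",
             "The jacket was brand new.",
             "The narrator drove to work."),
    label=0,
)
print(ex.correct_answer())
```

Note that the correct option ("It was raining") is never stated in the context; it must be inferred, which is exactly the "reading between the lines" the paper targets.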

Key Contributions

The authors highlight several distinctions of Cosmos QA compared to existing datasets. Primarily, it focuses on questions that require understanding beyond the text's explicit details and integrating commonsense knowledge to infer plausible reasons, effects, or counterfactual scenarios. This makes Cosmos QA unique in emphasizing "reading between the lines," which is only superficially addressed by other datasets like SQuAD or RACE.

The dataset is derived from people's personal narratives, such as blog posts, which ensures diverse context scenarios requiring commonsense reasoning. Notably, a substantial 93.8% of the questions necessitate such reasoning, a proportion starkly higher than in other datasets, where commonsense reasoning is often a minority component.

Experimental Setup and Baseline Results

The authors establish baseline performances with state-of-the-art neural architectures for reading comprehension, and propose a model based on BERT enhanced with multiway attention mechanisms. This variant improves over a straightforward application of BERT, reaching 68.4% accuracy. However, a pronounced gap remains to human performance, which stands at 94%, leaving ample room for research into models capable of the nuanced, human-like commonsense reasoning the task demands.
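In this multiple-choice setup, a model scores each (paragraph, question, candidate) triple and predicts the argmax over the four options. The sketch below illustrates that interface only; the lexical-overlap scorer is a placeholder assumption standing in for BERT with multiway attention, not the paper's architecture:

```python
import re
import numpy as np

def _words(text: str) -> set[str]:
    """Lowercase word set, stripping punctuation."""
    return set(re.findall(r"[a-z']+", text.lower()))

def score_option(paragraph: str, question: str, option: str) -> float:
    """Placeholder scorer: fraction of the option's words that appear in
    the paragraph or question. A real model would compute this score
    from learned contextual representations instead."""
    context = _words(paragraph) | _words(question)
    option_words = _words(option)
    return len(context & option_words) / max(len(option_words), 1)

def predict(paragraph: str, question: str, options: list[str]) -> int:
    """Score all four candidates, softmax, and return the argmax index."""
    scores = np.array([score_option(paragraph, question, o) for o in options])
    probs = np.exp(scores) / np.exp(scores).sum()
    return int(probs.argmax())

# Hypothetical counterfactual question, as in the paper's examples.
paragraph = "The picnic was cancelled because dark clouds rolled in."
question = "What would have happened if the sky had stayed clear?"
options = ["The picnic would have happened.",
           "The clouds turned white.",
           "Everyone forgot the food.",
           "It started snowing heavily."]
pred = predict(paragraph, question, options)
print(pred)  # → 0
```

The interface (one score per candidate, argmax prediction) matches how multiple-choice readers are typically evaluated; only the scoring function differs between this toy and a trained model.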

Beyond providing baselines, the authors conduct ablation studies to assess the role of each input component (the paragraph, the question, and the candidate answers) in deriving correct answers. The results underscore the spectrum of challenges presented by the dataset, particularly the necessity of modeling interactions between the paragraph context and the associated questions and answers.
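An ablation of this kind can be simulated by blanking a component before scoring, e.g. dropping the paragraph so only surface cues in the question and candidates remain. A self-contained sketch, again using a toy word-overlap scorer (an assumption, not the paper's model) over a single made-up example:

```python
import re

def overlap(context: str, option: str) -> float:
    """Toy stand-in scorer: fraction of the option's words found in the context."""
    cw = set(re.findall(r"[a-z']+", context.lower()))
    ow = set(re.findall(r"[a-z']+", option.lower()))
    return len(cw & ow) / max(len(ow), 1)

def accuracy(examples, use_paragraph: bool = True) -> float:
    """Accuracy of the toy scorer, optionally ablating the paragraph."""
    correct = 0
    for paragraph, question, options, label in examples:
        context = (paragraph + " " + question) if use_paragraph else question
        scores = [overlap(context, o) for o in options]
        correct += scores.index(max(scores)) == label
    return correct / len(examples)

# One hypothetical item; a real ablation would run over the full dev set.
toy = [
    ("She missed the bus and walked to school.",
     "Why was she late?",
     ["She had to walk after missing the bus.",
      "Her bicycle was stolen overnight.",
      "School was closed that day.",
      "She overslept on the weekend."],
     0),
]
print(accuracy(toy), accuracy(toy, use_paragraph=False))
```

On this single item the scorer succeeds with the paragraph and fails without it, illustrating (in miniature) why paragraph-question-answer interactions matter for the dataset.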

Implications and Future Directions

By designing a dataset that requires commonsense inferences, this research foregrounds areas where natural language understanding technologies fall short. The significant disparity between machine and human performance points to a clear direction for future work: model architectures capable of higher-level reasoning that integrates explicit textual content with implicit commonsense knowledge.

The dataset also offers versatility in evaluation approaches by supporting both multiple-choice and generative models. This extends applicability in testing varying machine comprehension strategies. Knowledge transfer experiments detailed in the paper exhibit the potential of pre-trained models when fine-tuned on commonsense-rich contexts, suggesting an evolving trajectory of leveraging cross-dataset synergies for enhanced model capabilities.

Concluding Observations

Cosmos QA represents a significant step towards datasets that better emulate human-like reading comprehension challenges. Its contributions lie not only in providing a robust baseline for contextual commonsense reasoning but also in opening avenues for further explorations in model development and understanding of implicit content. As AI continues to advance, datasets like Cosmos QA will be critical in pushing the frontiers of machine understanding to new heights.

Authors (4)
  1. Lifu Huang (91 papers)
  2. Ronan Le Bras (56 papers)
  3. Chandra Bhagavatula (46 papers)
  4. Yejin Choi (287 papers)
Citations (427)