
DREAM: A Challenge Dataset and Models for Dialogue-Based Reading Comprehension (1902.00164v1)

Published 1 Feb 2019 in cs.CL

Abstract: We present DREAM, the first dialogue-based multiple-choice reading comprehension dataset. Collected from English-as-a-foreign-language examinations designed by human experts to evaluate the comprehension level of Chinese learners of English, our dataset contains 10,197 multiple-choice questions for 6,444 dialogues. In contrast to existing reading comprehension datasets, DREAM is the first to focus on in-depth multi-turn multi-party dialogue understanding. DREAM is likely to present significant challenges for existing reading comprehension systems: 84% of answers are non-extractive, 85% of questions require reasoning beyond a single sentence, and 34% of questions also involve commonsense knowledge. We apply several popular neural reading comprehension models that primarily exploit surface information within the text and find them to, at best, just barely outperform a rule-based approach. We next investigate the effects of incorporating dialogue structure and different kinds of general world knowledge into both rule-based and (neural and non-neural) machine learning-based reading comprehension models. Experimental results on the DREAM dataset show the effectiveness of dialogue structure and general world knowledge. DREAM will be available at https://dataset.org/dream/.

Review of DREAM: A Challenge Dataset for Dialogue-Based Reading Comprehension

The paper "DREAM: A Challenge Dataset and Models for Dialogue-Based Reading Comprehension" introduces the DREAM dataset—a novel corpus constructed to facilitate research in dialogue-based reading comprehension within natural language processing. Developed using English-as-a-foreign-language exams, the dataset contains 10,197 multiple-choice questions across 6,444 dialogues, aiming to probe the intricacies of multi-turn, multi-party dialogues and their comprehension by machines.

Dataset Characteristics

DREAM distinguishes itself from other reading comprehension datasets by focusing exclusively on dialogue contexts rather than static written texts. Notably, 84% of the answers in DREAM are non-extractive, indicating that they cannot be resolved by merely locating a span of text within the dialogue. Furthermore, 85% of the questions necessitate reasoning beyond a single sentence, and 34% require the integration of commonsense knowledge—a marked departure from challenges presented by datasets derived from formal written text, such as news articles or Wikipedia passages.
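The "non-extractive" statistic can be made concrete with a minimal sketch: an answer counts as extractive only if it appears verbatim as a span in the dialogue. The dialogue and answer options below are illustrative toy data, not drawn from the dataset, and the substring check is a deliberately naive approximation of the paper's annotation.

```python
def is_extractive(answer: str, dialogue_turns: list[str]) -> bool:
    """Return True if the answer appears verbatim in the dialogue text."""
    text = " ".join(dialogue_turns).lower()
    return answer.lower() in text

# Toy example: the correct answer is a paraphrase of the dialogue,
# so it cannot be found as a span in the text.
dialogue = [
    "W: I can't believe the bus is late again.",
    "M: Let's just take a taxi, or we'll miss the movie.",
]
print(is_extractive("They will go by taxi.", dialogue))  # False (non-extractive)
print(is_extractive("taxi", dialogue))                   # True (extractive span)
```

For 84% of DREAM answers, the first case applies: the answer must be inferred or paraphrased rather than copied out of the dialogue.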

Methodological Approach

The paper evaluates several baseline and state-of-the-art models on the DREAM dataset. Models that primarily exploit surface information, such as the Stanford Attentive Reader and the Gated-Attention Reader, exhibited limited success, at best only marginally outperforming simple rule-based approaches. The strongest neural model tested, Co-Matching, achieved a modest 45.5% accuracy, underscoring the limitations of existing systems on dialogue-specific content where explicit textual cues may be absent.
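To see why surface-level methods set such a competitive floor, consider a lexical-overlap heuristic in the spirit of the rule-based baselines (not the paper's exact implementation): pick the answer option whose words overlap most with the dialogue. The toy dialogue and options below are illustrative, and the whitespace tokenization is deliberately crude.

```python
def overlap_score(option: str, context_words: set[str]) -> float:
    """Fraction of the option's words that also occur in the context."""
    words = set(option.lower().split())
    return len(words & context_words) / max(len(words), 1)

def pick_answer(dialogue_turns: list[str], options: list[str]) -> int:
    """Return the index of the option with the highest lexical overlap."""
    context = set(" ".join(dialogue_turns).lower().split())
    scores = [overlap_score(o, context) for o in options]
    return max(range(len(options)), key=scores.__getitem__)

dialogue = [
    "M: Did you finish the report?",
    "W: Not yet, the printer in the office broke down.",
]
options = [
    "She finished the report.",
    "The printer broke down.",
    "She bought a new printer.",
]
print(pick_answer(dialogue, options))  # 1
```

Such a heuristic succeeds whenever a distractor shares fewer words with the dialogue than the correct option, which is exactly the regime where DREAM's paraphrased, reasoning-heavy answers defeat it.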

In contrast, the authors explored enhancements that draw on general world knowledge and dialogue-specific features. Integrating world knowledge from ConceptNet embeddings and modeling dialogue structure with speaker-aware features both yielded notable improvements. Their best model, FTLM++, which fine-tunes a pre-trained language model augmented with speaker embeddings, achieved an accuracy of 57.4%.
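The speaker-embedding idea can be illustrated with a minimal sketch: each token's representation is the sum of a token embedding and an embedding for the speaker of its turn, so the model can distinguish who said what. The tiny vocabularies and random vectors below are purely illustrative; in the actual model these embeddings are learned jointly during fine-tuning.

```python
import random

EMB_DIM = 4
random.seed(0)

def embed(vocab: list[str]) -> dict[str, list[float]]:
    """Assign each symbol a random vector (stand-in for learned embeddings)."""
    return {w: [random.uniform(-1, 1) for _ in range(EMB_DIM)] for w in vocab}

def add_vecs(a: list[float], b: list[float]) -> list[float]:
    return [x + y for x, y in zip(a, b)]

# Hypothetical tiny vocabularies for illustration only.
token_emb = embed(["hello", "where", "are", "you"])
speaker_emb = embed(["W", "M"])  # one vector per speaker role

def encode_turn(speaker: str, tokens: list[str]) -> list[list[float]]:
    """Each token representation = token embedding + its speaker's embedding."""
    return [add_vecs(token_emb[t], speaker_emb[speaker]) for t in tokens]

turn = encode_turn("W", ["where", "are", "you"])
print(len(turn), len(turn[0]))  # 3 EMB_DIM
```

The same token thus receives a different representation depending on which speaker uttered it, which is what lets the model track multi-party structure.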

Implications and Future Directions

The challenges posed by the dataset showcase the need for advancing models capable of emulating human-like understanding in dialogue contexts, emphasizing reasoning and commonsense capabilities. The significant gap between machine performance and human ceiling performance (98.6%) on DREAM highlights several areas ripe for exploration—specifically, the necessity for models to capture and synthesize information dispersed over multiple dialogue turns and speakers.

The findings suggest potential research directions, including the incorporation of more refined dialogue structures, such as speaker role embeddings, and enhanced utilization of external commonsense knowledge bases. This approach could involve integrating narrative event chains or co-reference resolution systems customized for dialogues, potentially reducing the impact of misleading distractor answers.

In conclusion, the DREAM dataset provides a compelling platform for dialogue-based reading comprehension research, demanding advancements in machine learning methods to tackle its nuances. Future endeavors aimed at augmenting models with capabilities to process conversational nuances and implicit knowledge are poised to push the boundaries of natural language understanding further.

Authors: Kai Sun, Dian Yu, Jianshu Chen, Dong Yu, Yejin Choi, Claire Cardie

Citations: 286