DailyDialog: A Manually Labelled Multi-turn Dialogue Dataset (1710.03957v1)

Published 11 Oct 2017 in cs.CL

Abstract: We develop a high-quality multi-turn dialog dataset, DailyDialog, which is intriguing in several aspects. The language is human-written and less noisy. The dialogues in the dataset reflect the way we communicate every day and cover various topics about our daily life. We also manually label the developed dataset with communication intention and emotion information. Then, we evaluate existing approaches on the DailyDialog dataset and hope it benefits the research field of dialog systems.

Citations (1,211)


Summary

The paper presents a manually labeled multi-turn dialogue dataset that captures everyday conversational patterns, with detailed annotations of emotions and dialogue acts.
Each utterance is manually labeled with one of seven emotion categories and one of four dialogue-act categories, with an inter-annotator agreement of 78.9%.
Evaluations show that incorporating intention and emotion labels improves the performance of both retrieval-based and generation-based dialogue systems.

Overview of "DailyDialog: A Manually Labelled Multi-turn Dialogue Dataset"

The paper presents "DailyDialog," a multi-turn dialogue dataset manually labeled for communication intention and emotion. Its authors, Yanran Li, Hui Su, Xiaoyu Shen, Wenjie Li, Ziqiang Cao, and Shuzi Niu of the Hong Kong Polytechnic University, the Chinese Academy of Sciences, and Saarland University, aim to address shortcomings of existing dialogue datasets, such as noisy language. DailyDialog is designed to reflect everyday conversational patterns across diverse topics, making it a resource closely aligned with real-world conversational applications.

Key Features of the DailyDialog Dataset

The dataset is notable for the following attributes:

Human-written and Formal Language: Unlike social media-based datasets (e.g., Twitter Corpus, Weibo dataset), DailyDialog contains less noisy and more formal conversational data.
Rich in Emotion and Intentions: Each utterance is labeled with one of seven emotion categories and one of four dialogue-act (intention) categories, enabling deeper exploration of conversational dynamics.
Focused Yet Varied Topics: Conversations cover ten categories ranging from ordinary life to financial topics, providing a wide range of scenarios for training robust dialogue systems.
Balanced Multi-turn Dialogues: With an average of about 8 turns per dialogue, the conversations stay coherent and focused, unlike datasets with excessively long and loosely connected conversations.
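To make the per-utterance labels concrete, here is a minimal Python sketch of aligning one dialogue with its labels. It assumes the layout of the public DailyDialog release as we understand it: parallel text, act, and emotion files with one dialogue per line, utterances terminated by an "__eou__" marker, dialogue acts coded 1 to 4, and emotions coded 0 to 6. The code mappings and the `parse_dialogue` and `average_turns` helpers are assumptions for illustration; check them against the README of your copy.

```python
from dataclasses import dataclass
from typing import List

# Assumed label encodings (verify against the release README before relying on them).
ACTS = {1: "inform", 2: "question", 3: "directive", 4: "commissive"}
EMOTIONS = {0: "no emotion", 1: "anger", 2: "disgust", 3: "fear",
            4: "happiness", 5: "sadness", 6: "surprise"}


@dataclass
class Utterance:
    text: str
    act: str
    emotion: str


def parse_dialogue(text_line: str, act_line: str, emo_line: str) -> List[Utterance]:
    """Align one line from each of the three parallel files into labeled utterances."""
    # Utterances are assumed to end with "__eou__"; drop the empty piece after the last marker.
    utterances = [u.strip() for u in text_line.split("__eou__") if u.strip()]
    acts = [ACTS[int(a)] for a in act_line.split()]
    emotions = [EMOTIONS[int(e)] for e in emo_line.split()]
    if not (len(utterances) == len(acts) == len(emotions)):
        raise ValueError("text and label counts disagree")
    return [Utterance(t, a, e) for t, a, e in zip(utterances, acts, emotions)]


def average_turns(dialogues: List[List[Utterance]]) -> float:
    """Mean number of utterances per dialogue."""
    return sum(len(d) for d in dialogues) / len(dialogues)
```

For example, `parse_dialogue("Hi , how are you ? __eou__ Great , thanks ! __eou__", "2 1", "0 4")` yields a question with no emotion followed by an inform utterance labeled happiness; `average_turns` computes a per-dialogue average comparable to the turn statistic above.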

Dataset Construction

The construction of DailyDialog involved three main steps:

Crawling and Filtering: The dialogues were crawled from websites for English learners, so they are human-written and center on realistic, everyday scenarios.
Manual Labeling: Each utterance was manually annotated for emotion and dialogue act, with an inter-annotator agreement of 78.9%, which supports the dataset's reliability for research use.
Data Cleaning: Dialogues involving more than two speakers were filtered out, and misspellings were corrected automatically, improving the data's quality and consistency.
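The 78.9% figure is an agreement rate between annotators. The summary does not say which agreement measure was used, so the Python sketch below (with illustrative helper names) shows two common ones: raw observed agreement, the fraction of utterances on which two annotators chose the same label, and Cohen's kappa, which corrects that fraction for the agreement expected by chance.

```python
from collections import Counter
from typing import Sequence


def observed_agreement(a: Sequence[str], b: Sequence[str]) -> float:
    """Fraction of items on which two annotators chose the same label."""
    if len(a) != len(b) or not a:
        raise ValueError("need two non-empty label sequences of equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def cohens_kappa(a: Sequence[str], b: Sequence[str]) -> float:
    """Chance-corrected agreement: (p_o - p_e) / (1 - p_e)."""
    p_o = observed_agreement(a, b)
    n = len(a)
    ca, cb = Counter(a), Counter(b)
    # p_e: probability that two independent annotators with these label
    # frequencies happen to pick the same label.
    p_e = sum(ca[label] * cb[label] for label in ca) / (n * n)
    if p_e == 1.0:
        return 1.0  # degenerate case: both annotators used one identical label throughout
    return (p_o - p_e) / (1 - p_e)
```

For two annotators who label four utterances as `["inform", "question", "inform", "directive"]` and `["inform", "question", "directive", "directive"]`, observed agreement is 0.75 while kappa is 7/11 (about 0.64), since some of that agreement would occur by chance.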

Evaluation and Results

The paper evaluates several existing retrieval-based and generation-based dialogue approaches on DailyDialog.

Retrieval-based Approaches

Feature-based and Embedding-based Methods: The feature-based method, which combines TF-IDF with fuzzy string matching, outperformed the embedding-based approach on BLEU.
Enhanced Reranking Mechanisms: Retrieval models that rerank candidates using intention and emotion labels retrieved more contextually appropriate responses. Case studies show how taking the conversation's history of intentions and emotions into account leads to more accurate and relevant responses.
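The two retrieval ideas above can be sketched in Python. This is a hypothetical reconstruction, not the authors' code: stored contexts are scored against the query by a weighted mix of TF-IDF cosine similarity and a fuzzy string ratio (here `difflib.SequenceMatcher`, standing in for whatever fuzzy matcher the paper used), and an optional term rewards candidates whose recent label history matches the query's. The weights `alpha` and `beta`, the window `k`, whitespace tokenization, and the label-matching rule are all assumptions.

```python
import math
from collections import Counter
from difflib import SequenceMatcher
from typing import Dict, List, Sequence, Tuple


def tfidf_vectors(docs: Sequence[str]) -> List[Dict[str, float]]:
    """Sparse TF-IDF vectors: raw term frequency times smoothed IDF."""
    tokenized = [d.lower().split() for d in docs]
    n = len(docs)
    df = Counter(term for toks in tokenized for term in set(toks))
    idf = {t: math.log((1 + n) / (1 + c)) + 1 for t, c in df.items()}
    return [{t: tf * idf[t] for t, tf in Counter(toks).items()} for toks in tokenized]


def cosine(u: Dict[str, float], v: Dict[str, float]) -> float:
    """Cosine similarity of two sparse vectors (0.0 if either is empty)."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def label_overlap(h1: Sequence[str], h2: Sequence[str], k: int = 3) -> float:
    """Hypothetical rerank signal: share of the last k label positions that match."""
    a, b = list(h1)[-k:], list(h2)[-k:]
    m = min(len(a), len(b))
    if m == 0:
        return 0.0
    # Align the two histories at their most recent labels.
    return sum(x == y for x, y in zip(a[-m:], b[-m:])) / m


def retrieve(query: str, query_labels: Sequence[str],
             corpus: Sequence[Tuple[str, str, Sequence[str]]],
             alpha: float = 0.5, beta: float = 0.0) -> List[Tuple[float, str]]:
    """Rank (context, response, context_labels) triples by similarity to the query context."""
    contexts = [c for c, _, _ in corpus]
    vecs = tfidf_vectors(contexts + [query])  # the query shares the IDF table
    qvec = vecs[-1]
    scored = []
    for (ctx, resp, labels), vec in zip(corpus, vecs):
        lexical = cosine(qvec, vec)
        fuzzy = SequenceMatcher(None, query.lower(), ctx.lower()).ratio()
        score = alpha * lexical + (1 - alpha) * fuzzy
        score += beta * label_overlap(query_labels, labels)  # label-based rerank term
        scored.append((score, resp))
    return sorted(scored, key=lambda s: s[0], reverse=True)
```

With `beta=0.0` the ranking is purely lexical; raising `beta` lets label history separate lexically similar contexts, which is the intuition behind the reranking results, though the paper's exact reranking formula may differ.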

Generation-based Approaches

Seq2Seq and Attention Mechanisms: Attention-based Seq2Seq models and, in particular, the hierarchical encoder-decoder architecture achieved better BLEU scores than the vanilla Seq2Seq model.
Label Incorporation: Models augmented with intention and emotion labels performed better, underscoring the value of these labels as additional context for dialogue generation.
Pretraining on a Different Domain: Pretraining on the OpenSubtitles dataset yielded lower perplexity but hurt BLEU scores because of the domain mismatch, suggesting that pretraining data should be aligned with the target domain.
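Perplexity and BLEU can disagree because they measure different things: perplexity is the exponentiated average negative log-likelihood the model assigns to the reference tokens, while BLEU scores n-gram overlap between the generated response and the reference. A model pretrained on OpenSubtitles can therefore assign higher probability to reference text while still generating responses that overlap less with DailyDialog references. The Python sketch below computes both; it uses a simplified single-reference, unsmoothed sentence-level BLEU, and the paper's reported scores may come from a different BLEU variant.

```python
import math
from collections import Counter
from typing import List, Sequence


def perplexity(token_log_probs: Sequence[float]) -> float:
    """exp of the average negative log-likelihood (natural log) per reference token."""
    return math.exp(-sum(token_log_probs) / len(token_log_probs))


def ngrams(tokens: List[str], n: int) -> Counter:
    """Multiset of the n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def sentence_bleu(reference: List[str], hypothesis: List[str], max_n: int = 4) -> float:
    """Single-reference BLEU: brevity penalty times geometric mean of clipped n-gram precisions.

    No smoothing, so any zero precision makes the score 0.0.
    """
    if not hypothesis:
        return 0.0
    log_sum = 0.0
    for n in range(1, max_n + 1):
        hyp, ref = ngrams(hypothesis, n), ngrams(reference, n)
        total = sum(hyp.values())
        # Clip each hypothesis n-gram count by its count in the reference.
        clipped = sum(min(c, ref[g]) for g, c in hyp.items())
        if total == 0 or clipped == 0:
            return 0.0
        log_sum += math.log(clipped / total) / max_n
    # Penalize hypotheses shorter than the reference.
    if len(hypothesis) > len(reference):
        bp = 1.0
    else:
        bp = math.exp(1 - len(reference) / len(hypothesis))
    return bp * math.exp(log_sum)
```

For instance, a model that assigns probability 0.25 to every reference token has perplexity 4, and a hypothesis identical to a reference of at least four tokens scores a BLEU of 1.0.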

Implications and Future Directions

DailyDialog supports dialogue systems research by providing a carefully labeled dataset. Its emotion and intention labels create opportunities for developing empathetic, context-aware conversational agents, and its coverage of ten everyday topic categories supports models that can handle a diverse range of user interactions.

Future research could exploit the dataset's multi-turn dialogue flow patterns, potentially integrating transfer learning and domain adaptation methods to make conversational models more useful. The dataset's structure could also support more sophisticated dialogue management systems that maintain conversational coherence and generate contextually adaptive responses.

DailyDialog is a valuable resource for both theoretical research and practical applications in dialogue systems, and a reference point for future datasets in this domain.
