EmotionLines: An Emotion Corpus of Multi-Party Conversations

Published 23 Feb 2018 in cs.CL | (1802.08379v2)

Abstract: Feeling emotion is a critical characteristic to distinguish people from machines. Among all the multi-modal resources for emotion detection, textual datasets are those containing the least additional information in addition to semantics, and hence are adopted widely for testing the developed systems. However, most of the textual emotional datasets consist of emotion labels of only individual words, sentences or documents, which makes it challenging to discuss the contextual flow of emotions. In this paper, we introduce EmotionLines, the first dataset with emotions labeling on all utterances in each dialogue only based on their textual content. Dialogues in EmotionLines are collected from Friends TV scripts and private Facebook messenger dialogues. Then one of seven emotions, six Ekman's basic emotions plus the neutral emotion, is labeled on each utterance by 5 Amazon MTurkers. A total of 29,245 utterances from 2,000 dialogues are labeled in EmotionLines. We also provide several strong baselines for emotion detection models on EmotionLines in this paper.

Summary

The paper presents a novel dialogue dataset labeled with seven distinct emotions, filling the gap in context-aware emotion detection for multi-party conversations.
Each of the 29,245 utterances, drawn from Friends TV scripts and private Facebook Messenger chat logs, is labeled by five Amazon Mechanical Turk workers.
Baseline experiments with CNN and CNN-BiLSTM models report weighted accuracies of 63.9% on the Friends subset and 77.4% on the EmotionPush subset, pointing to the value of contextual cues in emotion classification.

Insightful Overview of "EmotionLines: An Emotion Corpus of Multi-Party Conversations"

In the field of emotion detection in natural language processing, the "EmotionLines" paper emerges as a significant contribution. The authors present a novel dataset tailored for emotion recognition within the contextual framework of dialogues, setting it apart from prior datasets that fail to capture such nuanced emotional flows.

Background and Motivation

Traditional emotion detection resources predominantly focus on individual words, sentences, or stand-alone documents, often neglecting the context necessary for interpreting the flow of emotions within dialogues. This limitation hinders the development of conversational AI systems that require an understanding of emotion dynamics to generate human-like responses. Recognizing this gap, the paper introduces EmotionLines, a corpus that labels emotions at the utterance level across dialogues, drawing on two sources: Friends TV scripts and private Facebook Messenger dialogues.

Dataset Construction and Methodology

EmotionLines consists of 2,000 dialogues and 29,245 utterances. Each utterance is labeled with one of seven emotions: Ekman's six basic emotions (anger, disgust, fear, happiness, sadness, and surprise) plus neutral. Labeling was carried out on Amazon Mechanical Turk, with five annotators evaluating each utterance to improve label reliability. Because the dialogues come exclusively from TV scripts and private chat logs and are labeled from their text alone, the dataset supports the study of emotion as expressed in textual content, without multimodal signals such as visual or acoustic cues.
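
To make the annotation setup concrete, the following is a minimal sketch of how five per-utterance votes could be collapsed into a single label by majority vote. It is an illustration under stated assumptions, not the authors' procedure: the `aggregate_labels` function, the `min_agreement` threshold of three, the record layout, and the example utterances are all hypothetical and do not come from the corpus.

```python
from collections import Counter

# The seven label categories named in the overview.
EMOTIONS = ("anger", "disgust", "fear", "happiness", "sadness", "surprise", "neutral")


def aggregate_labels(votes, min_agreement=3):
    """Return the majority label, or None when no label reaches min_agreement.

    Assumption: a simple majority rule. The overview does not state how the
    five annotations per utterance are merged.
    """
    unknown = set(votes) - set(EMOTIONS)
    if unknown:
        raise ValueError(f"unknown labels: {sorted(unknown)}")
    label, count = Counter(votes).most_common(1)[0]
    return label if count >= min_agreement else None


# Invented toy dialogue; every utterance carries five annotator votes.
dialogue = [
    {"speaker": "A", "text": "I got the job!",
     "votes": ["happiness", "happiness", "surprise", "happiness", "happiness"]},
    {"speaker": "B", "text": "Wait, what?",
     "votes": ["surprise", "surprise", "neutral", "surprise", "fear"]},
    {"speaker": "A", "text": "Yeah.",
     "votes": ["neutral", "happiness", "surprise", "neutral", "sadness"]},
]

# One label per utterance; None marks utterances without a clear majority.
labels = [aggregate_labels(utterance["votes"]) for utterance in dialogue]
print(labels)
```

Keeping the labels in dialogue order, rather than pooling utterances, is what allows a model to look at the emotional flow across turns.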

Technical Experiments and Baselines

The authors use convolutional neural networks (CNNs) and a combination of CNNs with bidirectional LSTMs (CNN-BiLSTMs) as baselines for emotion classification. The experiments yielded weighted accuracies of 63.9% on the Friends subset and 77.4% on the EmotionPush (Facebook Messenger) subset, outperforming single-utterance models. These findings support the use of context-aware architectures for emotion detection. The paper also discusses the imbalanced emotion distribution in the dataset and outlines future efforts to improve category representation.
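
The sketch below shows why class imbalance matters when reading a weighted accuracy figure. It assumes that weighted accuracy means per-class accuracy weighted by each class's share of the test set; that definition is an assumption here, since the overview does not spell it out. The unweighted average is added only for contrast, and the function names and toy labels are hypothetical.

```python
from collections import Counter


def per_class_accuracy(gold, pred):
    """Fraction of correctly predicted utterances within each gold class."""
    total = Counter(gold)
    correct = Counter(g for g, p in zip(gold, pred) if g == p)
    return {label: correct[label] / n for label, n in total.items()}


def weighted_accuracy(gold, pred):
    """Per-class accuracy weighted by class share (assumed definition)."""
    acc = per_class_accuracy(gold, pred)
    total = Counter(gold)
    return sum(total[label] / len(gold) * acc[label] for label in acc)


def unweighted_accuracy(gold, pred):
    """Plain mean of per-class accuracies; every class counts equally."""
    acc = per_class_accuracy(gold, pred)
    return sum(acc.values()) / len(acc)


# Invented, heavily imbalanced toy data: 8 neutral utterances, 2 anger.
gold = ["neutral"] * 8 + ["anger"] * 2
pred = ["neutral"] * 8 + ["neutral", "anger"]

wa = weighted_accuracy(gold, pred)     # dominated by the frequent class
uwa = unweighted_accuracy(gold, pred)  # exposes the weak minority class
print(round(wa, 3), round(uwa, 3))
```

On this toy data the frequent class pulls the weighted figure well above the unweighted one, which is the kind of gap an imbalanced label distribution can hide.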

Implications and Future Directions

EmotionLines could inform the development of dialogue systems. With emotion detection grounded in conversational context, AI-driven systems can be engineered to deliver more empathetic interactions. As conversational agents become more widely used, their emotional intelligence matters as much as their conversational capability, which underscores the need for emotion-aware dialogue datasets. Future research may fine-tune models on EmotionLines and broaden its emotional variety by incorporating content from other domains, such as theatrical scripts or dramatic narratives.

The availability of this dataset opens avenues for continued work in conversational AI, both for task-oriented dialogue systems and for chit-chat systems that aim to respond with authenticity and emotional awareness. Overall, EmotionLines is a useful resource for improving the emotional capabilities of dialogue systems.
