WASSA-2017 Shared Task on Emotion Intensity (1708.03700v1)

Published 11 Aug 2017 in cs.CL

Abstract: We present the first shared task on detecting the intensity of emotion felt by the speaker of a tweet. We create the first datasets of tweets annotated for anger, fear, joy, and sadness intensities using a technique called best-worst scaling (BWS). We show that the annotations lead to reliable fine-grained intensity scores (rankings of tweets by intensity). The data was partitioned into training, development, and test sets for the competition. Twenty-two teams participated in the shared task, with the best system obtaining a Pearson correlation of 0.747 with the gold intensity scores. We summarize the machine learning setups, resources, and tools used by the participating teams, with a focus on the techniques and resources that are particularly useful for the task. The emotion intensity dataset and the shared task are helping improve our understanding of how we convey more or less intense emotions through language.

Authors: Saif M. Mohammad, Felipe Bravo-Marquez

Summary

The paper introduces a shared task that leverages a novel BWS-annotated dataset to predict emotion intensity in tweets.
It reports that the strongest systems combined deep learning architectures with ensemble methods, with the best system reaching a Pearson correlation of 0.747 with the gold intensity scores.
The task sets a benchmark for emotion detection, driving further research in automated sentiment analysis and social media analytics.

Overview of the WASSA-2017 Shared Task on Emotion Intensity

The paper "WASSA-2017 Shared Task on Emotion Intensity" describes the development and evaluation of machine learning models that determine the intensity of specific emotions expressed in tweets. The shared task formalized the problem of emotion intensity prediction and provided standardized datasets for four emotions: anger, fear, joy, and sadness. The authors created a new dataset annotated for emotion intensity using Best-Worst Scaling (BWS), which yields reliable fine-grained intensity scores by ranking tweets according to the intensity of the emotion they convey. The task was hosted on CodaLab and drew 22 participating teams with a variety of approaches; the best system reached a Pearson correlation of 0.747 with the gold-standard annotations.

Dataset and Methodology

The dataset was curated specifically for detecting the intensity of emotions in tweets. The authors used Best-Worst Scaling (BWS), a comparative annotation technique suited to graded judgments rather than binary labels, which is intended to avoid the consistency problems often found with rating scales. Human annotators made the BWS judgments, which were then converted into real-valued intensity scores between 0 and 1. The data was split into training, development, and test partitions, with instances covering a wide range of emotion intensities. It was the first dataset of its kind for tweets and provides a resource for training and evaluating models that predict emotion intensity.
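How best-worst judgments become scores between 0 and 1 can be illustrated with a small sketch. This summary does not spell out the conversion, so the code assumes the simple counting procedure commonly used with BWS: for each item, the fraction of times it was chosen as most intense minus the fraction of times it was chosen as least intense, rescaled linearly from [-1, 1] to [0, 1]. The function name `bws_scores`, the tuple layout, and the toy tweet IDs are hypothetical.

```python
from collections import Counter


def bws_scores(annotations):
    """Turn best-worst judgments into scores in [0, 1].

    `annotations` is an iterable of (items, best, worst) triples, where
    `items` is the group of tweet IDs shown together and `best`/`worst`
    are the IDs judged most and least intense. This layout is an
    assumption made for illustration.
    """
    seen, best, worst = Counter(), Counter(), Counter()
    for items, b, w in annotations:
        seen.update(items)
        best[b] += 1
        worst[w] += 1

    scores = {}
    for item, n in seen.items():
        raw = (best[item] - worst[item]) / n  # lies in [-1, 1]
        scores[item] = (raw + 1) / 2          # rescaled to [0, 1]
    return scores


# Toy data: three judgments over the same four (hypothetical) tweets.
toy = [
    (("t1", "t2", "t3", "t4"), "t1", "t4"),
    (("t1", "t2", "t3", "t4"), "t1", "t3"),
    (("t1", "t2", "t3", "t4"), "t2", "t4"),
]
scores = bws_scores(toy)
ranking = sorted(scores, key=scores.get, reverse=True)
```

On the toy data, `ranking` orders the tweets from most to least intense, which is the kind of fine-grained ranking the annotations are meant to produce.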

Task Outcomes and Participating Systems

Participating systems were evaluated primarily by how well their predicted emotion intensities correlated with the human annotations. The top-performing system, Prayas, achieved a Pearson correlation of 0.747. The top systems combined several techniques, including word and sentence embeddings, neural networks (LSTMs and CNNs), affective lexicons, and ensemble learning. The success of dense distributed representations, particularly within neural networks and ensembles, points to a trend towards deep learning approaches in emotion detection tasks.
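The evaluation measure can be sketched in a few lines. The code below computes the Pearson correlation between gold and predicted intensities for each emotion and then averages the per-emotion values. Averaging across emotions is an assumption made here to arrive at a single figure; this summary only reports the overall 0.747. The names `pearson` and `average_pearson` and the toy scores are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation between two equal-length sequences of numbers."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


def average_pearson(gold_by_emotion, pred_by_emotion):
    """Per-emotion Pearson r plus their unweighted mean (assumed aggregation)."""
    per_emotion = {
        emotion: pearson(gold, pred_by_emotion[emotion])
        for emotion, gold in gold_by_emotion.items()
    }
    return sum(per_emotion.values()) / len(per_emotion), per_emotion


# Toy scores for two emotions; real data would cover all four.
gold = {"anger": [0.1, 0.5, 0.9], "joy": [0.2, 0.4, 0.6]}
pred = {"anger": [0.2, 0.6, 1.0], "joy": [0.6, 0.4, 0.2]}
overall, per_emotion = average_pearson(gold, pred)
```

Because Pearson correlation is invariant to shifting and rescaling, a system is rewarded for ranking and spacing tweets like the gold scores do, not for matching their absolute values.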

Technical Highlights and Innovations

Several trends stood out among the submissions: the use of low-dimensional representations such as word embeddings and sentence embeddings, and the integration of multiple affective lexicons. Deep learning techniques such as LSTMs and CNNs were widely applied, often combined into deeper architectures. Task-specific lexicons also contributed significantly to model performance, which underlines the importance of feature selection in emotion prediction. More broadly, the task treated emotion detection as a matter of varying intensity rather than only discrete categories, a dimensional perspective that complements traditional categorical models.
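As a rough illustration of how such resources can be combined, the sketch below builds a feature vector from affect-lexicon scores and an averaged word embedding, and averages the predictions of several models as a simple ensemble. It is not any team's actual system: the lexicon, the two-dimensional embeddings, the whitespace tokenizer, and the function names are all invented for illustration.

```python
# Hypothetical toy resources; real systems used large lexicons and
# pretrained embeddings with hundreds of dimensions.
TOY_LEXICON = {"furious": 0.9, "annoyed": 0.5, "calm": 0.1}
TOY_EMBEDDINGS = {
    "furious": [0.8, 0.1],
    "annoyed": [0.5, 0.2],
    "calm": [0.1, 0.9],
}


def featurize(tweet, lexicon, embeddings, dim):
    """Concatenate lexicon features with the mean embedding of known words."""
    tokens = tweet.lower().split()  # naive whitespace tokenizer (assumption)

    lex_values = [lexicon[t] for t in tokens if t in lexicon]
    lex_features = [sum(lex_values), max(lex_values, default=0.0)]

    vectors = [embeddings[t] for t in tokens if t in embeddings]
    if vectors:
        mean_vec = [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]
    else:
        mean_vec = [0.0] * dim  # no known words: fall back to zeros

    return lex_features + mean_vec


def ensemble_average(predictions_per_model):
    """Average the intensity predictions of several models, tweet by tweet."""
    n_models = len(predictions_per_model)
    return [sum(col) / n_models for col in zip(*predictions_per_model)]


features = featurize("So furious and annoyed", TOY_LEXICON, TOY_EMBEDDINGS, dim=2)
blended = ensemble_average([[0.2, 0.8], [0.4, 0.6]])
```

In a full pipeline, vectors like `features` would be passed to a regressor or neural network, and `ensemble_average` would blend the outputs of several such models.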

Implications and Future Directions

The implications of this work extend towards improving automated systems capable of nuanced emotion detection, which can be practically applied in areas such as sentiment analysis, public opinion tracking, and wellbeing assessment through social media. The alignment of emotion detection with user sentiment can influence targeted communication strategies, automated moderation, and sentiment-based content delivery mechanisms.

The paper also lays the groundwork for future work on emotion intensity detection across languages and platforms, and for extending the dataset with annotations for a broader range of emotions. It supports ongoing research in emotion analysis by providing a benchmark dataset against which future systems can be evaluated and improved.

Shared tasks such as WASSA-2017, together with freely accessible datasets, let researchers keep refining methods and architectures for emotion detection as a community. This collaborative, iterative approach helps techniques keep pace with the challenges of discerning human emotions in digital communication.
