
EmoBank: Studying the Impact of Annotation Perspective and Representation Format on Dimensional Emotion Analysis (2205.01996v1)

Published 4 May 2022 in cs.CL, cs.AI, and cs.LG

Abstract: We describe EmoBank, a corpus of 10k English sentences balancing multiple genres, which we annotated with dimensional emotion metadata in the Valence-Arousal-Dominance (VAD) representation format. EmoBank excels with a bi-perspectival and bi-representational design. On the one hand, we distinguish between writer's and reader's emotions, on the other hand, a subset of the corpus complements dimensional VAD annotations with categorical ones based on Basic Emotions. We find evidence for the supremacy of the reader's perspective in terms of IAA and rating intensity, and achieve close-to-human performance when mapping between dimensional and categorical formats.

Citations (205)

Summary

  • The paper presents a large corpus of 10,062 English sentences annotated with dimensional emotions from both writer and reader perspectives.
  • It employs the Valence-Arousal-Dominance model alongside a simplified 5-point scale to enhance annotation reliability and reduce cognitive load.
  • The study demonstrates that machine learning can effectively map between categorical and dimensional representations, achieving performance near human agreement.

EmoBank: Annotating Affective Dimensions in Text

The paper "EmoBank: Studying the Impact of Annotation Perspective and Representation Format on Dimensional Emotion Analysis," authored by Sven Buechel and Udo Hahn, presents a comprehensive corpus of 10,062 English sentences annotated with emotion metadata. This work identifies two critical dimensions in enhancing emotion analysis: the annotation perspective (writer vs. reader) and the representation format (dimensional vs. categorical).

Construction and Annotation

The EmoBank corpus distinguishes itself by adopting the Valence-Arousal-Dominance (VAD) model as its dimensional emotion framework. This model, comprising the three dimensions Valence (V), Arousal (A), and Dominance (D), represents emotions as points in a real-valued vector space. Annotating from both the writer's and the reader's perspective addresses a gap in prior sentiment analysis corpora, most of which adopt only one perspective. EmoBank's genre-balanced design ensures diversity and relevance across text categories, extending the dataset's utility beyond specialized domains such as social media or reviews.
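
To make the VAD format concrete, the sketch below represents a single sentence-level annotation as a three-dimensional point. The class name, field names, and the 5-point scale convention are illustrative assumptions for this summary, not the released corpus format.

```python
from dataclasses import dataclass

@dataclass
class VADRating:
    """A single sentence-level emotion annotation in VAD space.

    Values are assumed to lie on a 5-point scale
    (1 = low, 3 = neutral, 5 = high) for each dimension.
    """
    valence: float    # displeasure ... pleasure
    arousal: float    # calm ... excited
    dominance: float  # being controlled ... being in control

# Example: a mildly positive, fairly calm sentence with neutral dominance
example = VADRating(valence=3.6, arousal=2.8, dominance=3.0)
print(example)
```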

The annotations were collected through crowdsourcing on CrowdFlower, using 5-point scales rather than the traditional 9-point Self-Assessment Manikin (SAM) scales to reduce annotators' cognitive load. This decision aimed to improve annotation feasibility and data reliability.
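
A natural way to turn such crowdsourced ratings into sentence-level scores is to average the individual 5-point ratings per sentence and dimension. The helper below is a minimal sketch of that idea; the plain mean is an assumed aggregation rule and may differ from the paper's exact procedure.

```python
from statistics import mean

def aggregate_ratings(ratings):
    """Average individual 5-point VAD scores into one tuple per sentence.

    `ratings` maps sentence_id -> list of (valence, arousal, dominance)
    tuples from individual annotators. A plain mean is used purely for
    illustration.
    """
    return {
        sent_id: tuple(mean(dim) for dim in zip(*votes))
        for sent_id, votes in ratings.items()
    }

raw = {"s1": [(4, 2, 3), (5, 3, 3), (4, 2, 4)]}
print(aggregate_ratings(raw))  # {'s1': (4.33..., 2.33..., 3.33...)}
```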

Analysis of Inter-Annotator Agreement (IAA)

The authors provide an in-depth analysis of inter-annotator agreement using metrics appropriate for real-valued data, namely Pearson's correlation coefficient and Mean Absolute Error (MAE). The reader-perspective annotations yielded slightly higher correlation-based IAA (e.g., average r values of approximately 0.634 for reader emotions vs. 0.605 for writer emotions), indicating more consistent perceived emotional responses. Conversely, the writer perspective showed marginally better error-based IAA, a point the authors clarify through a linear regression analysis relating error to emotionality. They suggest that, despite the variations in error metrics, the reader perspective offers a favorable trade-off between correlation and emotional intensity in the annotations.
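
As a rough illustration of these two metrics, the snippet below computes Pearson's r and MAE for a single emotion dimension between two hypothetical raters; the paper's actual IAA averages agreement over more annotators, which this sketch does not reproduce.

```python
import numpy as np
from scipy.stats import pearsonr

def iaa(rater_a, rater_b):
    """Return (Pearson's r, MAE) for one emotion dimension between two raters."""
    a = np.asarray(rater_a, dtype=float)
    b = np.asarray(rater_b, dtype=float)
    r, _ = pearsonr(a, b)            # consistency of the two rating profiles
    mae = float(np.mean(np.abs(a - b)))  # disagreement in scale units
    return r, mae

r, mae = iaa([3, 4, 2, 5, 3], [3, 5, 2, 4, 3])
print(f"r = {r:.3f}, MAE = {mae:.3f}")
```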

Mapping Between Emotion Formats

EmoBank's bi-representational design, in which a subset of the corpus is annotated with both the VAD representation and Ekman's Basic Emotions, enables a rigorous examination of mapping between categorical and dimensional formats. The findings show that machine learning approaches such as k-Nearest Neighbors (KNN) can map between these representations with performance approaching human annotation levels. Specifically, the paper reports that models using both writer's and reader's VAD scores surpassed human inter-annotator agreement for certain emotion categories, underscoring the feasibility and potential of automatic emotion format conversion.
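
A minimal sketch of such a mapping is shown below, using scikit-learn's k-Nearest Neighbors classifier on concatenated writer and reader VAD scores. The feature layout, toy data points, and labels are invented for illustration and are not drawn from EmoBank.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Toy illustration: mapping concatenated writer+reader VAD scores (6 features)
# to Basic Emotion labels with a KNN classifier.
X = np.array([
    [4.2, 3.8, 3.5, 4.0, 3.6, 3.4],  # joy-like profile
    [1.8, 3.9, 2.2, 2.0, 4.1, 2.4],  # fear-like profile
    [2.0, 2.1, 2.8, 2.2, 2.3, 2.9],  # sadness-like profile
    [1.9, 4.2, 4.0, 2.1, 4.0, 3.8],  # anger-like profile
])
y = ["joy", "fear", "sadness", "anger"]

clf = KNeighborsClassifier(n_neighbors=1).fit(X, y)
print(clf.predict([[4.0, 3.5, 3.4, 3.9, 3.7, 3.3]]))  # -> ['joy']
```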

Implications and Future Directions

This research highlights significant theoretical and practical implications. The ability to effectively distinguish between writer and reader emotions expands the scope of affective computing applications, from personalized content delivery to enhanced sentiment analysis in nuanced contexts. The mapping capabilities between emotional representation formats foster interoperability, supporting the integration of different emotion datasets and models in computational analyses.

Moreover, the EmoBank corpus sets a precedent for large-scale emotion datasets, encouraging further exploration into how these emotional dimensions interact with other linguistic and cognitive factors. Future advancements may focus on refining annotation techniques, developing more granular emotion representation models, and exploring cross-linguistic emotion datasets.

In conclusion, EmoBank constitutes a valuable asset to the computational linguistics community, offering insights into the intricate interplay of human emotion and language. The corpus's dual annotations and multi-format compatibility make it a pivotal resource for advancing the precision and applicability of emotion analysis methodologies.