GoEmotions: A Dataset of Fine-Grained Emotions (2005.00547v2)

Published 1 May 2020 in cs.CL

Abstract: Understanding emotion expressed in language has a wide range of applications, from building empathetic chatbots to detecting harmful online behavior. Advancement in this area can be improved using large-scale datasets with a fine-grained typology, adaptable to multiple downstream tasks. We introduce GoEmotions, the largest manually annotated dataset of 58k English Reddit comments, labeled for 27 emotion categories or Neutral. We demonstrate the high quality of the annotations via Principal Preserved Component Analysis. We conduct transfer learning experiments with existing emotion benchmarks to show that our dataset generalizes well to other domains and different emotion taxonomies. Our BERT-based model achieves an average F1-score of .46 across our proposed taxonomy, leaving much room for improvement.

Citations (611)

Summary

The paper introduces GoEmotions, a manually annotated dataset of 58K English Reddit comments labeled with 27 emotion categories or Neutral.
It assesses annotation quality using Principal Preserved Component Analysis, shows through transfer learning experiments that the dataset generalizes to other domains and emotion taxonomies, and reports an average F1-score of 0.46 for a BERT-based model.
The dataset’s fine-grained typology offers practical applications in empathetic chatbots, sentiment analysis, and content moderation.

Overview of "GoEmotions: A Dataset of Fine-Grained Emotions"

The paper "GoEmotions: A Dataset of Fine-Grained Emotions" presents a new resource for modeling emotion expressed in text. The authors introduce GoEmotions, a dataset of 58,000 English Reddit comments, each manually labeled with one of 27 emotion categories or Neutral. The paper addresses the need for large, fine-grained emotion datasets in NLP, supporting tasks that range from building empathetic chatbots to detecting harmful online behavior.

Key Contributions

The primary contribution of this research is GoEmotions itself, which the authors describe as the largest manually annotated dataset of its kind. Its fine-grained typology is designed to be adaptable to a wide range of downstream tasks. The authors assess annotation quality with Principal Preserved Component Analysis and report that the annotations are of high quality.

Experiments and Findings

To evaluate how well GoEmotions transfers, the authors run transfer learning experiments on existing emotion benchmarks. The results indicate that the dataset generalizes to other domains and to different emotion taxonomies, which matters if models trained on it are to be applied to text sources beyond Reddit.

In terms of model performance, the authors employ a BERT-based model, which achieves an average F1-score of 0.46 across the proposed taxonomy. This result, while marking a step forward, also indicates considerable potential for improvement, highlighting the complexity of emotion detection tasks in NLP.
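To make the reported metric concrete, the following is a minimal sketch of how an average F1-score over a label taxonomy can be computed. It rests on assumptions that the summary above does not state: that "average" means the macro average (the unweighted mean of per-label F1), and that each comment may carry more than one label. The function names, label names, and toy data are illustrative only and are not taken from the paper's code.

```python
from typing import Dict, List, Set


def per_label_f1(
    gold: List[Set[str]], pred: List[Set[str]], labels: List[str]
) -> Dict[str, float]:
    """Compute F1 separately for each label over multi-label examples."""
    scores: Dict[str, float] = {}
    for label in labels:
        # Count true positives, false positives, and false negatives for this label.
        tp = sum(1 for g, p in zip(gold, pred) if label in g and label in p)
        fp = sum(1 for g, p in zip(gold, pred) if label not in g and label in p)
        fn = sum(1 for g, p in zip(gold, pred) if label in g and label not in p)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0.0 here when the label never occurs.
        scores[label] = 2 * tp / denom if denom else 0.0
    return scores


def macro_f1(gold: List[Set[str]], pred: List[Set[str]], labels: List[str]) -> float:
    """Unweighted mean of per-label F1 (assumed meaning of 'average F1')."""
    scores = per_label_f1(gold, pred, labels)
    return sum(scores.values()) / len(scores)


# Toy example with a three-label subset (illustrative, not real data).
labels = ["joy", "anger", "neutral"]
gold = [{"joy"}, {"anger"}, {"neutral"}, {"joy", "anger"}]
pred = [{"joy"}, {"neutral"}, {"neutral"}, {"joy"}]

scores = per_label_f1(gold, pred, labels)
average = macro_f1(gold, pred, labels)
print(scores)
print(round(average, 4))
```

Because the macro average weights every label equally, a label the model never predicts correctly (here, "anger") pulls the average down as much as a frequent one would. That is one plausible reason an average over a fine-grained taxonomy can stay low even when common categories are handled well.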

Implications and Future Directions

The introduction of GoEmotions has important implications for both theoretical research and practical applications. The dataset provides a solid foundation for developing more sophisticated AI models capable of deeper emotional understanding. Practically, it can enhance applications in sentiment analysis, content moderation, and social media monitoring.

Future research could focus on improving model performance scores by exploring advanced architectures or augmenting the dataset with additional context layers. As more diverse and intricate datasets like GoEmotions become available, the potential for accurately detecting and responding to human emotions in AI applications grows significantly.

In conclusion, this paper presents a valuable resource for the AI community, contributing to the ongoing effort to understand and interpret emotions expressed in online text. GoEmotions combines a large, manually annotated corpus with a fine-grained label set, offering a foundation for future research and development in emotion recognition.