
Context Based Emotion Recognition using EMOTIC Dataset (2003.13401v1)

Published 30 Mar 2020 in cs.CV and cs.LG

Abstract: In our everyday lives and social interactions we often try to perceive the emotional states of people. There has been a lot of research on providing machines with a similar capacity for recognizing emotions. From a computer vision perspective, most previous efforts have focused on analyzing facial expressions and, in some cases, body pose. Some of these methods work remarkably well in specific settings. However, their performance is limited in natural, unconstrained environments. Psychological studies show that the scene context, in addition to facial expression and body pose, provides important information for our perception of people's emotions. However, the processing of context for automatic emotion recognition has not been explored in depth, partly due to the lack of proper data. In this paper we present EMOTIC, a dataset of images of people in a diverse set of natural situations, annotated with their apparent emotion. The EMOTIC dataset combines two different types of emotion representation: (1) a set of 26 discrete categories, and (2) the continuous dimensions Valence, Arousal, and Dominance. We also present a detailed statistical and algorithmic analysis of the dataset along with an annotators' agreement analysis. Using the EMOTIC dataset we train different CNN models for emotion recognition, combining the information from the bounding box containing the person with contextual information extracted from the scene. Our results show how scene context provides important information for automatically recognizing emotional states and motivate further research in this direction. The dataset and code are open-sourced and available at https://github.com/rkosti/emotic; the peer-reviewed published article is available at https://ieeexplore.ieee.org/document/8713881

Authors (4)
  1. Ronak Kosti (10 papers)
  2. Jose M. Alvarez (90 papers)
  3. Agata Lapedriza (26 papers)
  4. Adria Recasens (9 papers)
Citations (165)

Summary

Context-Based Emotion Recognition using EMOTIC Dataset

The paper presents a detailed exploration of emotion recognition in images, emphasizing the importance of context for understanding emotional states in natural, unconstrained environments. Traditional computer vision approaches focus primarily on facial expressions and body posture, and their efficacy is often limited in real-world settings. Psychological studies underline the importance of scene context (surroundings, objects, and ongoing actions) in emotion perception, yet its computational use has been underexplored due to a lack of suitable data. Against this backdrop, the authors introduce the EMOTIC dataset, a collection of images of people in varied natural settings annotated with their perceived emotions. The dataset bridges this gap by including annotations in both discrete emotional categories and continuous emotion dimensions.

The EMOTIC dataset comprises 23,571 images with annotations for 34,320 individuals. It provides two representations of emotions: a categorical representation with 26 emotion categories and a dimensional representation encompassing Valence, Arousal, and Dominance. The dataset offers comprehensive coverage of diverse emotional expressions and environmental contexts, making it a valuable resource for the advancement of emotion recognition systems. The annotation process, primarily conducted on Amazon Mechanical Turk, ensures that each image is labeled according to perceived emotions, considering both discrete categories and continuous dimensions.
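
For illustration, a single person-level annotation combining the two representations can be pictured as the record below. This is a minimal sketch only; the field names and the 1-10 scale shown for the continuous dimensions are assumptions, not the dataset's actual schema.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class PersonAnnotation:
    """Hypothetical record for one annotated person; field names are
    illustrative, not the dataset's actual schema."""
    image_path: str
    bbox: Tuple[int, int, int, int]  # (x1, y1, x2, y2) of the person in the image
    categories: List[str] = field(default_factory=list)  # subset of the 26 discrete labels
    valence: float = 0.0    # continuous dimensions (assumed here on a 1-10 scale)
    arousal: float = 0.0
    dominance: float = 0.0

# Example: a person labeled with two of the discrete categories plus VAD values.
ann = PersonAnnotation(
    image_path="images/example.jpg",
    bbox=(120, 40, 360, 480),
    categories=["Engagement", "Anticipation"],
    valence=7.0, arousal=6.0, dominance=5.0,
)
```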

Alongside the dataset, the authors propose a baseline Convolutional Neural Network (CNN) architecture that processes both individual and contextual information. The model comprises separate modules for extracting features from the person and from the scene context; these features are then fused to predict emotional states as both discrete categories and continuous dimensions. The results show a clear advantage to leveraging context: models using contextual information consistently outperform those that consider only the individual.
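
As a rough illustration of this two-branch design, the PyTorch sketch below encodes the person crop and the full image separately, fuses the features, and predicts both output types. The backbone, layer sizes, and fusion layer are illustrative assumptions, not the authors' exact architecture.

```python
import torch
import torch.nn as nn

class ContextEmotionNet(nn.Module):
    """Minimal two-branch sketch: one encoder for the person crop, another
    for the whole scene; fused features feed two heads (26 discrete
    categories, 3 continuous dimensions). Sizes are illustrative."""

    def __init__(self, feat_dim: int = 512):
        super().__init__()
        self.person_branch = self._make_encoder(feat_dim)   # person bounding box
        self.context_branch = self._make_encoder(feat_dim)  # full image / scene
        self.fusion = nn.Sequential(
            nn.Linear(2 * feat_dim, 256), nn.ReLU(), nn.Dropout(0.5)
        )
        self.category_head = nn.Linear(256, 26)  # logits for the 26 discrete emotions
        self.vad_head = nn.Linear(256, 3)        # Valence, Arousal, Dominance

    @staticmethod
    def _make_encoder(out_dim: int) -> nn.Module:
        # Stand-in for a pretrained backbone (e.g. a ResNet trunk).
        return nn.Sequential(
            nn.Conv2d(3, 64, 7, stride=2, padding=3), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, out_dim), nn.ReLU(),
        )

    def forward(self, person_crop: torch.Tensor, full_image: torch.Tensor):
        fused = self.fusion(torch.cat(
            [self.person_branch(person_crop), self.context_branch(full_image)],
            dim=1,
        ))
        return self.category_head(fused), self.vad_head(fused)

# Usage: category logits and VAD predictions from a person crop plus the scene.
net = ContextEmotionNet()
cat_logits, vad = net(torch.randn(1, 3, 128, 128), torch.randn(1, 3, 224, 224))
```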

Empirical evaluation uses Average Precision for the discrete categories and Average Absolute Error for the continuous dimensions. The analysis reaffirms the value of contextual features, with the model benefiting significantly from information beyond facial and bodily cues. This improvement highlights the potential of context-aware systems in emotion recognition applications and indicates pathways for future enhancements in AI-driven affective computing.
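
For reference, these two metrics can be computed as sketched below with scikit-learn and NumPy. This is a generic reimplementation under stated assumptions (multi-label binary ground truth for the 26 categories, per-dimension absolute error for VAD), not the authors' evaluation code.

```python
import numpy as np
from sklearn.metrics import average_precision_score

def mean_average_precision(y_true: np.ndarray, y_score: np.ndarray) -> float:
    """Per-category Average Precision, averaged over the discrete labels.
    y_true: (N, 26) binary ground truth; y_score: (N, 26) predicted scores."""
    aps = [average_precision_score(y_true[:, c], y_score[:, c])
           for c in range(y_true.shape[1])]
    return float(np.mean(aps))

def average_absolute_error(vad_true: np.ndarray, vad_pred: np.ndarray) -> np.ndarray:
    """Mean absolute error for each continuous dimension (V, A, D).
    Both inputs: (N, 3) arrays; returns one error per dimension."""
    return np.mean(np.abs(vad_true - vad_pred), axis=0)
```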

The implications of this research are far-reaching for fields such as human-computer interaction and surveillance, where understanding human emotions can lead to more intuitive and responsive technologies. The release of the EMOTIC dataset and the proposed model thus provides immediate tools for the research community and lays the groundwork for future AI-based emotion recognition systems, enabling applications in areas such as education technology, vehicular safety, personal assistants, and psychiatric diagnostics, and motivating continued work on embedding contextual nuances in computational frameworks.

Overall, the paper contributes to a nuanced understanding of emotion perception, advocating for the inclusion of scene context within computational emotion recognition systems—a perspective that represents a significant paradigm shift toward more holistic interpretative models in affective computing.