
Deep Affect Prediction in-the-wild: Aff-Wild Database and Challenge, Deep Architectures, and Beyond (1804.10938v5)

Published 29 Apr 2018 in cs.CV, cs.AI, cs.HC, eess.IV, and stat.ML

Abstract: Automatic understanding of human affect using visual signals is of great importance in everyday human-machine interactions. Appraising human emotional states, behaviors and reactions displayed in real-world settings, can be accomplished using latent continuous dimensions (e.g., the circumplex model of affect). Valence (i.e., how positive or negative is an emotion) & arousal (i.e., power of the activation of the emotion) constitute popular and effective affect representations. Nevertheless, the majority of collected datasets thus far, although containing naturalistic emotional states, have been captured in highly controlled recording conditions. In this paper, we introduce the Aff-Wild benchmark for training and evaluating affect recognition algorithms. We also report on the results of the First Affect-in-the-wild Challenge that was organized in conjunction with CVPR 2017 on the Aff-Wild database and was the first ever challenge on the estimation of valence and arousal in-the-wild. Furthermore, we design and extensively train an end-to-end deep neural architecture which performs prediction of continuous emotion dimensions based on visual cues. The proposed deep learning architecture, AffWildNet, includes convolutional & recurrent neural network layers, exploiting the invariant properties of convolutional features, while also modeling temporal dynamics that arise in human behavior via the recurrent layers. The AffWildNet produced state-of-the-art results on the Aff-Wild Challenge. We then exploit the AffWild database for learning features, which can be used as priors for achieving best performances both for dimensional, as well as categorical emotion recognition, using the RECOLA, AFEW-VA and EmotiW datasets, compared to all other methods designed for the same goal. The database and emotion recognition models are available at http://ibug.doc.ic.ac.uk/resources/first-affect-wild-challenge.

Authors (8)
  1. Dimitrios Kollias (48 papers)
  2. Panagiotis Tzirakis (24 papers)
  3. Mihalis A. Nicolaou (17 papers)
  4. Athanasios Papaioannou (22 papers)
  5. Guoying Zhao (103 papers)
  6. Björn Schuller (83 papers)
  7. Irene Kotsia (13 papers)
  8. Stefanos Zafeiriou (137 papers)
Citations (418)

Summary


The paper "Deep Affect Prediction in-the-wild: Aff-Wild Database and Challenge, Deep Architectures, and Beyond" advances the field of affective computing by introducing the Aff-Wild database, a novel in-the-wild dataset for affect recognition, and by presenting the Aff-Wild Challenge. The research also proposes AffWildNet, a deep learning architecture designed to predict continuous valence and arousal values from visual signals in unconstrained environments. The methodology demonstrates notable advancements in the automatic analysis of human emotions through visual cues using CNN-RNN networks and promotes the development of robust emotion recognition systems.

Introduction and Dataset Contribution

The core contribution of this work is the comprehensive Aff-Wild database. Unlike existing datasets captured under controlled recording conditions, Aff-Wild is collected from real-world settings, yielding a challenging corpus with diverse emotional states, head poses, ethnicities, illumination variations, and occlusions. The database comprises 298 videos of 200 subjects with ground-truth annotations for valence and arousal, providing a valuable resource for training deep neural networks specialized in emotion recognition. The significance of this dataset lies not only in its coverage but also in the rigorous post-processing and validation procedures implemented to ensure annotation reliability.
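To make the annotation scheme concrete, the sketch below parses per-frame valence/arousal labels in the dimensional range [-1, 1]. Note this is a hypothetical CSV layout chosen for illustration; the actual Aff-Wild distribution format is not specified in this summary and may differ.

```python
import csv
import io

def load_annotations(csv_text):
    """Parse per-frame valence/arousal annotations.

    Assumes a HYPOTHETICAL layout with columns (frame, valence, arousal);
    the real Aff-Wild files may be organized differently. Values are
    clipped to the dimensional range [-1, 1] used for valence/arousal.
    """
    rows = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        valence = max(-1.0, min(1.0, float(row["valence"])))
        arousal = max(-1.0, min(1.0, float(row["arousal"])))
        rows.append((int(row["frame"]), valence, arousal))
    return rows

# Example: the out-of-range valence 1.20 on frame 2 is clipped to 1.0.
sample = "frame,valence,arousal\n0,0.31,-0.12\n1,0.35,-0.08\n2,1.20,-0.05\n"
print(load_annotations(sample))
```

Per-frame (rather than per-video) labels are what make the temporal modeling in the downstream CNN-RNN architecture possible.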

Model Architecture and Training

AffWildNet, proposed in the paper, is an end-to-end architecture that integrates CNN and RNN components. The architecture is tailored to the affect prediction task by using the representational power of convolutional networks (ResNet-50, VGG-Face) for feature extraction and recurrent layers (GRU units) for temporal context modeling. A key design decision was to feed facial landmarks into the fully connected layers alongside the convolutional features, letting the network combine spatially invariant features with the complex dynamics of emotional expressions over time. The loss function used during training is based on the Concordance Correlation Coefficient (CCC), which aligns with the evaluation metric and thereby improves training efficacy by directly optimizing the primary evaluation criterion.

Experimental Evaluation

A rigorous experimental evaluation is presented, in which AffWildNet is compared against baseline and competing architectures submitted to the Aff-Wild Challenge. The model not only demonstrated superior performance in terms of CCC and MSE compared to other approaches but also provided insights into optimal network configurations. Furthermore, the database's utility as a training prior was validated through experiments on secondary datasets such as RECOLA and AFEW-VA, achieving state-of-the-art performance across diverse emotion recognition tasks. This highlights the versatility and generalization capabilities of AffWildNet across different affective computing scenarios.

Implications and Future Directions

This paper sets a new standard for affective computing research by providing a comprehensive, publicly available dataset and a robust model architecture that can be adapted for both dimensional and categorical emotion recognition. AffWildNet's strong results suggest that integrated CNN-RNN frameworks can learn nuanced emotional representations, addressing the real-world challenges of uncontrolled environments. The success of transferring features from AffWildNet to other emotion recognition tasks points toward multi-task learning approaches in facial emotion recognition. Future research could build on these findings by incorporating multi-modal data sources, exploring transformer-based architectures, or adopting few-shot learning techniques to further enrich the capabilities of emotion-aware systems.

In conclusion, this paper contributes significantly to the advancement of emotion recognition systems, providing the research community with effective tools and methodologies for understanding complex affective phenomena in real-world scenarios. The AffWildNet architecture, validated through comprehensive evaluations, establishes a strong foundation for future innovations and applications in emotion-aware human-computer interactions.