Expression, Affect, Action Unit Recognition: Aff-Wild2, Multi-Task Learning and ArcFace (1910.04855v1)

Published 25 Sep 2019 in cs.CV, cs.HC, cs.LG, and eess.IV

Abstract: Affective computing has been largely limited in terms of available data resources. The need to collect and annotate diverse in-the-wild datasets has become apparent with the rise of deep learning models, as the default approach to address any computer vision task. Some in-the-wild databases have been recently proposed. However: i) their size is small, ii) they are not audiovisual, iii) only a small part is manually annotated, iv) they contain a small number of subjects, or v) they are not annotated for all main behavior tasks (valence-arousal estimation, action unit detection and basic expression classification). To address these, we substantially extend the largest available in-the-wild database (Aff-Wild) to study continuous emotions such as valence and arousal. Furthermore, we annotate parts of the database with basic expressions and action units. As a consequence, for the first time, this allows the joint study of all three types of behavior states. We call this database Aff-Wild2. We conduct extensive experiments with CNN and CNN-RNN architectures that use visual and audio modalities; these networks are trained on Aff-Wild2 and their performance is then evaluated on 10 publicly available emotion databases. We show that the networks achieve state-of-the-art performance for the emotion recognition tasks. Additionally, we adapt the ArcFace loss function in the emotion recognition context and use it for training two new networks on Aff-Wild2 and then re-train them in a variety of diverse expression recognition databases. The networks are shown to improve the existing state-of-the-art. The database, emotion recognition models and source code are available at http://ibug.doc.ic.ac.uk/resources/aff-wild2.

Summary

The paper introduces Aff-Wild2, an audiovisual in-the-wild dataset annotated for valence-arousal estimation, basic expression classification, and action unit detection, so that the three tasks can be studied jointly.
It trains CNN and CNN-RNN architectures on visual and audio input, and adapts the ArcFace loss to emotion recognition to train two additional networks.
Networks trained on Aff-Wild2 are evaluated on 10 publicly available emotion databases and are reported to achieve state-of-the-art performance.

A Detailed Examination of Aff-Wild2: Comprehensive Behavioral State Recognition in Diverse Environments

In affective computing, the size and diversity of the available training datasets heavily influence how well models identify and interpret human emotions. The paper "Expression, Affect, Action Unit Recognition: Aff-Wild2, Multi-Task Learning and ArcFace" by Dimitrios Kollias and Stefanos Zafeiriou tackles the challenge of creating a single dataset that supports valence-arousal estimation, action unit detection, and basic expression classification. The authors introduce Aff-Wild2, a large-scale audiovisual dataset built to meet these requirements and to support a more holistic approach to emotion recognition in natural settings.

Novel Implementation in Affective Computing

The paper addresses limitations of existing affective computing datasets: small size, lack of audio, limited manual annotation, few subjects, and annotations that do not cover all three main behavior tasks. Aff-Wild2 substantially extends Aff-Wild with annotations of continuous emotional states (valence and arousal), and parts of the database are additionally annotated with action units and basic expressions, which makes the combined study of these affective behaviors possible.

Methodological Advancements

The authors run experiments with CNN and CNN-RNN architectures that process both visual and audio input. Networks trained on Aff-Wild2 are then evaluated on 10 publicly available emotion databases, where they are reported to reach state-of-the-art results, which supports the claim that Aff-Wild2-trained models generalize. Additionally, the authors adapt the ArcFace loss function, traditionally employed in face recognition, to emotion recognition, use it to train two new networks on Aff-Wild2, and re-train those networks on a variety of expression recognition databases. This opens a path for margin-based discriminative losses in the field.
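The margin idea behind ArcFace can be sketched in a few lines. The following is a minimal illustration of the standard additive angular margin formulation from face recognition, not the authors' adaptation: the function names, the scale of 64 and the margin of 0.5 are assumptions chosen for the example, and the paper's own settings may differ.

```python
import math


def l2_normalize(vector):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]


def arcface_logits(embedding, class_weights, target, scale=64.0, margin=0.5):
    """Return scaled cosine logits with an angular margin on the target class.

    `scale` and `margin` are illustrative defaults (assumptions).
    """
    unit_embedding = l2_normalize(embedding)
    logits = []
    for class_index, weight in enumerate(class_weights):
        unit_weight = l2_normalize(weight)
        cos_theta = sum(a * b for a, b in zip(unit_embedding, unit_weight))
        cos_theta = max(-1.0, min(1.0, cos_theta))  # guard acos against rounding
        if class_index == target:
            # Penalise the target class by widening its angle to the embedding.
            # This sketch does not handle theta + margin exceeding pi.
            cos_theta = math.cos(math.acos(cos_theta) + margin)
        logits.append(scale * cos_theta)
    return logits


def cross_entropy(logits, target):
    """Numerically stable softmax cross-entropy for one sample."""
    peak = max(logits)
    log_sum_exp = peak + math.log(sum(math.exp(z - peak) for z in logits))
    return log_sum_exp - logits[target]


# Hypothetical 2-D embedding lying exactly between two class centres.
embedding = [1.0, 1.0]
class_weights = [[1.0, 0.0], [0.0, 1.0]]

loss_plain = cross_entropy(
    arcface_logits(embedding, class_weights, target=0, margin=0.0), 0)
loss_margin = cross_entropy(
    arcface_logits(embedding, class_weights, target=0, margin=0.5), 0)
```

With the margin enabled, the same embedding incurs a larger loss than with plain normalized softmax, so training has to pull it closer to its class centre. That extra pull is the source of the tighter intra-class clusters usually attributed to this loss.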

Empirical Contributions

The findings indicate that multi-task and multi-modal learning improve the discriminative ability of the networks. In cross-database evaluations, models trained on Aff-Wild2 outperform existing approaches on many established databases. Moreover, the ArcFace loss shows promise for emotion recognition: it encourages intra-class compactness while enlarging inter-class separation.
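One way to picture a multi-task objective over the three annotation types is a weighted sum of per-task losses. The sketch below is an assumption-laden illustration, not the paper's training objective: it pairs a concordance correlation coefficient (CCC) loss for valence-arousal, softmax cross-entropy for expressions, and binary cross-entropy for action units, with equal task weights. All function names and the example values are hypothetical.

```python
import math


def ccc(pred, gold):
    """Concordance correlation coefficient between two equal-length sequences."""
    n = len(pred)
    mean_p, mean_g = sum(pred) / n, sum(gold) / n
    var_p = sum((p - mean_p) ** 2 for p in pred) / n
    var_g = sum((g - mean_g) ** 2 for g in gold) / n
    cov = sum((p - mean_p) * (g - mean_g) for p, g in zip(pred, gold)) / n
    denom = var_p + var_g + (mean_p - mean_g) ** 2
    return 1.0 if denom == 0 else 2 * cov / denom  # identical constants agree


def softmax_cross_entropy(logits, target):
    """Stable cross-entropy for one sample with an integer class label."""
    peak = max(logits)
    return peak + math.log(sum(math.exp(z - peak) for z in logits)) - logits[target]


def binary_cross_entropy(probs, labels, eps=1e-7):
    """Mean binary cross-entropy over independent action unit outputs."""
    total = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1 - eps)  # avoid log(0)
        total -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return total / len(probs)


def multitask_loss(val_pred, val_gold, aro_pred, aro_gold,
                   expr_logits, expr_targets, au_probs, au_labels,
                   weights=(1.0, 1.0, 1.0)):
    """Weighted sum of the three task losses (equal weights are an assumption)."""
    va = 1.0 - 0.5 * (ccc(val_pred, val_gold) + ccc(aro_pred, aro_gold))
    expr = sum(softmax_cross_entropy(l, t)
               for l, t in zip(expr_logits, expr_targets)) / len(expr_targets)
    au = sum(binary_cross_entropy(p, y)
             for p, y in zip(au_probs, au_labels)) / len(au_labels)
    total = weights[0] * va + weights[1] * expr + weights[2] * au
    return {"va": va, "expr": expr, "au": au, "total": total}


# Hypothetical batch: three frames for valence-arousal, one sample each
# for expression logits and action unit probabilities.
losses = multitask_loss(
    val_pred=[0.1, 0.5, 0.9], val_gold=[0.1, 0.5, 0.9],
    aro_pred=[0.2, 0.4, 0.6], aro_gold=[0.2, 0.4, 0.6],
    expr_logits=[[2.0, 0.0, 0.0]], expr_targets=[0],
    au_probs=[[0.9, 0.1]], au_labels=[[1, 0]],
)
```

Returning the individual terms alongside the total makes it easy to see which task dominates the gradient signal. In practice the weights would need tuning, and samples lacking some annotation types would need masking, neither of which is shown here.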

Practical and Theoretical Implications

From a practical standpoint, Aff-Wild2 is a significant resource for improving the real-world applicability of affective computing systems in fields such as human-computer interaction, market research, and clinical diagnostics. Theoretically, the paper advances the discussion on combining multi-task learning with margin-based loss functions in affective computing, which may inform future work on emotion recognition models and their applications.

Future Directions

Several directions remain open, including refining the pretrained architectures and extending the scope of the dataset's annotations. Future research might focus on applying these models in real time in dynamic environments, and on investigating how auditory and visual signals interact in emotion elicitation and recognition.

The paper documents its procedures and results in detail and contributes to ongoing developments in affective computing. As the research community continues to look for robust methods to capture and interpret human emotions, Aff-Wild2 and the methods described in the paper are positioned to serve as reference benchmarks.
