ABAW: Valence-Arousal Estimation, Expression Recognition, Action Unit Detection & Multi-Task Learning Challenges (2202.10659v2)

Published 22 Feb 2022 in cs.CV and cs.LG

Abstract: This paper describes the third Affective Behavior Analysis in-the-wild (ABAW) Competition, held in conjunction with IEEE International Conference on Computer Vision and Pattern Recognition (CVPR), 2022. The 3rd ABAW Competition is a continuation of the Competitions held at ICCV 2021, IEEE FG 2020 and IEEE CVPR 2017 Conferences, and aims at automatically analyzing affect. This year the Competition encompasses four Challenges: i) uni-task Valence-Arousal Estimation, ii) uni-task Expression Classification, iii) uni-task Action Unit Detection, and iv) Multi-Task-Learning. All the Challenges are based on a common benchmark database, Aff-Wild2, which is a large scale in-the-wild database and the first one to be annotated in terms of valence-arousal, expressions and action units. In this paper, we present the four Challenges, with the utilized Competition corpora, we outline the evaluation metrics and present the baseline systems along with their obtained results.

Summary

The paper describes the third ABAW Competition, which benchmarks valence-arousal estimation, expression recognition, and action unit detection on in-the-wild data.
It employs uni-task and multi-task challenges with the Aff-Wild2 dataset and baseline models like ResNet50 and VGG16 to ensure reproducibility in affect analysis research.
The baseline results leave considerable room for improvement, motivating further work on deep learning and multi-modal techniques for affective behavior analysis.

Insights into the ABAW Competition for Affective Behavior Analysis

The paper "ABAW: Valence-Arousal Estimation, Expression Recognition, Action Unit Detection & Multi-Task Learning Challenges" describes the objectives and framework of the third Affective Behavior Analysis in-the-wild (ABAW) Competition, held in conjunction with IEEE CVPR 2022. It builds on the precedent set by the competitions held at ICCV 2021, IEEE FG 2020, and IEEE CVPR 2017. The competition's goal is the automatic analysis of affective behavior from facial data in unconstrained, real-world environments.

Framework and Challenges

The competition is divided into four challenges, each targeting a different facet of affective behavior analysis:

Uni-task Valence-Arousal Estimation: This challenge focuses on estimating valence and arousal levels from facial expressions, utilizing continuous dimensional emotion representations.
Uni-task Expression Classification: Participants aim to classify expressions into six basic categories, a neutral state, and an 'other' category for non-standard affective states.
Uni-task Action Unit Detection: In this challenge, participants focus on detecting 12 defined facial action units (AUs).
Multi-Task Learning: This challenge requires a single model to jointly predict valence-arousal, expressions, and action units, so that the three tasks can share information when estimating affective states from facial data.
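
To make the label spaces of the four challenges concrete, the following is a minimal Python sketch of a per-frame annotation record. The names `FrameLabel`, `Expression`, and `NUM_AUS` are hypothetical, the integer ids are arbitrary rather than the official encoding, and the [-1, 1] range for valence and arousal is an assumption not stated in this summary.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import Tuple

NUM_AUS = 12  # the Action Unit Detection challenge covers 12 AUs


class Expression(IntEnum):
    """Six basic expressions, a neutral state, and an 'other' category.

    The integer ids are illustrative, not the competition's label encoding.
    """
    NEUTRAL = 0
    ANGER = 1
    DISGUST = 2
    FEAR = 3
    HAPPINESS = 4
    SADNESS = 5
    SURPRISE = 6
    OTHER = 7


@dataclass(frozen=True)
class FrameLabel:
    """Hypothetical container for the annotations of one video frame."""
    valence: float                   # continuous, assumed to lie in [-1, 1]
    arousal: float                   # continuous, assumed to lie in [-1, 1]
    expression: Expression           # one of the 8 categories above
    action_units: Tuple[bool, ...]   # one on/off flag per action unit

    def __post_init__(self) -> None:
        # Reject values outside the assumed valence/arousal range.
        for name in ("valence", "arousal"):
            value = getattr(self, name)
            if not -1.0 <= value <= 1.0:
                raise ValueError(f"{name} must lie in [-1, 1], got {value}")
        # Require exactly one flag per action unit.
        if len(self.action_units) != NUM_AUS:
            raise ValueError(
                f"expected {NUM_AUS} action-unit flags, "
                f"got {len(self.action_units)}"
            )


# Usage: a frame labelled as mildly positive, calm, and happy, with no AU active.
example = FrameLabel(0.3, -0.5, Expression.HAPPINESS, (False,) * NUM_AUS)
```

Under these assumptions, a uni-task model predicts one group of fields (valence and arousal, the expression, or the action units), while a multi-task model predicts all of them for the same frame.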

Aff-Wild2 Corpus

Central to these challenges is the Aff-Wild2 database, a large-scale in-the-wild benchmark and the first to be annotated for valence-arousal, expression categories, and action units together. Because the data are recorded in unconstrained conditions, models trained on them must cope with realistic variation rather than laboratory settings. Aff-Wild2 extends the earlier Aff-Wild database with more data and greater diversity, which supports more robust model training and evaluation.

Evaluation Metrics and Baselines

Each challenge is evaluated with a metric suited to its output type:

The Concordance Correlation Coefficient (CCC) is used for valence-arousal estimation. It measures the agreement between predicted and true time-series affect signals, penalizing both low correlation and differences in mean or scale.
The F1 score is used for both the expression recognition and action unit detection challenges, balancing precision and recall across categories or units.
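
The two metrics above can be sketched in plain Python as follows. This is an illustrative implementation, not the competition's official scoring code: the function names are hypothetical, returning 0.0 for a zero denominator is a convention chosen here, and averaging the per-class F1 scores with equal weight (macro averaging) is an assumption, since this summary does not state how F1 is aggregated.

```python
from typing import Sequence


def ccc(pred: Sequence[float], true: Sequence[float]) -> float:
    """Concordance Correlation Coefficient.

    CCC = 2 * cov(pred, true) / (var(pred) + var(true) + (mean_pred - mean_true)^2)
    """
    n = len(pred)
    if n == 0 or n != len(true):
        raise ValueError("inputs must be non-empty and of equal length")
    mean_p = sum(pred) / n
    mean_t = sum(true) / n
    var_p = sum((p - mean_p) ** 2 for p in pred) / n
    var_t = sum((t - mean_t) ** 2 for t in true) / n
    cov = sum((p - mean_p) * (t - mean_t) for p, t in zip(pred, true)) / n
    denom = var_p + var_t + (mean_p - mean_t) ** 2
    # Convention chosen here: two identical constant series yield 0.0.
    return 2.0 * cov / denom if denom > 0 else 0.0


def f1_for_class(pred: Sequence[int], true: Sequence[int], positive: int) -> float:
    """F1 score treating `positive` as the positive class (one-vs-rest)."""
    pairs = list(zip(pred, true))
    tp = sum(1 for p, t in pairs if p == positive and t == positive)
    fp = sum(1 for p, t in pairs if p == positive and t != positive)
    fn = sum(1 for p, t in pairs if p != positive and t == positive)
    denom = 2 * tp + fp + fn
    # Convention chosen here: a class with no predictions and no labels scores 0.0.
    return 2 * tp / denom if denom else 0.0


def macro_f1(pred: Sequence[int], true: Sequence[int], labels: Sequence[int]) -> float:
    """Unweighted mean of the per-class F1 scores (macro averaging, assumed)."""
    return sum(f1_for_class(pred, true, c) for c in labels) / len(labels)


# Usage: predictions that track the target perfectly but sit one unit too high.
biased = ccc([2.0, 3.0, 4.0], [1.0, 2.0, 3.0])
# Usage: three-class expression predictions with one mistake.
expr_score = macro_f1([0, 1, 1, 2], [0, 1, 2, 2], labels=[0, 1, 2])
```

In the first usage example the Pearson correlation would be 1, but the constant offset lowers the CCC to 4/7 (about 0.571), which is why CCC is preferred when predictions must match the scale of the annotations and not only their trend. For action unit detection, the same `f1_for_class` function can be applied to each unit's binary labels with `positive=1` and the results averaged.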

The baseline models are built on standard architectures such as ResNet50 and VGG16 and rely on existing machine learning toolkits, which keeps them easy to reproduce. The networks are pre-trained on large datasets such as ImageNet and VGGFACE, so the baselines use transfer learning rather than training from scratch.

Results and Implications

The baseline models provide a foundation against which participants can benchmark their solutions. Results from these initial models indicate significant room for exploration and improvement, especially in multi-task learning approaches that promise integrated affect recognition capabilities.

Future Directions

The ABAW competition encourages the exploration of advanced deep learning techniques to improve accuracy in affective behavior prediction. Future directions may include:

Incorporating multimodal data streams to enrich model inputs beyond visual cues.
Exploring unsupervised and semi-supervised learning methodologies for improved generalization.
Utilizing attention-based architectures to dynamically focus on salient facial characteristics driving emotion states.

Overall, the ABAW competitions provide a shared testbed for new approaches to facial affect analysis. They stimulate the development of systems that can recognize and predict human emotions in natural contexts, with applications in interactive systems, health monitoring, and supportive technologies.
