- The paper establishes a benchmark for affective behavior analysis by integrating valence-arousal estimation, expression classification, and action unit detection using the large-scale Aff-Wild2 dataset.
- It reports baseline results from a VGG-FACE based model: a mean CCC of 0.22 for valence-arousal estimation, a weighted score of 0.366 for expression classification, and an average score of 0.31 for action unit detection.
- The findings emphasize the challenges of real-world emotion recognition and pave the way for future multimodal affective computing research.
Analyzing Affective Behavior in the ABAW2 Competition
The second Affective Behavior Analysis in-the-wild (ABAW2) competition aims to advance affective computing by setting a challenging benchmark for automatic affect analysis in real-world settings. This manuscript describes the design and methodology of the ABAW2 competition, held in conjunction with ICCV 2021, focusing on three core behavior-analysis tasks: Valence-Arousal (VA) Estimation, Seven Basic Expression Classification, and Twelve Action Unit (AU) Detection. Each task leverages the Aff-Wild2 database, a comprehensive repository of annotated video data captured in natural, unconstrained environments.
Competition Overview
The competition is structured around three distinct challenges (a sketch of their per-frame label types follows this list):
- Valence-Arousal Estimation: This challenge involves predicting continuous valence and arousal values in the range [-1, 1] for each video frame, providing a dimensional characterization of emotional state.
- Seven Basic Expression Classification: Participants classify video frames into one of the seven basic emotional expressions: Anger, Disgust, Fear, Happiness, Sadness, Surprise, or Neutral.
- Twelve Action Unit Detection: This task requires detecting the presence of twelve facial action units, which serve as fine-grained indicators of facial movements underlying diverse emotional expressions.
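To make the three label spaces concrete, here is a minimal Python sketch of a per-frame annotation record. The `FrameAnnotation` container and the class-to-index mapping are illustrative assumptions, not the official Aff-Wild2 annotation format.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import Optional, Sequence

class Expression(IntEnum):
    """The seven basic expression classes of the second challenge.
    The index assignment here is an assumption for illustration."""
    NEUTRAL = 0
    ANGER = 1
    DISGUST = 2
    FEAR = 3
    HAPPINESS = 4
    SADNESS = 5
    SURPRISE = 6

@dataclass
class FrameAnnotation:
    """Hypothetical per-frame label container covering all three tasks."""
    valence: Optional[float] = None               # continuous, in [-1, 1]
    arousal: Optional[float] = None               # continuous, in [-1, 1]
    expression: Optional[Expression] = None       # one of 7 classes
    action_units: Optional[Sequence[int]] = None  # 12 binary (0/1) flags
```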
For all challenges, the Aff-Wild2 database is used: the first large-scale dataset annotated for all three tasks simultaneously. Aff-Wild2 comprises 548 videos totaling 2,813,201 frames, collected from YouTube and showcasing extensive variation in emotional expression under challenging, unconstrained recording conditions. The videos are split into training, validation, and test sets to facilitate unbiased performance evaluation.
Evaluation Metrics and Baseline Model
Each task employs a specific evaluation metric to assess the performance of participating models (a sketch of all three follows this list):
- Valence-Arousal Estimation: The primary metric is the Concordance Correlation Coefficient (CCC), which measures agreement between predicted and ground truth continuous labels. The mean CCC across valence and arousal dimensions is used as the performance benchmark.
- Seven Basic Expression Classification: Performance is evaluated using a weighted combination of the F1 Score and Total Accuracy, with weights of 0.67 and 0.33, respectively.
- Twelve Action Unit Detection: The performance metric is the average of the F1 Score and Total Accuracy over the twelve action units.
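A compact NumPy/scikit-learn sketch of the three metrics as described above. The macro averaging of F1 and the element-wise definition of total accuracy for the multi-label AU task are assumptions here; the competition white paper fixes the exact protocol.

```python
import numpy as np
from sklearn.metrics import accuracy_score, f1_score

def ccc(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    """Concordance Correlation Coefficient between two 1-D arrays."""
    mean_t, mean_p = y_true.mean(), y_pred.mean()
    var_t, var_p = y_true.var(), y_pred.var()
    cov = np.mean((y_true - mean_t) * (y_pred - mean_p))
    return 2.0 * cov / (var_t + var_p + (mean_t - mean_p) ** 2)

def va_score(v_true, v_pred, a_true, a_pred) -> float:
    """Challenge 1: mean CCC over the valence and arousal dimensions."""
    return 0.5 * (ccc(v_true, v_pred) + ccc(a_true, a_pred))

def expr_score(y_true, y_pred) -> float:
    """Challenge 2: 0.67 * F1 + 0.33 * total accuracy over 7 classes."""
    return 0.67 * f1_score(y_true, y_pred, average="macro") \
         + 0.33 * accuracy_score(y_true, y_pred)

def au_score(y_true, y_pred) -> float:
    """Challenge 3: equal-weight average of F1 and total accuracy
    over 12 binary action-unit labels (arrays of shape [n, 12])."""
    f1 = f1_score(y_true, y_pred, average="macro")
    acc = float(np.mean(np.asarray(y_true) == np.asarray(y_pred)))
    return 0.5 * (f1 + acc)
```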
For benchmarking, a baseline model based on the VGG-FACE architecture is implemented: pre-trained convolutional layers followed by fully connected layers tailored to each challenge task (a sketch follows below). On the validation sets, the baseline achieves a mean CCC of 0.22 for Valence-Arousal Estimation, a weighted score of 0.366 for Basic Expression Classification, and an average score of 0.31 for Action Unit Detection.
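The following is a minimal PyTorch sketch of a VGG-style baseline of the kind described above. The layer sizes, head dimensions, and the use of torchvision's ImageNet-pretrained VGG-16 as a stand-in for VGG-FACE weights are assumptions for illustration, not the published baseline implementation.

```python
import torch
import torch.nn as nn
from torchvision.models import vgg16

class BaselineAffectModel(nn.Module):
    """VGG-style backbone with a task-specific fully connected head,
    loosely following the described baseline (pre-trained conv layers
    plus fully connected layers per challenge task)."""

    def __init__(self, task: str = "va"):
        super().__init__()
        self.task = task
        # Pre-trained convolutional layers (stand-in for VGG-FACE weights).
        self.backbone = vgg16(weights="IMAGENET1K_V1").features
        self.pool = nn.AdaptiveAvgPool2d((7, 7))
        out_dim = {"va": 2, "expr": 7, "au": 12}[task]
        # Fully connected layers tailored to the chosen challenge task.
        self.head = nn.Sequential(
            nn.Flatten(),
            nn.Linear(512 * 7 * 7, 4096), nn.ReLU(), nn.Dropout(0.5),
            nn.Linear(4096, out_dim),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        z = self.head(self.pool(self.backbone(x)))
        # Squash valence/arousal into the [-1, 1] label range; the
        # classification tasks return raw logits for their losses.
        return torch.tanh(z) if self.task == "va" else z

# Example: one 224x224 RGB frame through the expression head.
model = BaselineAffectModel(task="expr")
logits = model(torch.randn(1, 3, 224, 224))  # shape: (1, 7)
```

Cross-entropy would then fit the expression logits and binary cross-entropy the AU logits, while the VA head could be trained against an MSE or CCC-based loss; these loss choices are likewise assumptions rather than the published training recipe.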
Implications and Future Directions
The ABAW2 competition underscores significant advancements in affective computing, especially in handling complex real-world interactions and the variability of affective displays. The baseline results highlight the potential of deep neural networks for predicting affective behavior in naturalistic settings, while their modest scores indicate considerable room for improvement by participating teams.
The comprehensive nature of the Aff-Wild2 dataset, alongside robust evaluation metrics, provides a solid foundation for further research and development of human-computer interaction systems that require affective understanding. The findings from this competition pave the way for future work exploring multimodal affective analysis, integrating audio-visual cues, and improving model performance through enhanced training techniques and dataset augmentation strategies.
In conclusion, the ABAW2 competition demonstrates the ongoing evolution of methodologies in affective behavior analysis, reinforcing the importance of large-scale in-the-wild datasets and standardized evaluation protocols in driving the field toward more nuanced human emotion recognition systems.