- The paper presents MT-EmotiEffNet, a novel multi-task model that jointly predicts facial expressions, valence, and arousal from facial images.
- Performance is improved through a modified EfficientNet-B0 backbone, Real-ESRGAN super-resolution of synthetic training images, and a blending ensemble that raises the average validation F1 score by 18% over baseline CNNs.
- The approach took first place in the learning-from-synthetic-data challenge and third place in the multi-task learning challenge, underscoring its impact on advancing affective computing across diverse datasets.
Overview of the HSE-NN Team's Participation in the 4th ABAW Competition
This paper presents the methodologies and results of the HSE-NN team in the 4th Affective Behavior Analysis in-the-Wild (ABAW) competition. The focus is on developing a multi-task model for emotion recognition that leverages both real-world and synthetic images. The core of their approach is a modified EfficientNet model, referred to as MT-EmotiEffNet, which simultaneously performs facial expression recognition and prediction of valence and arousal from static photos.
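A minimal sketch of this design is shown below, assuming a PyTorch setup with the timm EfficientNet-B0 as a stand-in for the AffectNet-pretrained backbone; the layer names, sizes, and head structure are illustrative rather than the authors' exact configuration. It mirrors the paper's description of feeding extracted visual features into a simple feed-forward network with multiple task-specific outputs.

```python
import torch
import torch.nn as nn
import timm  # assumption: timm is used here only to obtain an EfficientNet-B0 backbone


class MultiTaskEmotionHead(nn.Module):
    """Illustrative multi-task head: expression logits plus valence/arousal."""

    def __init__(self, feature_dim=1280, num_expressions=8):
        super().__init__()
        self.expression = nn.Linear(feature_dim, num_expressions)  # classification head
        self.valence_arousal = nn.Linear(feature_dim, 2)            # regression head, squashed to [-1, 1]

    def forward(self, features):
        return {
            "expression_logits": self.expression(features),
            "valence_arousal": torch.tanh(self.valence_arousal(features)),
        }


# EfficientNet-B0 used purely as a frozen feature extractor (1280-d pooled embeddings).
backbone = timm.create_model("efficientnet_b0", pretrained=True, num_classes=0)
backbone.eval()
head = MultiTaskEmotionHead()

faces = torch.randn(4, 3, 224, 224)       # batch of aligned face crops
with torch.no_grad():
    features = backbone(faces)             # shape: (4, 1280)
outputs = head(features)
print(outputs["expression_logits"].shape, outputs["valence_arousal"].shape)
```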
Methodological Insights
- Multi-Task Model Development: The MT-EmotiEffNet is central to their approach. It builds on the EfficientNet-B0 architecture and integrates multi-task learning capabilities for facial expression and affective state prediction. This model extracts visual features that serve as inputs for a simple feed-forward neural network designed to handle multiple learning tasks concurrently.
- Training and Performance: The model achieves a performance measure of 1.3 on the multi-task validation set, a substantial improvement over the 0.3 baseline as well as over models trained solely on the s-Aff-Wild2 database. The authors also apply Real-ESRGAN super-resolution to enhance the synthetic images (see the sketches after this list).
- Ensemble Strategy: A blending ensemble combines outputs from pre-trained and fine-tuned versions of MT-EmotiEffNet, yielding an 18% improvement in the average validation F1 score over baseline CNNs (a minimal blending sketch follows this list).
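For context on the 1.3 and 0.3 figures above: the multi-task challenge score sums a valence/arousal term, an expression term, and an action-unit term, so its maximum value is 3. The formula below reflects the standard ABAW multi-task definition and is included as background rather than quoted from the summarized paper:

```latex
P_{\text{MTL}} = \frac{\mathrm{CCC}_{V} + \mathrm{CCC}_{A}}{2}
               + \frac{1}{8}\sum_{c=1}^{8} F_{1}^{(c)}
               + \frac{1}{12}\sum_{a=1}^{12} F_{1}^{(a)}
```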
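The super-resolution preprocessing of the synthetic images could look roughly like the sketch below, assuming the RealESRGANer interface from the official Real-ESRGAN repository; the weight path and file names are hypothetical.

```python
# Sketch of Real-ESRGAN super-resolution for low-resolution synthetic faces.
import cv2
from basicsr.archs.rrdbnet_arch import RRDBNet
from realesrgan import RealESRGANer

# Standard x4 RRDBNet configuration used by the Real-ESRGAN reference code.
model = RRDBNet(num_in_ch=3, num_out_ch=3, num_feat=64,
                num_block=23, num_grow_ch=32, scale=4)
upsampler = RealESRGANer(
    scale=4,
    model_path="weights/RealESRGAN_x4plus.pth",  # hypothetical local weight file
    model=model,
    half=False,
)

img = cv2.imread("synthetic_face.jpg")            # low-resolution synthetic face (BGR)
upscaled, _ = upsampler.enhance(img, outscale=4)  # super-resolved output
cv2.imwrite("synthetic_face_sr.jpg", upscaled)    # enhanced image used for training
```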
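The blending ensemble from the last item can be illustrated as a weighted average of the two models' output scores; the weight and toy data below are illustrative, not the paper's tuned values.

```python
import numpy as np


def blend_predictions(scores_pretrained, scores_finetuned, weight=0.5):
    """Weighted blend of class scores from the pre-trained and fine-tuned
    models; the default 0.5 weight is illustrative only."""
    blended = weight * scores_pretrained + (1.0 - weight) * scores_finetuned
    return np.argmax(blended, axis=1)  # predicted expression class per image


# Toy usage: scores for 4 images over 8 expression classes.
rng = np.random.default_rng(0)
pre = rng.random((4, 8))
fine = rng.random((4, 8))
print(blend_predictions(pre, fine, weight=0.4))
```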
Numerical Results and Competition Outcome
The HSE-NN team secured first place in the learning from synthetic data challenge and third place in the multi-task learning challenge at the competition. The superior performance is attributed to the effective training of MT-EmotiEffNet on various datasets, including AffectNet and enhanced synthetic images, and the strategic use of ensemble methods.
Implications and Future Directions
The research illustrates the potential of multi-task models like MT-EmotiEffNet in advancing emotion recognition technology. By addressing the limitations of existing emotional datasets and employing techniques to enhance synthetic data, the paper points toward more robust affective computing applications capable of operating in heterogeneous environments. Future developments could further explore the integration of additional modalities, such as audio-visual data, and refine learning algorithms for even greater accuracy and generalization in real-world scenarios.
These advancements contribute significantly to the field of affective computing, providing insights and methodologies beneficial for both academic research and practical applications in emotion-aware systems.