HSEmotion Team at the 6th ABAW Competition: Facial Expressions, Valence-Arousal and Emotion Intensity Prediction (2403.11590v1)
Abstract: This article presents our results for the sixth Affective Behavior Analysis in-the-wild (ABAW) competition. To improve the trustworthiness of facial analysis, we study the possibility of using pre-trained deep models that extract reliable emotional features without the need to fine-tune the neural networks for a downstream task. In particular, we introduce several lightweight models based on the MobileViT, MobileFaceNet, EfficientNet, and DDAMFN architectures, trained in multi-task scenarios to recognize facial expressions, valence, and arousal on static photos. These neural networks extract frame-level features that are fed into a simple classifier, e.g., a linear feed-forward neural network, to predict emotion intensity, compound expressions, action units, facial expressions, and valence/arousal. Experimental results for five tasks from the sixth ABAW challenge demonstrate that our approach significantly improves quality metrics on validation sets compared with existing non-ensemble techniques.
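The pipeline described above — a frozen pretrained backbone producing per-frame emotional embeddings, topped by a simple linear classifier — can be sketched as follows. This is a minimal illustration, not the authors' implementation: the embedding dimension, number of expression classes, and head structure are assumptions for the sake of the example, and the random tensor stands in for features extracted by a pretrained model such as EfficientNet.

```python
import torch
import torch.nn as nn

# Illustrative dimensions (assumptions, not from the paper).
EMBED_DIM = 1280        # e.g., EfficientNet-B0 feature size
NUM_EXPRESSIONS = 8     # typical expression-category count in ABAW tasks


class LinearAffectHead(nn.Module):
    """Simple classifier on top of frozen frame-level features:
    one linear layer for expression logits, one for valence/arousal."""

    def __init__(self, embed_dim: int, num_expr: int):
        super().__init__()
        self.expr = nn.Linear(embed_dim, num_expr)  # expression logits
        self.va = nn.Linear(embed_dim, 2)           # valence, arousal

    def forward(self, features: torch.Tensor):
        # tanh keeps valence/arousal predictions in [-1, 1]
        return self.expr(features), torch.tanh(self.va(features))


# Stand-in for embeddings of 16 video frames from a frozen backbone.
frames = torch.randn(16, EMBED_DIM)
head = LinearAffectHead(EMBED_DIM, NUM_EXPRESSIONS)
expr_logits, va = head(frames)
print(expr_logits.shape, va.shape)  # torch.Size([16, 8]) torch.Size([16, 2])
```

Because the backbone stays frozen, only the lightweight head needs training per downstream task, which matches the abstract's goal of reusing reliable pretrained emotional features without fine-tuning.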