HSEmotion Team at the 6th ABAW Competition: Facial Expressions, Valence-Arousal and Emotion Intensity Prediction (2403.11590v1)

Published 18 Mar 2024 in cs.CV

Abstract: This article presents our results for the sixth Affective Behavior Analysis in-the-wild (ABAW) competition. To improve the trustworthiness of facial analysis, we study the possibility of using pre-trained deep models that extract reliable emotional features without the need to fine-tune the neural networks for a downstream task. In particular, we introduce several lightweight models based on the MobileViT, MobileFaceNet, EfficientNet, and DDAMFN architectures, trained in multi-task scenarios to recognize facial expressions, valence, and arousal on static photos. These neural networks extract frame-level features that are fed into a simple classifier, e.g., a linear feed-forward neural network, to predict emotion intensity, compound expressions, action units, facial expressions, and valence/arousal. Experimental results for five tasks from the sixth ABAW challenge demonstrate that our approach significantly improves quality metrics on validation sets compared to existing non-ensemble techniques.
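The pipeline the abstract describes — a frozen, pre-trained backbone producing one embedding per frame, followed by a lightweight linear head — can be sketched as follows. This is a minimal illustration, not the authors' implementation: the feature dimension, class count, and the stubbed `extract_features` function are placeholder assumptions (a real system would run a frozen EfficientNet/MobileViT-style network here).

```python
import math
import random

random.seed(0)

FEATURE_DIM = 8   # assumed embedding size (real backbones emit ~1280-dim features)
NUM_CLASSES = 3   # illustrative: a few expression classes

def extract_features(num_frames):
    """Stub for the frozen backbone: returns one embedding per video frame.
    In the paper's setup, a pre-trained emotional feature extractor
    (MobileViT, MobileFaceNet, EfficientNet, or DDAMFN) produces these
    without any downstream fine-tuning."""
    return [[random.gauss(0, 1) for _ in range(FEATURE_DIM)]
            for _ in range(num_frames)]

def softmax(logits):
    """Numerically stable softmax over one frame's class logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

def linear_head(features, W, b):
    """Simple linear feed-forward classifier applied per frame:
    probs[t] = softmax(W @ features[t] + b)."""
    probs = []
    for f in features:
        logits = [sum(fi * wij for fi, wij in zip(f, row)) + bj
                  for row, bj in zip(W, b)]
        probs.append(softmax(logits))
    return probs

# Toy usage: classify 4 frames with a randomly initialized head.
W = [[random.gauss(0, 0.1) for _ in range(FEATURE_DIM)]
     for _ in range(NUM_CLASSES)]
b = [0.0] * NUM_CLASSES
probs = linear_head(extract_features(4), W, b)
print(len(probs), len(probs[0]))  # 4 frames x 3 class probabilities
```

Keeping the backbone frozen and training only this small head is what makes the approach lightweight: only `W` and `b` need to be fit per downstream task (expressions, valence/arousal, action units, etc.), while the same frame-level features are reused across all of them.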
