
Audio-Visual Compound Expression Recognition Method based on Late Modality Fusion and Rule-based Decision (2403.12687v2)

Published 19 Mar 2024 in cs.CV and cs.LG

Abstract: This paper presents the results of the SUN team for the Compound Expressions Recognition Challenge of the 6th ABAW Competition. We propose a novel audio-visual method for compound expression recognition. Our method relies on emotion recognition models that fuse modalities at the emotion probability level, while decisions regarding the prediction of compound expressions are based on predefined rules. Notably, our method does not use any training data specific to the target task; the problem is therefore a zero-shot classification task. The method is evaluated in multi-corpus training and cross-corpus validation setups. Our proposed method achieves an F1-score of 22.01% on the C-EXPR-DB test subset. Our findings from the challenge demonstrate that the proposed method can potentially form a basis for developing intelligent tools for annotating audio-visual data in the context of basic and compound human emotions.
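The abstract describes two components: fusing the audio and visual models at the emotion-probability level, and mapping fused basic-emotion probabilities to a compound class via predefined rules. The paper's exact fusion weights and rules are not given here, so the following is only an illustrative sketch: it assumes a seven-class basic-emotion output, a weighted-average fusion, and a simple rule that scores each compound class by the sum of its two constituent basic-emotion probabilities (the compound classes listed are the seven targeted by the 6th ABAW challenge).

```python
import numpy as np

# Assumed ordering of the seven basic-emotion outputs (an assumption, not from the paper).
BASIC = ["neutral", "anger", "disgust", "fear", "happiness", "sadness", "surprise"]

# The seven compound classes of the 6th ABAW Compound Expressions challenge,
# each a pair of basic emotions.
COMPOUNDS = [
    ("fear", "surprise"),
    ("happiness", "surprise"),
    ("sadness", "surprise"),
    ("disgust", "surprise"),
    ("anger", "surprise"),
    ("sadness", "fear"),
    ("sadness", "anger"),
]

def late_fusion(p_audio, p_video, w_audio=0.5):
    """Late modality fusion at the emotion-probability level:
    a weighted average of the two models' probability vectors."""
    return w_audio * p_audio + (1.0 - w_audio) * p_video

def compound_decision(p_fused):
    """Hypothetical rule-based decision: score each compound class by the
    sum of its two constituent basic-emotion probabilities and pick the max."""
    idx = {e: i for i, e in enumerate(BASIC)}
    scores = [p_fused[idx[a]] + p_fused[idx[b]] for a, b in COMPOUNDS]
    return COMPOUNDS[int(np.argmax(scores))]
```

For example, if both modalities assign most of their probability mass to happiness and surprise, the fused vector yields the "happily surprised" compound. The actual rule set and fusion weights used by the SUN team would need to be taken from the paper itself.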

