
Unimodal Multi-Task Fusion for Emotional Mimicry Intensity Prediction (2403.11879v4)

Published 18 Mar 2024 in cs.SD, cs.AI, and eess.AS

Abstract: In this research, we introduce a novel methodology for assessing Emotional Mimicry Intensity (EMI) as part of the 6th Workshop and Competition on Affective Behavior Analysis in-the-wild. Our methodology utilises the Wav2Vec 2.0 architecture, which has been pre-trained on an extensive podcast dataset, to capture a wide array of audio features that include both linguistic and paralinguistic components. We refine our feature extraction process by employing a fusion technique that combines individual features with a global mean vector, thereby embedding a broader contextual understanding into our analysis. A key aspect of our approach is the multi-task fusion strategy, which not only leverages these features but also incorporates a pre-trained Valence-Arousal-Dominance (VAD) model. This integration refines emotion intensity prediction by concurrently processing multiple emotional dimensions, grounding the prediction in a richer emotional context. For the temporal analysis of the audio data, our feature fusion process utilises a Long Short-Term Memory (LSTM) network. This approach, which relies solely on the provided audio data, shows marked improvements over the existing baseline, offers a more comprehensive view of emotional mimicry in naturalistic settings, and achieved second place in the EMI challenge.
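
To make the pipeline the abstract describes more concrete, below is a minimal PyTorch sketch of the fusion steps: frame-level audio features concatenated with a global mean vector, an LSTM for temporal modelling, and a multi-task output. This is not the authors' implementation; the feature dimension (1024, typical of a large Wav2Vec 2.0 model), the six EMI targets, the layer sizes, and the auxiliary VAD regression head (standing in for the paper's integration of a pre-trained VAD model [17]) are all illustrative assumptions.

```python
# Minimal sketch of the described pipeline, not the authors' code.
# Assumptions: frame-level Wav2Vec 2.0 features are already extracted
# (shape [batch, frames, 1024]); the six EMI targets and the 3-dim
# VAD head are illustrative choices, not confirmed by the paper.
import torch
import torch.nn as nn

class EMIFusionModel(nn.Module):
    def __init__(self, feat_dim=1024, hidden_dim=256, n_emotions=6):
        super().__init__()
        # Input size is doubled because each frame vector is
        # concatenated with the utterance-level mean vector.
        self.lstm = nn.LSTM(feat_dim * 2, hidden_dim, batch_first=True)
        # Multi-task heads: EMI intensities plus auxiliary VAD targets.
        self.emi_head = nn.Linear(hidden_dim, n_emotions)
        self.vad_head = nn.Linear(hidden_dim, 3)  # valence, arousal, dominance

    def forward(self, feats):  # feats: [batch, frames, feat_dim]
        # Fuse each frame with the global mean vector, injecting
        # utterance-level context into every time step.
        global_mean = feats.mean(dim=1, keepdim=True).expand_as(feats)
        fused = torch.cat([feats, global_mean], dim=-1)
        _, (h_n, _) = self.lstm(fused)
        h = h_n[-1]  # final hidden state summarises the sequence
        return self.emi_head(h), self.vad_head(h)

# Usage with dummy features:
model = EMIFusionModel()
x = torch.randn(4, 250, 1024)  # 4 clips, 250 frames each
emi_pred, vad_pred = model(x)
print(emi_pred.shape, vad_pred.shape)  # torch.Size([4, 6]) torch.Size([4, 3])
```

Concatenating each frame with the utterance-level mean is one simple way to realise the global-context fusion the abstract mentions; the paper's exact fusion, and how the pre-trained VAD model's outputs are combined with the Wav2Vec 2.0 features, may differ from this sketch.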

References (17)
  1. D. Kollias, P. Tzirakis, M. A. Nicolaou, A. Papaioannou, G. Zhao, B. Schuller, I. Kotsia, and S. Zafeiriou, “Deep affect prediction in-the-wild: Aff-wild database and challenge, deep architectures, and beyond,” International Journal of Computer Vision, pp. 1–23, 2019.
  2. D. Kollias, V. Sharmanska, and S. Zafeiriou, “Face behavior a la carte: Expressions, affect and action units in a single network,” arXiv preprint arXiv:1910.11111, 2019.
  3. D. Kollias and S. Zafeiriou, “Expression, affect, action unit recognition: Aff-wild2, multi-task learning and arcface,” arXiv preprint arXiv:1910.04855, 2019.
  4. D. Kollias, A. Schulc, E. Hajiyev, and S. Zafeiriou, “Analysing affective behavior in the first abaw 2020 competition,” in 2020 15th IEEE International Conference on Automatic Face and Gesture Recognition (FG 2020), pp. 794–800, IEEE, 2020.
  5. D. Kollias and S. Zafeiriou, “Affect analysis in-the-wild: Valence-arousal, expressions, action units and a unified framework,” arXiv preprint arXiv:2103.15792, 2021.
  6. D. Kollias and S. Zafeiriou, “Analysing affective behavior in the second abaw2 competition,” in Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 3652–3660, 2021.
  7. D. Kollias, V. Sharmanska, and S. Zafeiriou, “Distribution matching for heterogeneous multi-task learning: a large-scale face study,” arXiv preprint arXiv:2105.03790, 2021.
  8. D. Kollias, “Abaw: Valence-arousal estimation, expression recognition, action unit detection & multi-task learning challenges,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2328–2336, 2022.
  9. D. Kollias, “Abaw: learning from synthetic data & multi-task learning challenges,” in European Conference on Computer Vision, pp. 157–172, Springer, 2023.
  10. D. Kollias, P. Tzirakis, A. Baird, A. Cowen, and S. Zafeiriou, “Abaw: Valence-arousal estimation, expression recognition, action unit detection & emotional reaction intensity estimation challenges,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 5888–5897, 2023.
  11. D. Kollias, “Multi-label compound expression recognition: C-expr database & network,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 5589–5598, 2023.
  12. D. Kollias, P. Tzirakis, A. Cowen, S. Zafeiriou, C. Shao, and G. Hu, “The 6th affective behavior analysis in-the-wild (abaw) competition,” arXiv preprint arXiv:2402.19344, 2024.
  13. S. Zafeiriou, D. Kollias, M. A. Nicolaou, A. Papaioannou, G. Zhao, and I. Kotsia, “Aff-wild: Valence and arousal ‘in-the-wild’ challenge,” in Computer Vision and Pattern Recognition Workshops (CVPRW), 2017 IEEE Conference on, pp. 1980–1987, IEEE, 2017.
  14. M. Caron, H. Touvron, I. Misra, H. Jégou, J. Mairal, P. Bojanowski, and A. Joulin, “Emerging properties in self-supervised vision transformers,” in Proceedings of the IEEE/CVF international conference on computer vision, pp. 9650–9660, 2021.
  15. A. Dosovitskiy, L. Beyer, A. Kolesnikov, D. Weissenborn, X. Zhai, T. Unterthiner, M. Dehghani, M. Minderer, G. Heigold, S. Gelly, et al., “An image is worth 16x16 words: Transformers for image recognition at scale,” arXiv preprint arXiv:2010.11929, 2020.
  16. A. Baevski, Y. Zhou, A. Mohamed, and M. Auli, “wav2vec 2.0: A framework for self-supervised learning of speech representations,” Advances in neural information processing systems, vol. 33, pp. 12449–12460, 2020.
  17. J. Wagner, A. Triantafyllopoulos, H. Wierstorf, M. Schmitt, F. Burkhardt, F. Eyben, and B. W. Schuller, “Model for dimensional speech emotion recognition based on Wav2vec 2.0,” Feb. 2022.
Authors (4)
  1. Tobias Hallmen (5 papers)
  2. Fabian Deuser (12 papers)
  3. Norbert Oswald (14 papers)
  4. Elisabeth André (65 papers)