Emotion-Aware Contrastive Adaptation Network for Source-Free Cross-Corpus Speech Emotion Recognition (2401.12925v1)

Published 23 Jan 2024 in cs.SD and eess.AS

Abstract: Cross-corpus speech emotion recognition (SER) aims to transfer emotional knowledge from a labeled source corpus to an unlabeled target corpus. However, prior methods require access to source data during adaptation, which is often unattainable in real-life scenarios due to data privacy concerns. This paper tackles a more practical task, namely source-free cross-corpus SER, where a pre-trained source model is adapted to the target domain without access to source data. To address this problem, we propose a novel method called the emotion-aware contrastive adaptation network (ECAN). The core idea is to capture local neighborhood information between samples while also considering global class-level adaptation. Specifically, we propose a nearest neighbor contrastive learning scheme to promote local emotion consistency among the features of highly similar samples. Furthermore, relying solely on nearest neighbors may lead to ambiguous boundaries between clusters. Thus, we incorporate supervised contrastive learning to encourage greater separation between clusters representing different emotions, thereby facilitating improved class-level adaptation. Extensive experiments indicate that our proposed ECAN significantly outperforms state-of-the-art methods under the source-free cross-corpus SER setting on several speech emotion corpora.
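To make the two objectives concrete, below is a minimal PyTorch sketch of a nearest-neighbor contrastive loss and a supervised contrastive loss of the kind the abstract describes. All function names, temperature values, the feature memory bank, and the use of pseudo labels on the target corpus are illustrative assumptions, not the paper's exact formulation.

```python
# Hedged sketch of the two contrastive objectives; details are assumptions,
# not ECAN's published formulation.
import torch
import torch.nn.functional as F


def nearest_neighbor_contrastive_loss(feats, bank, k=5, tau=0.07):
    """Pull each target feature toward its k nearest neighbors in a feature
    memory bank, promoting local emotion consistency among similar samples."""
    feats = F.normalize(feats, dim=1)      # (B, D) batch of target features
    bank = F.normalize(bank, dim=1)        # (N, D) assumed memory bank
    sim = feats @ bank.t() / tau           # (B, N) scaled cosine similarities
    _, nn_idx = sim.topk(k, dim=1)         # indices of the k nearest neighbors
    log_prob = F.log_softmax(sim, dim=1)   # contrastive distribution over bank
    # Treat the k neighbors as positives and maximize their likelihood.
    return -log_prob.gather(1, nn_idx).mean()


def supervised_contrastive_loss(feats, pseudo_labels, tau=0.07):
    """SupCon-style loss over pseudo-labeled target features, encouraging
    separation between clusters that represent different emotions."""
    feats = F.normalize(feats, dim=1)
    n = feats.size(0)
    sim = feats @ feats.t() / tau
    off_diag = ~torch.eye(n, dtype=torch.bool)               # exclude self-pairs
    pos_mask = (pseudo_labels.unsqueeze(0) == pseudo_labels.unsqueeze(1)) & off_diag
    sim = sim.masked_fill(~off_diag, float('-inf'))          # remove self-similarity
    log_prob = F.log_softmax(sim, dim=1)
    # Average log-likelihood over each anchor's positives (same pseudo label).
    masked_log_prob = torch.where(pos_mask, log_prob, torch.zeros_like(log_prob))
    pos_count = pos_mask.sum(1).clamp(min=1)
    loss = -masked_log_prob.sum(1) / pos_count
    has_pos = pos_mask.sum(1) > 0          # skip anchors with no positives
    return loss[has_pos].mean()


# Toy usage: 8 target utterance embeddings, a 64-entry memory bank,
# and pseudo labels over 4 emotion classes.
torch.manual_seed(0)
feats = torch.randn(8, 128)
bank = torch.randn(64, 128)
pseudo = torch.randint(0, 4, (8,))
total = nearest_neighbor_contrastive_loss(feats, bank) + \
        supervised_contrastive_loss(feats, pseudo)
print(total.item())
```

In practice the two losses would presumably be combined with weighting coefficients and minimized over the unlabeled target corpus alone, since the source-free setting forbids touching the source data during adaptation.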

Authors (7)
  1. Yan Zhao
  2. Jincen Wang
  3. Cheng Lu
  4. Sunan Li
  5. Björn Schuller
  6. Yuan Zong
  7. Wenming Zheng
Citations (2)
