
Investigating Salient Representations and Label Variance in Dimensional Speech Emotion Analysis (2312.16180v1)

Published 17 Dec 2023 in cs.SD, cs.AI, cs.CL, and cs.LG

Abstract: Representations derived from models such as BERT (Bidirectional Encoder Representations from Transformers) and HuBERT (Hidden-Unit BERT) have helped to achieve state-of-the-art performance in dimensional speech emotion recognition. Despite their large dimensionality, and even though these representations are not tailored for emotion recognition tasks, they are frequently used to train large speech emotion models with high memory and computational costs. In this work, we show that there exist lower-dimensional subspaces within these pre-trained representation spaces that offer a reduction in downstream model complexity without sacrificing performance on emotion estimation. In addition, we model label uncertainty in the form of grader opinion variance, and demonstrate that such information can improve the model's generalization capacity and robustness. Finally, we compare the robustness of the emotion models against acoustic degradations and observe that the reduced-dimensional representations retain performance comparable to that of the full-dimensional representations, without significant regression in dimensional emotion estimation.
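
The abstract's two main ideas lend themselves to a compact illustration: project high-dimensional pre-trained embeddings onto a lower-dimensional subspace before training a small downstream regressor, and fold grader-opinion variance into training. The sketch below is illustrative only, since the abstract does not specify the authors' architecture, subspace-selection method, or uncertainty model; PCA, ridge regression, and inverse-variance sample weights are stand-in choices, and all data, shapes, and names are hypothetical. Evaluation uses Lin's concordance correlation coefficient (CCC, reference 22), the standard metric for dimensional emotion estimation.

```python
# Minimal sketch, not the paper's method: PCA, Ridge, and inverse-variance
# weights are stand-ins; all data below is synthetic and hypothetical.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import Ridge

def ccc(y_true, y_pred):
    """Concordance correlation coefficient (Lin, 1989)."""
    mu_t, mu_p = y_true.mean(), y_pred.mean()
    cov = ((y_true - mu_t) * (y_pred - mu_p)).mean()
    return 2 * cov / (y_true.var() + y_pred.var() + (mu_t - mu_p) ** 2)

rng = np.random.default_rng(0)
# Stand-ins for utterance-level HuBERT embeddings (e.g., 768-d), per-utterance
# mean grader ratings for one dimension (e.g., arousal), and grader variance.
X = rng.normal(size=(1200, 768))
y = rng.uniform(1, 7, size=1200)
grader_var = rng.uniform(0.1, 2.0, size=1200)
X_tr, X_te = X[:1000], X[1000:]
y_tr, y_te, v_tr = y[:1000], y[1000:], grader_var[:1000]

# 1) Lower-dimensional subspace of the pre-trained representation space,
#    reducing the input size (and hence complexity) of the downstream model.
pca = PCA(n_components=128).fit(X_tr)

# 2) Label uncertainty as grader-opinion variance: one simple way to use it
#    is to down-weight high-disagreement utterances during training.
model = Ridge(alpha=1.0)
model.fit(pca.transform(X_tr), y_tr, sample_weight=1.0 / (v_tr + 1e-3))

print(f"held-out CCC: {ccc(y_te, model.predict(pca.transform(X_te))):.3f}")
```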

References (22)
  1. “Leveraging acoustic cues and paralinguistic embeddings to detect expression from voice,” Proc. Interspeech, pp. 1651–1655, 2019.
  2. “Detecting emotion primitives from speech and their use in discerning categorical emotions,” in Proc. of ICASSP. IEEE, 2020, pp. 7164–7168.
  3. B. Desmet and V. Hoste, “Emotion detection in suicide notes,” Expert Systems with Applications, vol. 40, no. 16, pp. 6351–6358, 2013.
  4. “An investigation of emotional speech in depression classification,” in Proc. of Interspeech, 2016, pp. 485–489.
  5. P. Ekman, “An argument for basic emotions,” Cognition & Emotion, vol. 6, no. 3–4, pp. 169–200, 1992.
  6. A.S. Cowen and D. Keltner, “Self-report captures 27 distinct categories of emotion bridged by continuous gradients,” Proc. of the National Academy of Sciences, vol. 114, no. 38, pp. E7900–E7909, 2017.
  7. J. Posner, J. A. Russell, and B. S. Peterson, “The circumplex model of affect: An integrative approach to affective neuroscience, cognitive development, and psychopathology,” Development and Psychopathology, vol. 17, no. 3, pp. 715–734, 2005.
  8. C. Busso, M. Bulut, C.-C. Lee, A. Kazemzadeh, E. Mower, S. Kim, J. N. Chang, S. Lee, and S. S. Narayanan, “IEMOCAP: Interactive emotional dyadic motion capture database,” Language Resources and Evaluation, vol. 42, no. 4, pp. 335–359, 2008.
  9. “Multimodal databases of everyday emotion: Facing up to complexity,” in Proc. of Interspeech, 2005.
  10. “Building a naturalistic emotional speech corpus by retrieving expressive behaviors from existing speech corpora,” in Proc. of Interspeech, 2014.
  11. “Label uncertainty modeling and prediction for speech emotion recognition using t-distributions,” in Proc. of ACII. IEEE, 2022, pp. 1–8.
  12. “Multi-modal learning for speech emotion recognition: An analysis and comparison of ASR outputs with ground truth transcription,” in Proc. of Interspeech, 2019, pp. 3302–3306.
  13. “Sentiment-aware automatic speech recognition pre-training for enhanced speech emotion recognition,” Proc. of ICASSP, pp. 7347–7351, 2022.
  14. “Representation learning through cross-modal conditional teacher-student training for speech emotion recognition,” in Proc. of ICASSP. IEEE, 2022, pp. 6442–6446.
  15. “Jointly fine-tuning ‘BERT-like’ self-supervised models to improve multimodal speech emotion recognition,” arXiv preprint arXiv:2008.06682, 2020.
  16. “Speech emotion: Investigating model representations, multi-task learning and knowledge distillation,” Proc. of Interspeech, 2022.
  17. “Pre-trained model representations and their robustness against noise for speech emotion analysis,” in Proc. of ICASSP. IEEE, 2023, pp. 1–5.
  18. R. Lotfian and C. Busso, “Building naturalistic emotionally balanced speech corpus by retrieving emotional speech from existing podcast recordings,” IEEE Trans. on Affective Computing, vol. 10, no. 4, pp. 471–483, 2017.
  19. W.-N. Hsu, B. Bolte, Y.-H. H. Tsai, K. Lakhotia, R. Salakhutdinov, and A. Mohamed, “HuBERT: Self-supervised speech representation learning by masked prediction of hidden units,” IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 29, pp. 3451–3460, 2021.
  20. J. Devlin, M.-W. Chang, K. Lee, and K. Toutanova, “BERT: Pre-training of deep bidirectional transformers for language understanding,” in Proc. of NAACL-HLT, 2019, pp. 4171–4186.
  21. V. Mitra and H. Franco, “Investigation and analysis of hyper and hypo neuron pruning to selectively update neurons during unsupervised adaptation,” Digital Signal Processing, vol. 99, p. 102655, 2020.
  22. L. I.-K. Lin, “A concordance correlation coefficient to evaluate reproducibility,” Biometrics, vol. 45, no. 1, pp. 255–268, 1989.