Estimating the Uncertainty in Emotion Attributes using Deep Evidential Regression

Published 11 Jun 2023 in cs.SD, cs.CL, and eess.AS (arXiv:2306.06760v1)

Abstract: In automatic emotion recognition (AER), labels assigned by different human annotators to the same utterance are often inconsistent due to the inherent complexity of emotion and the subjectivity of perception. Although deterministic labels generated by averaging or voting are often used as the ground truth, this practice ignores the intrinsic uncertainty revealed by the inconsistent labels. This paper proposes a Bayesian approach, deep evidential emotion regression (DEER), to estimate the uncertainty in emotion attributes. Treating the emotion attribute labels of an utterance as samples drawn from an unknown Gaussian distribution, DEER places an utterance-specific normal-inverse gamma prior over the Gaussian likelihood and predicts its hyper-parameters using a deep neural network. This enables joint estimation of the emotion attributes along with the associated aleatoric and epistemic uncertainties. AER experiments on the widely used MSP-Podcast and IEMOCAP datasets show that DEER produces state-of-the-art results for both the mean values and the distributions of emotion attributes.
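The normal-inverse gamma (NIG) parameterisation behind evidential regression admits closed-form uncertainty estimates. The following is a minimal sketch, not the authors' code: it assumes a network head that outputs the four NIG hyper-parameters (gamma, nu, alpha, beta) per utterance, and computes the aleatoric and epistemic uncertainties together with the marginal Student-t negative log-likelihood as given in the deep evidential regression literature.

```python
import math

def nig_uncertainties(gamma, nu, alpha, beta):
    """Closed-form estimates from NIG hyper-parameters (alpha > 1).

    gamma is the predicted attribute value (mean); aleatoric uncertainty
    is E[sigma^2] = beta / (alpha - 1); epistemic uncertainty is
    Var[mu] = beta / (nu * (alpha - 1)).
    """
    aleatoric = beta / (alpha - 1.0)
    epistemic = beta / (nu * (alpha - 1.0))
    return gamma, aleatoric, epistemic

def nig_nll(y, gamma, nu, alpha, beta):
    """Negative log-likelihood of label y under the Student-t marginal
    obtained by integrating the Gaussian likelihood against the NIG prior."""
    omega = 2.0 * beta * (1.0 + nu)
    return (0.5 * math.log(math.pi / nu)
            - alpha * math.log(omega)
            + (alpha + 0.5) * math.log(nu * (y - gamma) ** 2 + omega)
            + math.lgamma(alpha) - math.lgamma(alpha + 0.5))
```

In practice the hyper-parameters would be produced per utterance by the neural network (with nu, beta > 0 and alpha > 1 enforced via activations such as softplus), and the NLL above minimised jointly with an evidence-regularisation term; larger evidence nu shrinks the epistemic term while leaving the aleatoric term unchanged.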
