
Handling Ambiguity in Emotion: From Out-of-Domain Detection to Distribution Estimation (2402.12862v1)

Published 20 Feb 2024 in cs.CL

Abstract: The subjective perception of emotion leads to inconsistent labels from human annotators. Typically, utterances lacking majority-agreed labels are excluded when training an emotion classifier, which causes problems when ambiguous emotional expressions are encountered during testing. This paper investigates three methods to handle ambiguous emotion. First, we show that incorporating utterances without majority-agreed labels as an additional class in the classifier reduces the classification performance of the other emotion classes. Then, we propose detecting utterances with ambiguous emotions as out-of-domain samples by quantifying the uncertainty in emotion classification using evidential deep learning. This approach retains the classification accuracy while effectively detecting ambiguous emotion expressions. Furthermore, to obtain fine-grained distinctions among ambiguous emotions, we propose representing emotion as a distribution instead of a single class label. The task is thus reframed from classification to distribution estimation, where every individual annotation is taken into account, not just the majority opinion. The evidential uncertainty measure is extended to quantify the uncertainty in emotion distribution estimation. Experimental results on the IEMOCAP and CREMA-D datasets demonstrate the superior capability of the proposed method in terms of majority class prediction, emotion distribution estimation, and uncertainty estimation.
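The abstract's core mechanism, quantifying uncertainty via evidential deep learning, follows the standard Dirichlet-based formulation (Sensoy et al., reference 26): a network outputs non-negative per-class evidence, and the total uncertainty ("vacuity") is the fraction of belief mass left unassigned. The sketch below is a minimal illustration of that formulation, not the paper's actual model; the function name and example evidence values are assumptions for demonstration.

```python
import numpy as np

def evidential_uncertainty(evidence):
    """Dirichlet-based uncertainty in the style of evidential deep learning.

    evidence: non-negative per-class evidence (e.g. from a ReLU output head).
    Returns (expected class probabilities, vacuity), where vacuity in (0, 1]
    is high when total evidence is low, flagging ambiguous inputs.
    """
    evidence = np.asarray(evidence, dtype=float)
    k = evidence.size          # number of emotion classes
    alpha = evidence + 1.0     # Dirichlet concentration parameters
    s = alpha.sum()            # Dirichlet strength (total evidence + k)
    probs = alpha / s          # expected class probabilities
    vacuity = k / s            # uncertainty mass: k / strength
    return probs, vacuity

# Strong evidence for one class -> confident prediction, low vacuity.
p_conf, u_conf = evidential_uncertainty([20.0, 0.5, 0.3, 0.2])
# Weak evidence everywhere -> ambiguous utterance, high vacuity.
p_amb, u_amb = evidential_uncertainty([0.4, 0.5, 0.3, 0.2])
assert u_amb > u_conf
```

Thresholding the vacuity gives the out-of-domain-style detector the abstract describes: utterances whose total evidence is low (high vacuity) are flagged as ambiguous rather than forced into a single majority class.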

References (31)
  1. Data2vec: A general framework for self-supervised learning in speech, vision and language. In Proc. ICML, Baltimore, USA.
  2. Wav2vec 2.0: A framework for self-supervised learning of speech representations. In Proc. NeurIPS, Vancouver, Canada.
  3. On the opportunities and risks of foundation models. arXiv preprint arXiv:2108.07258.
  4. IEMOCAP: Interactive emotional dyadic motion capture database. Language Resources and Evaluation, 42:335–359.
  5. CREMA-D: Crowd-sourced emotional multimodal actors dataset. IEEE Transactions on Affective Computing, 5(4):377–390.
  6. WavLM: Large-scale self-supervised pre-training for full stack speech processing. IEEE Journal of Selected Topics in Signal Processing, 16(6):1505–1518.
  7. Leveraging label correlations in a multi-label setting: A case study in emotion. In Proc. ICASSP, Rhodes, Greece.
  8. Alan S Cowen and Dacher Keltner. 2017. Self-report captures 27 distinct categories of emotion bridged by continuous gradients. Proceedings of the National Academy of Sciences, 114(38):E7900–E7909.
  9. Arthur P Dempster. 1968. A generalization of Bayesian inference. Journal of the Royal Statistical Society: Series B (Methodological), 30(2):205–232.
  10. Armen Der Kiureghian and Ove Ditlevsen. 2009. Aleatory or epistemic? does it matter? Structural Safety, 31(2):105–112.
  11. Modeling subjectiveness in emotion recognition with deep neural networks: Ensembles vs soft labels. In Proc. IJCNN, Vancouver, Canada.
  12. Yarin Gal and Zoubin Ghahramani. 2016. Dropout as a Bayesian approximation: Representing model uncertainty in deep learning. In Proc. ICML, New York, USA.
  13. Conformer: Convolution-augmented Transformer for speech recognition. In Proc. Interspeech, Shanghai, China.
  14. On calibration of modern neural networks. In Proc. ICML, Sydney, Australia.
  15. From hard to soft: Towards more human-like emotion recognition by modelling the perception uncertainty. In Proc. ACM MM, Mountain View, USA.
  16. HuBERT: Self-supervised speech representation learning by masked prediction of hidden units. IEEE Transactions on Audio, Speech, and Language Processing, 29:3451–3460.
  17. Audun Jøsang. 2018. Subjective Logic: A Formalism for Reasoning Under Uncertainty. Springer Publishing Company, Incorporated.
  18. Deep learning for robust feature generation in audiovisual emotion recognition. In Proc. ICASSP, Vancouver, Canada.
  19. Yelin Kim and Jeesun Kim. 2018. Human-like emotion recognition: Multi-label learning from noisy labeled audio-visual expressive speech. In Proc. ICASSP, Calgary, Canada.
  20. Simple and scalable predictive uncertainty estimation using deep ensembles. In Proc. NeurIPS, Long Beach, USA.
  21. Hermann G Matthies. 2007. Quantifying uncertainty: Modern computational representation of probability and applications. In Extreme Man-made and Natural Hazards in Dynamics of Structures, pages 105–135. Springer.
  22. A framework for automatic human emotion classification using emotion profiles. IEEE Transactions on Audio, Speech, and Language Processing, 19(5):1057–1070.
  23. Obtaining well calibrated probabilities using Bayesian binning. In Proc. AAAI, Austin, USA.
  24. A review of affective computing: From unimodal analysis to multimodal fusion. Information Fusion, 37:98–125.
  25. Nicolae-Cătălin Ristea and Radu Tudor Ionescu. 2021. Self-paced ensemble learning for speech and audio classification. In Proc. Interspeech, Brno, Czechia.
  26. Evidential deep learning to quantify classification uncertainty. In Proc. NeurIPS, Montréal, Canada.
  27. The ambiguous world of emotion representation. arXiv preprint arXiv:1909.00360.
  28. Emotion recognition by fusing time synchronous and time asynchronous representations. In Proc. ICASSP, Toronto, Canada.
  29. SUPERB: Speech Processing Universal PERformance Benchmark. In Proc. Interspeech, Brno, Czechia.
  30. Multimodal language analysis in the wild: CMU-MOSEI dataset and interpretable dynamic fusion graph. In Proc. ACL, Melbourne, Australia.
  31. Google USM: Scaling automatic speech recognition beyond 100 languages. arXiv preprint arXiv:2303.01037.
Authors (8)
  1. Wen Wu (103 papers)
  2. Bo Li (1107 papers)
  3. Chao Zhang (907 papers)
  4. Chung-Cheng Chiu (48 papers)
  5. Qiujia Li (18 papers)
  6. Junwen Bai (20 papers)
  7. Tara N. Sainath (79 papers)
  8. Philip C. Woodland (50 papers)
Citations (1)
