
Emotional Voice Messages (EMOVOME) database: emotion recognition in spontaneous voice messages (2402.17496v2)

Published 27 Feb 2024 in cs.SD, cs.AI, cs.CL, and eess.AS

Abstract: Emotional Voice Messages (EMOVOME) is a spontaneous speech dataset containing 999 audio messages from real conversations on a messaging app, produced by 100 gender-balanced Spanish speakers. The voice messages were recorded in the wild, before participants were recruited, avoiding any conscious bias introduced by a laboratory environment. The audios were labeled for valence and arousal by three non-expert and two expert annotators, whose ratings were combined into a final label per dimension; the experts additionally assigned one of seven emotion categories. To set a baseline for future investigations using EMOVOME, we implemented emotion recognition models using both speech and audio transcriptions. For speech, we used the standard eGeMAPS feature set with support vector machines, obtaining 49.27% and 44.71% unweighted accuracy for valence and arousal, respectively. For text, we fine-tuned a multilingual BERT model, achieving 61.15% and 47.43% unweighted accuracy for valence and arousal, respectively. This database will significantly contribute to research on emotion recognition in the wild, while also providing a unique, natural, and freely accessible resource for Spanish.
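The speech baseline described above (eGeMAPS functionals classified with a support vector machine, scored by unweighted accuracy) can be sketched roughly as follows. This is a minimal illustration on synthetic data, not the authors' code: the random feature matrix stands in for the 88 eGeMAPS functionals that openSMILE would extract per voice message, and the three classes stand in for valence levels.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.metrics import balanced_accuracy_score


def svm_baseline(X, y, seed=0):
    """Train an SVM on acoustic features and report unweighted
    (i.e. balanced) accuracy, the metric used in the paper."""
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.2, stratify=y, random_state=seed
    )
    # Standardize features before the RBF-kernel SVM; hyperparameters
    # here are illustrative defaults, not the tuned values from the paper.
    clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0))
    clf.fit(X_tr, y_tr)
    return balanced_accuracy_score(y_te, clf.predict(X_te))


# Synthetic stand-in data: 200 messages x 88 eGeMAPS functionals,
# with three hypothetical valence classes (negative/neutral/positive).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 88))
y = rng.integers(0, 3, size=200)
ua = svm_baseline(X, y)
print(f"Unweighted accuracy: {ua:.3f}")
```

In real use, `X` would come from running the openSMILE toolkit with the eGeMAPS configuration over the audio files, and the unweighted accuracy is what the abstract reports (49.27% for valence, 44.71% for arousal).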
