
SEGAA: A Unified Approach to Predicting Age, Gender, and Emotion in Speech (2403.00887v1)

Published 1 Mar 2024 in eess.AS, cs.AI, cs.CL, cs.LG, and cs.SD

Abstract: The interpretation of human voices holds importance across many applications. This study addresses the prediction of age, gender, and emotion from vocal cues. Advances in voice analysis technology span domains, from improving customer interactions to enhancing healthcare and retail experiences. Discerning emotions aids mental health, while age and gender detection are vital in various contexts. This paper explores deep learning models for these predictions, comparing single-output, multi-output, and sequential models. Sourcing suitable data posed challenges, leading to the amalgamation of the CREMA-D and EMO-DB datasets. Prior work showed promise for individual predictions, but little research has considered all three variables simultaneously. This paper identifies shortcomings of the individual-model approach and advocates a novel multi-output learning architecture, the Speech-based Emotion, Gender and Age Analysis (SEGAA) model. The experiments suggest that multi-output models perform comparably to individual models, efficiently capturing the intricate relationships between the variables and the speech inputs while achieving improved runtime.

References (20)
  1. Assessing the Role of Age, Education, Gender and Income on the Digital Divide: Evidence for the European Union. Information Systems Frontiers, 23(4):1007–1021, August 2021. ISSN 1387-3326, 1572-9419. doi:10.1007/s10796-020-10012-9. URL https://link.springer.com/10.1007/s10796-020-10012-9.
  2. Behavioral and emotional disorders in children during the COVID-19 epidemic. The Journal of Pediatrics, 221:264–266, 2020. URL https://www.jpeds.com/article/S0022-3476(20)30336-X/abstract. Publisher: Elsevier.
  3. Identification of Common Neural Circuit Disruptions in Emotional Processing Across Psychiatric Disorders. American Journal of Psychiatry, 177(5):411–421, May 2020. ISSN 0002-953X, 1535-7228. doi:10.1176/appi.ajp.2019.18111271. URL http://ajp.psychiatryonline.org/doi/10.1176/appi.ajp.2019.18111271.
  4. Building consumer loyalty through e-shopping experiences: The mediating role of emotions. Journal of Retailing and Consumer Services, 60:102481, 2021. URL https://www.sciencedirect.com/science/article/pii/S0969698921000473. Publisher: Elsevier.
  5. Mental health needs among lesbian, gay, bisexual, and transgender college students during the COVID-19 pandemic. Journal of Adolescent Health, 67(5):645–648, 2020. URL https://www.sciencedirect.com/science/article/pii/S1054139X20304882. Publisher: Elsevier.
  6. E-Commerce and the Factors Affecting Its Development in the Age of Digital Technology: Empirical Evidence at EU–27 Level. Sustainability, 14(1):101, 2022. URL https://www.academia.edu/download/79805943/pdf.pdf.
  7. IEMOCAP: interactive emotional dyadic motion capture database. Language Resources and Evaluation, 42(4):335–359, December 2008. ISSN 1574-020X, 1574-0218. doi:10.1007/s10579-008-9076-6. URL http://link.springer.com/10.1007/s10579-008-9076-6.
  8. The Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS): A dynamic, multimodal set of facial and vocal expressions in North American English. PLoS ONE, 13(5):e0196391, 2018. URL https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0196391. Publisher: Public Library of Science.
  9. Toronto emotional speech set (TESS). University of Toronto, Psychology Department, 2010.
  10. Design, recording and verification of a Danish emotional speech database. In Fifth European Conference on Speech Communication and Technology, 1997.
  11. Common Voice: A Massively-Multilingual Speech Corpus, March 2020. URL http://arxiv.org/abs/1912.06670. arXiv:1912.06670 [cs].
  12. A database of German emotional speech. In Interspeech, volume 5, pages 1517–1520, 2005. URL https://www.researchgate.net/profile/Felix-Burkhardt-2/publication/221491017_A_database_of_German_emotional_speech/links/00b7d5226f45d66e38000000/A-database-of-German-emotional-speech.pdf.
  13. CREMA-D: Crowd-sourced Emotional Multimodal Actors Dataset. IEEE Transactions on Affective Computing, 5(4):377–390, 2014. URL https://ieeexplore.ieee.org/abstract/document/6849440/. Publisher: IEEE.
  14. An ensemble 1D-CNN-LSTM-GRU model with data augmentation for speech emotion recognition. Expert Systems with Applications, 218:119633, 2023. URL https://www.sciencedirect.com/science/article/pii/S0957417423001343. Publisher: Elsevier.
  15. Age and gender recognition using a convolutional neural network with a specially designed multi-attention module through speech spectrograms. Sensors, 21(17):5892, 2021. URL https://www.mdpi.com/1424-8220/21/17/5892. Publisher: MDPI.
  16. Copypaste: An augmentation method for speech emotion recognition. In ICASSP 2021-2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 6324–6328. IEEE, 2021. URL https://ieeexplore.ieee.org/abstract/document/9415077/.
  17. Multilanguage Speech-Based Gender Classification Using Time-Frequency Features and SVM Classifier. In Jessnor Arif Mat Jizat, Ismail Mohd Khairuddin, Mohd Azraai Mohd Razman, Ahmad Fakhri Ab. Nasir, Mohamad Shaiful Abdul Karim, Abdul Aziz Jaafar, Lim Wei Hong, Anwar P. P. Abdul Majeed, Pengcheng Liu, Hyun Myung, Han-Lim Choi, and Gian-Antonio Susto, editors, Advances in Robotics, Automation and Data Analytics, volume 1350, pages 1–10. Springer International Publishing, Cham, 2021. ISBN 978-3-030-70916-7 978-3-030-70917-4. doi:10.1007/978-3-030-70917-4_1. URL http://link.springer.com/10.1007/978-3-030-70917-4_1. Series Title: Advances in Intelligent Systems and Computing.
  18. Gender and age group predictions from speech features using multi-layer perceptron model. In 2020 IEEE 17th India Council International Conference (INDICON), pages 1–6. IEEE, 2020. URL https://ieeexplore.ieee.org/abstract/document/9342434/.
  19. One source to detect them all: gender, age, and emotion detection from voice. In 2021 IEEE 45th Annual Computers, Software, and Applications Conference (COMPSAC), pages 338–343. IEEE, 2021. URL https://ieeexplore.ieee.org/abstract/document/9529731/.
  20. Machine Learning for Multi-Output Regression: When should a holistic multivariate approach be preferred over separate univariate ones?, January 2022. URL http://arxiv.org/abs/2201.05340. arXiv:2201.05340 [cs, stat].