Beyond Silent Letters: Amplifying LLMs in Emotion Recognition with Vocal Nuances (2407.21315v3)

Published 31 Jul 2024 in cs.CL and cs.AI

Abstract: Emotion recognition in speech is a challenging multimodal task that requires understanding both verbal content and vocal nuances. This paper introduces a novel approach to emotion detection using LLMs, which have demonstrated exceptional capabilities in natural language understanding. To overcome the inherent limitation of LLMs in processing audio inputs, we propose SpeechCueLLM, a method that translates speech characteristics into natural language descriptions, allowing LLMs to perform multimodal emotion analysis via text prompts without any architectural changes. Our method is minimal yet impactful, outperforming baseline models that require structural modifications. We evaluate SpeechCueLLM on two datasets: IEMOCAP and MELD, showing significant improvements in emotion recognition accuracy, particularly for high-quality audio data. We also explore the effectiveness of various feature representations and fine-tuning strategies for different LLMs. Our experiments demonstrate that incorporating speech descriptions yields a more than 2% increase in the average weighted F1 score on IEMOCAP (from 70.111% to 72.596%).
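The core idea, rendering acoustic measurements as plain-language descriptions that are appended to the text prompt, can be illustrated with a short sketch. The feature set, bucketing thresholds, emotion label set, and prompt wording below are illustrative assumptions for the sketch, not the paper's exact SpeechCueLLM pipeline.

```python
# Minimal sketch of the "speech characteristics -> text description" idea.
# Feature names, thresholds, labels, and prompt wording are assumptions,
# not the implementation described in the paper.

from dataclasses import dataclass


@dataclass
class SpeechStats:
    mean_pitch_hz: float      # average fundamental frequency
    pitch_std_hz: float       # pitch variability
    mean_energy_db: float     # average loudness
    speaking_rate_wps: float  # words per second


def _level(value: float, low: float, high: float) -> str:
    """Bucket a numeric feature into a coarse verbal category."""
    if value < low:
        return "low"
    if value > high:
        return "high"
    return "moderate"


def describe_speech(stats: SpeechStats) -> str:
    """Render acoustic statistics as a short natural-language description."""
    pitch = _level(stats.mean_pitch_hz, 120, 220)
    variation = _level(stats.pitch_std_hz, 20, 60)
    loudness = _level(stats.mean_energy_db, -35, -20)
    rate = _level(stats.speaking_rate_wps, 2.0, 3.5)
    return (
        f"The speaker talks with {pitch} pitch, {variation} pitch variation, "
        f"{loudness} volume, and a {rate} speaking rate."
    )


def build_prompt(transcript: str, stats: SpeechStats) -> str:
    """Combine the transcript with the speech description in one text prompt."""
    return (
        "Classify the speaker's emotion (angry, happy, sad, or neutral).\n"
        f"Utterance: \"{transcript}\"\n"
        f"Speech cues: {describe_speech(stats)}\n"
        "Emotion:"
    )


if __name__ == "__main__":
    stats = SpeechStats(mean_pitch_hz=245.0, pitch_std_hz=70.0,
                        mean_energy_db=-18.0, speaking_rate_wps=3.8)
    print(build_prompt("I can't believe you did that!", stats))
```

Because the speech cues arrive as ordinary text, the prompt can be sent to any off-the-shelf LLM, which is what lets the approach avoid architectural changes.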

Authors (6)
  1. Zehui Wu (5 papers)
  2. Ziwei Gong (10 papers)
  3. Lin Ai (15 papers)
  4. Pengyuan Shi (7 papers)
  5. Kaan Donbekci (2 papers)
  6. Julia Hirschberg (37 papers)