E-chat: Emotion-sensitive Spoken Dialogue System with Large Language Models (2401.00475v3)
Abstract: This study focuses on emotion-sensitive spoken dialogue in human-machine speech interaction. With the advancement of LLMs, dialogue systems can handle multimodal data, including audio. Recent models have enhanced the understanding of complex audio signals by integrating various audio events; however, they cannot generate appropriate responses to emotional speech. To address this, we introduce the Emotional chat Model (E-chat), a novel spoken dialogue system capable of comprehending and responding to emotions conveyed in speech. The model leverages an emotion embedding extracted by a speech encoder, combined with LLMs, enabling it to respond according to different emotional contexts. Additionally, we introduce the E-chat200 dataset, designed explicitly for emotion-sensitive spoken dialogue. Across various evaluation metrics, E-chat consistently outperforms the baseline model, demonstrating its potential for emotional comprehension and human-machine interaction.
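The abstract describes conditioning an LLM on an emotion embedding extracted by a speech encoder. The paper's exact fusion mechanism is not given here, so the sketch below is only an illustrative assumption: the emotion embedding is linearly projected into the LLM's embedding space and prepended as a soft prompt to the token embeddings. All function names and dimensions are hypothetical.

```python
# Hypothetical sketch of fusing a speech-derived emotion embedding with
# LLM token embeddings; not the paper's actual architecture.

def project(vec, weight):
    """Apply a linear projection: each row of `weight` yields one output dim."""
    return [sum(w * v for w, v in zip(row, vec)) for row in weight]

def build_llm_input(emotion_embedding, token_embeddings, weight):
    """Prepend the projected emotion embedding as a soft prompt."""
    soft_prompt = project(emotion_embedding, weight)
    return [soft_prompt] + token_embeddings

# Toy dimensions: 2-dim emotion embedding projected into a 3-dim LLM space.
W = [[1.0, 0.0],
     [0.0, 1.0],
     [1.0, 1.0]]
emotion = [0.5, -0.5]        # e.g. produced by the speech encoder
tokens = [[0.1, 0.2, 0.3]]   # one token embedding from the text prompt
seq = build_llm_input(emotion, tokens, W)
print(len(seq))  # 2: one soft-prompt vector plus one token embedding
```

The LLM then attends over the soft prompt alongside the text tokens, letting emotional context shape the generated response.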
- Hongfei Xue (22 papers)
- Yuhao Liang (10 papers)
- Bingshen Mu (8 papers)
- Shiliang Zhang (132 papers)
- Mengzhe Chen (6 papers)
- Qian Chen (264 papers)
- Lei Xie (337 papers)