
BLSP-Emo: Towards Empathetic Large Speech-Language Models (2406.03872v1)

Published 6 Jun 2024 in cs.CL, cs.SD, and eess.AS

Abstract: The recent release of GPT-4o showcased the potential of end-to-end multimodal models, not just in terms of low latency but also in their ability to understand and generate expressive speech with rich emotions. While the details are unknown to the open research community, it likely involves significant amounts of curated data and compute, neither of which is readily accessible. In this paper, we present BLSP-Emo (Bootstrapped Language-Speech Pretraining with Emotion support), a novel approach to developing an end-to-end speech-LLM capable of understanding both semantics and emotions in speech and generating empathetic responses. BLSP-Emo utilizes existing speech recognition (ASR) and speech emotion recognition (SER) datasets through a two-stage process. The first stage focuses on semantic alignment, following recent work on pretraining speech-LLMs using ASR data. The second stage performs emotion alignment with the pretrained speech-LLM on an emotion-aware continuation task constructed from SER data. Our experiments demonstrate that the BLSP-Emo model excels in comprehending speech and delivering empathetic responses, both in instruction-following tasks and conversations.
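The abstract's two-stage recipe can be illustrated with a minimal data-construction sketch. This is an inference from the abstract only: the field names, prompt wording, and helper functions below are assumptions for illustration, not the authors' actual implementation.

```python
# Hypothetical sketch of how BLSP-Emo's two training stages might construct
# alignment targets from existing ASR and SER datasets. All names and prompt
# templates are assumed for illustration.

def semantic_alignment_example(transcript: str) -> dict:
    """Stage 1: semantic alignment. Following speech-LLM pretraining work,
    pair a speech clip (represented here by its ASR transcript) with a
    text-continuation target derived from that transcript."""
    return {
        "speech_input": transcript,  # stands in for the audio features
        "instruction": "Continue the following utterance naturally.",
        "context": transcript,
    }

def emotion_alignment_example(transcript: str, emotion: str) -> dict:
    """Stage 2: emotion alignment. Construct an emotion-aware continuation
    target from SER data (transcript plus emotion label), so the model
    learns to condition its continuation on how the speaker sounds."""
    return {
        "speech_input": transcript,
        "instruction": (
            f"The speaker sounds {emotion}. "
            "Continue the conversation with an empathetic response."
        ),
        "context": transcript,
    }

ex = emotion_alignment_example("I lost my keys again.", "frustrated")
```

The key difference between the stages is visible in the prompts: stage 2 injects the SER emotion label into the continuation instruction, teaching the pretrained speech-LLM to respond to paralinguistic cues rather than transcript content alone.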

Authors (6)
  1. Chen Wang (599 papers)
  2. Minpeng Liao (11 papers)
  3. Zhongqiang Huang (20 papers)
  4. Junhong Wu (10 papers)
  5. Chengqing Zong (65 papers)
  6. Jiajun Zhang (176 papers)
Citations (3)