Language Model-Based Emotion Prediction Methods for Emotional Speech Synthesis Systems (2206.15067v2)

Published 30 Jun 2022 in cs.SD and eess.AS

Abstract: This paper proposes an effective emotional text-to-speech (TTS) system with a pre-trained language model (LM)-based emotion prediction method. Unlike conventional systems that require auxiliary inputs such as manually defined emotion classes, our system directly estimates emotion-related attributes from the input text. Specifically, we utilize generative pre-trained transformer (GPT)-3 to jointly predict both an emotion class and its strength, which represent the coarse and fine properties of emotion, respectively. Then, these attributes are combined in the emotional embedding space and used as conditional features of the TTS model for generating output speech signals. Consequently, the proposed system can produce emotional speech from text alone, without any auxiliary inputs. Furthermore, because GPT-3 can capture emotional context across consecutive sentences, the proposed method can effectively handle paragraph-level generation of emotional speech.
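
To make the conditioning step concrete, the following is a minimal sketch of one way a predicted emotion class and strength could be merged into a single conditioning vector. It is not the authors' implementation: the abstract does not say how the two attributes are combined, so linear interpolation from a neutral embedding is an assumption, and the embedding table, its values, and the function name `combine_emotion` are all hypothetical.

```python
# Hypothetical toy embedding table: one small vector per emotion class.
# In a real system these would be learned; the values here are made up.
EMOTION_EMBEDDINGS = {
    "neutral": [0.0, 0.0, 0.0],
    "happy": [1.0, 0.5, 0.0],
    "sad": [-1.0, 0.5, 0.0],
    "angry": [0.0, -1.0, 1.0],
}


def combine_emotion(emotion, strength):
    """Blend a class embedding with the neutral one according to strength.

    Assumption: strength lies in [0, 1], where 0 yields the neutral
    embedding and 1 yields the full class embedding.
    """
    if emotion not in EMOTION_EMBEDDINGS:
        raise ValueError(f"unknown emotion class: {emotion}")
    if not 0.0 <= strength <= 1.0:
        raise ValueError("strength must be in [0, 1]")
    neutral = EMOTION_EMBEDDINGS["neutral"]
    target = EMOTION_EMBEDDINGS[emotion]
    # Linear interpolation between the neutral and target embeddings.
    return [n + strength * (t - n) for n, t in zip(neutral, target)]


if __name__ == "__main__":
    # A (class, strength) pair such as an LM-based predictor might emit.
    print(combine_emotion("sad", 0.5))
```

Under this assumption, the class selects the direction in the embedding space (the coarse property) and the strength sets how far to move along it (the fine property); the resulting vector would then be passed to the TTS model as a conditional feature.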

Authors (7)
  1. Hyun-Wook Yoon (7 papers)
  2. Ohsung Kwon (8 papers)
  3. Hoyeon Lee (5 papers)
  4. Ryuichi Yamamoto (34 papers)
  5. Eunwoo Song (19 papers)
  6. Jae-Min Kim (13 papers)
  7. Min-Jae Hwang (13 papers)
Citations (12)