
Emotional speech synthesis with rich and granularized control (1911.01635v2)

Published 5 Nov 2019 in eess.AS and cs.SD

Abstract: This paper proposes an effective emotion control method for an end-to-end text-to-speech (TTS) system. To flexibly control the distinct characteristics of a target emotion category, it is essential to determine the embedding vectors that represent the TTS input. We apply an inter-to-intra emotional distance ratio algorithm to the embedding vectors, which minimizes the distance to the target emotion category while maximizing the distance to the other emotion categories. To further enhance the expressiveness of the target speech, we also introduce an interpolation technique that enables the intensity of a target emotion to be changed gradually toward that of neutral speech. Subjective evaluation results in terms of emotional expressiveness and controllability show that the proposed algorithm outperforms conventional methods.
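
The two ideas in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the exact formulation, so everything here is an assumption made for illustration: embeddings are plain NumPy vectors, distances are Euclidean, the ratio is mean inter-category distance divided by mean intra-category distance, and interpolation is linear. The function names `distance_ratio`, `pick_representative`, and `interpolate` are hypothetical and do not come from the paper.

```python
import numpy as np


def distance_ratio(candidate, target_embs, other_embs):
    """Inter-to-intra distance ratio for one candidate vector (assumed form).

    Larger values mean the candidate is close to the target emotion's
    embeddings and far from the embeddings of the other emotions.
    """
    intra = np.linalg.norm(target_embs - candidate, axis=1).mean()
    inter = np.linalg.norm(other_embs - candidate, axis=1).mean()
    return inter / (intra + 1e-8)  # small epsilon avoids division by zero


def pick_representative(target_embs, other_embs):
    """Choose the target-category embedding with the highest ratio."""
    ratios = [distance_ratio(c, target_embs, other_embs) for c in target_embs]
    return target_embs[int(np.argmax(ratios))]


def interpolate(neutral, emotion, alpha):
    """Linear blend: alpha=0 gives neutral, alpha=1 gives the full emotion."""
    return (1.0 - alpha) * neutral + alpha * emotion


# Toy 2-D data, purely illustrative.
target = np.array([[0.0, 0.0], [0.0, 1.0], [0.0, 2.0]])
others = np.array([[0.0, -10.0]])
representative = pick_representative(target, others)

neutral_emb = np.array([0.0, 0.0])
emotion_emb = np.array([2.0, 4.0])
half_intensity = interpolate(neutral_emb, emotion_emb, 0.5)
```

In this sketch, sweeping `alpha` from 1 down to 0 moves the conditioning vector from the emotional embedding toward the neutral one, which is one simple way to picture the gradual intensity control the abstract describes.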

Authors (6)
  1. Se-Yun Um (2 papers)
  2. Sangshin Oh (5 papers)
  3. Kyungguen Byun (7 papers)
  4. Inseon Jang (7 papers)
  5. Chunghyun Ahn (2 papers)
  6. Hong-Goo Kang (36 papers)
Citations (82)