Papers
Topics
Authors
Recent
Gemini 2.5 Flash
Gemini 2.5 Flash
102 tokens/sec
GPT-4o
59 tokens/sec
Gemini 2.5 Pro Pro
43 tokens/sec
o3 Pro
6 tokens/sec
GPT-4.1 Pro
50 tokens/sec
DeepSeek R1 via Azure Pro
28 tokens/sec
2000 character limit reached

Attention-Guided Generative Adversarial Network for Whisper to Normal Speech Conversion (2111.01342v1)

Published 2 Nov 2021 in cs.SD, cs.HC, and eess.AS

Abstract: Whispered speech is a special way of pronunciation without using vocal cord vibration. A whispered speech does not contain a fundamental frequency, and its energy is about 20dB lower than that of a normal speech. Converting a whispered speech into a normal speech can improve speech quality and intelligibility. In this paper, a novel attention-guided generative adversarial network model incorporating an autoencoder, a Siamese neural network, and an identity mapping loss function for whisper to normal speech conversion (AGAN-W2SC) is proposed. The proposed method avoids the challenge of estimating the fundamental frequency of the normal voiced speech converted from a whispered speech. Specifically, the proposed model is more amendable to practical applications because it does not need to align speech features for training. Experimental results demonstrate that the proposed AGAN-W2SC can obtain improved speech quality and intelligibility compared with dynamic-time-warping-based methods.

User Edit Pencil Streamline Icon: https://streamlinehq.com
Authors (5)
  1. Teng Gao (7 papers)
  2. Jian Zhou (262 papers)
  3. Huabin Wang (4 papers)
  4. Liang Tao (9 papers)
  5. Hon Keung Kwan (5 papers)
Citations (2)