
Investigating Generative Adversarial Networks based Speech Dereverberation for Robust Speech Recognition (1803.10132v3)

Published 27 Mar 2018 in cs.SD, cs.CL, and eess.AS

Abstract: We investigate the use of generative adversarial networks (GANs) in speech dereverberation for robust speech recognition. GANs have recently been studied for speech enhancement to remove additive noise, but there is still a lack of work examining their ability in speech dereverberation, and the advantages of using GANs have not been fully established. In this paper, we provide an in-depth investigation of a GAN-based dereverberation front-end for ASR. First, we study the effectiveness of different dereverberation networks (the generator in the GAN) and find that an LSTM leads to a significant improvement over a feed-forward DNN and a CNN on our dataset. Second, adding residual connections to the deep LSTMs boosts performance further. Finally, we find that, for the GAN to succeed, it is important to update the generator and the discriminator using the same mini-batch data during training. Moreover, using the reverberant spectrogram as a condition for the discriminator, as suggested in previous studies, may degrade performance. In summary, our GAN-based dereverberation front-end achieves a 14%-19% relative CER reduction compared to the baseline DNN dereverberation network when tested on a strong multi-condition training acoustic model.
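The headline result is stated as a relative CER (character error rate) reduction. As a rough illustration of how such a figure is conventionally computed, here is a minimal Python sketch. The function name `relative_cer_reduction` and the CER values are hypothetical and chosen only for illustration; they are not taken from the paper.

```python
def relative_cer_reduction(baseline_cer: float, new_cer: float) -> float:
    """Return the relative CER reduction as a fraction of the baseline CER.

    Both arguments must be in the same unit (e.g. percent).
    """
    if baseline_cer <= 0:
        raise ValueError("baseline_cer must be positive")
    return (baseline_cer - new_cer) / baseline_cer


# Hypothetical CER values (in percent) for illustration only;
# these are NOT results reported in the paper.
reduction = relative_cer_reduction(baseline_cer=20.0, new_cer=17.0)
print(f"relative CER reduction: {reduction:.1%}")
```

Note that a relative reduction is measured against the baseline error rate, so it is not the same as the absolute difference in CER points.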

Authors (6)
  1. Ke Wang (531 papers)
  2. Junbo Zhang (84 papers)
  3. Sining Sun (17 papers)
  4. Yujun Wang (61 papers)
  5. Fei Xiang (9 papers)
  6. Lei Xie (339 papers)
Citations (39)
