Robust MelGAN: A robust universal neural vocoder for high-fidelity TTS (2210.17349v3)

Published 31 Oct 2022 in cs.SD and eess.AS

Abstract: In the current two-stage neural text-to-speech (TTS) paradigm, it is ideal to have a universal neural vocoder that, once trained, is robust to the imperfect mel-spectrograms predicted by the acoustic model. To this end, we propose the Robust MelGAN vocoder, which solves the original multi-band MelGAN's metallic sound problem and increases its generalization ability. Specifically, we introduce a fine-grained network dropout strategy to the generator. With a specifically designed over-smooth handler, which separates the speech signal into periodic and aperiodic components, we apply network dropout only to the aperiodic components, which alleviates the metallic sound and maintains good speaker similarity. To further improve generalization ability, we introduce several data augmentation methods to augment fake data in the discriminator, including harmonic shift, harmonic noise and phase noise. Experiments show that Robust MelGAN can be used as a universal vocoder, significantly improving sound quality in TTS systems built on various types of data.
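The abstract names phase noise as one of the augmentations applied to fake data in the discriminator but does not say how it is computed. As one possible reading, and not the authors' implementation, the sketch below perturbs the phase of a waveform's spectrum with Gaussian noise while leaving the magnitudes unchanged. The function name `add_phase_noise`, the `scale` parameter, and the use of a single whole-signal FFT (rather than a short-time transform) are all assumptions made for illustration.

```python
import numpy as np


def add_phase_noise(signal, scale=0.1, rng=None):
    """Perturb the spectral phase of a real 1-D signal (illustrative only).

    `scale` is a hypothetical knob: the standard deviation, in radians,
    of the Gaussian noise added to each frequency bin's phase.
    """
    rng = np.random.default_rng() if rng is None else rng
    signal = np.asarray(signal, dtype=np.float64)
    n = signal.shape[-1]

    # One-sided spectrum of the real-valued signal.
    spectrum = np.fft.rfft(signal)

    # Random phase offsets, one per frequency bin.
    noise = rng.normal(0.0, scale, size=spectrum.shape)

    # The DC bin (and the Nyquist bin for even lengths) must stay real
    # for the inverse transform to preserve their magnitudes.
    noise[..., 0] = 0.0
    if n % 2 == 0:
        noise[..., -1] = 0.0

    # Rotate each bin's phase; magnitudes are untouched.
    perturbed = spectrum * np.exp(1j * noise)
    return np.fft.irfft(perturbed, n=n)
```

In a GAN training loop, a function like this would presumably be applied to generated waveforms before they are passed to the discriminator; where and how often it is applied in Robust MelGAN is not stated in the abstract.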

Authors (7)
  1. Kun Song (30 papers)
  2. Jian Cong (16 papers)
  3. Xinsheng Wang (33 papers)
  4. Yongmao Zhang (16 papers)
  5. Lei Xie (337 papers)
  6. Ning Jiang (177 papers)
  7. Haiying Wu (4 papers)
