
High-quality Speech Synthesis Using Super-resolution Mel-Spectrogram (1912.01167v1)

Published 3 Dec 2019 in eess.AS and cs.SD

Abstract: In speech synthesis and speech enhancement systems, mel-spectrograms need to be precise acoustic representations. However, the generated spectrograms are over-smoothed, which prevents the system from producing high-quality synthesized speech. Inspired by image-to-image translation, we address this problem with a learning-based post-filter that combines Pix2PixHD and ResUnet to reconstruct the mel-spectrograms together with super-resolution. The resulting super-resolution spectrogram network generates enhanced spectrograms that yield high-quality synthesized speech. Our proposed model achieves mean opinion scores (MOS) of 3.71 and 4.01, compared with baseline scores of 3.29 and 3.84, using the Griffin-Lim and WaveNet vocoders, respectively.
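To make the pipeline concrete, here is a minimal sketch of where a super-resolution post-filter sits between the acoustic model and the vocoder. It is not the authors' method. It assumes a residual formulation: the coarse spectrogram is interpolated to a finer mel-bin grid, and a network adds a learned correction. The names `upsample_bins`, `post_filter`, and `zero_model` are hypothetical. Linear interpolation and a placeholder model stand in for the trained Pix2PixHD/ResUnet generator, which the abstract does not describe at this level of detail.

```python
from typing import Callable, List

Spectrogram = List[List[float]]  # frames x mel bins (e.g. log-magnitudes)


def upsample_bins(spec: Spectrogram, factor: int) -> Spectrogram:
    """Linearly interpolate each frame along the mel-bin axis.

    A frame with n bins becomes (n - 1) * factor + 1 bins, so the
    original values are kept exactly at every `factor`-th position.
    """
    out: Spectrogram = []
    for frame in spec:
        n = len(frame)
        if n < 2:  # nothing to interpolate between
            out.append(list(frame))
            continue
        new_frame: List[float] = []
        for i in range(n - 1):
            a, b = frame[i], frame[i + 1]
            for k in range(factor):
                new_frame.append(a + (b - a) * (k / factor))
        new_frame.append(frame[-1])
        out.append(new_frame)
    return out


def post_filter(
    coarse: Spectrogram,
    residual_model: Callable[[Spectrogram], Spectrogram],
    factor: int = 2,
) -> Spectrogram:
    """Hypothetical post-filter: upsample, then add a predicted residual."""
    upsampled = upsample_bins(coarse, factor)
    residual = residual_model(upsampled)
    # Guard against silent truncation by zip() on mismatched shapes.
    if [len(f) for f in residual] != [len(f) for f in upsampled]:
        raise ValueError("residual shape must match upsampled spectrogram")
    return [
        [u + r for u, r in zip(u_frame, r_frame)]
        for u_frame, r_frame in zip(upsampled, residual)
    ]


def zero_model(spec: Spectrogram) -> Spectrogram:
    """Placeholder standing in for a trained generator (predicts no change)."""
    return [[0.0] * len(frame) for frame in spec]
```

For example, `post_filter([[0.0, 1.0, 0.0]], zero_model)` returns `[[0.0, 0.5, 1.0, 0.5, 0.0]]`. Interpolation alone only adds smooth intermediate bins. Under this assumed formulation, the residual predicted by a trained network is what would restore the fine spectral detail that over-smoothing removes, before the result is passed to a vocoder such as Griffin-Lim or WaveNet.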

Authors (3)
  1. Leyuan Sheng (2 papers)
  2. Dong-Yan Huang (2 papers)
  3. Evgeniy N. Pavlovskiy (2 papers)
Citations (11)
