FeatherTTS: Robust and Efficient attention based Neural TTS (2011.00935v1)

Published 2 Nov 2020 in eess.AS and cs.SD

Abstract: Attention-based neural TTS is an elegant speech synthesis pipeline and has shown a powerful ability to generate natural speech. However, it is still not robust enough to meet the stability requirements of industrial products. Besides, it suffers from slow inference speed owing to the autoregressive generation process. In this work, we propose FeatherTTS, a robust and efficient attention-based neural TTS system. Firstly, we propose a novel Gaussian attention which exploits the interpretability of Gaussian attention and the strict monotonic property in TTS. With this method, we replace the commonly used stop-token prediction architecture with attentive stop prediction. Secondly, we apply block sparsity to the autoregressive decoder to speed up speech synthesis. The experimental results show that the proposed FeatherTTS not only nearly eliminates word skipping and repeating on particularly hard texts while keeping the naturalness of the generated speech, but also speeds up acoustic feature generation by 3.5 times over Tacotron. Overall, the proposed FeatherTTS can run 35x faster than real time on a single CPU.
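The abstract's core mechanism is a single-Gaussian attention whose center advances monotonically over the encoder timeline, which lets the model stop synthesis once the attention center passes the last input token instead of predicting a separate stop token. Below is a minimal, hedged sketch of that idea in PyTorch; it is not the authors' code, and names such as param_net and the exact parameterization of the step size and variance are illustrative assumptions.

```python
# Sketch (assumed implementation): single-Gaussian monotonic attention with
# attentive stop prediction, following the description in the abstract.
import torch
import torch.nn as nn
import torch.nn.functional as F

class GaussianAttention(nn.Module):
    def __init__(self, query_dim: int):
        super().__init__()
        # Predicts a step size (delta) and a log-variance from the decoder
        # query; softplus keeps delta >= 0, so the mean only moves forward
        # (the strict monotonic property mentioned in the abstract).
        self.param_net = nn.Linear(query_dim, 2)

    def forward(self, query, memory, prev_mean):
        """
        query:     (B, query_dim)   decoder state at the current step
        memory:    (B, T, enc_dim)  encoder outputs
        prev_mean: (B,)             Gaussian mean from the previous step
        """
        delta, log_sigma = self.param_net(query).unbind(-1)
        mean = prev_mean + F.softplus(delta)            # monotonically advancing center
        sigma = torch.exp(log_sigma).clamp(min=1e-3)

        positions = torch.arange(memory.size(1), device=memory.device).float()
        # Normalized Gaussian weights over encoder positions.
        weights = torch.exp(-0.5 * ((positions[None, :] - mean[:, None]) / sigma[:, None]) ** 2)
        weights = weights / weights.sum(dim=1, keepdim=True)

        context = torch.bmm(weights.unsqueeze(1), memory).squeeze(1)
        # Attentive stop prediction: stop once the attention center has moved
        # past the final encoder position, with no separate stop-token classifier.
        stop = mean >= (memory.size(1) - 1)
        return context, weights, mean, stop
```

A decoder would call this once per output frame, carrying `mean` forward between steps and terminating the autoregressive loop when `stop` becomes true; the block-sparsity speedup mentioned in the abstract would be applied to that decoder's recurrent weights and is not shown here.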

Authors (8)
  1. Qiao Tian (27 papers)
  2. Zewang Zhang (9 papers)
  3. Chao Liu (358 papers)
  4. Heng Lu (41 papers)
  5. Linghui Chen (2 papers)
  6. Bin Wei (25 papers)
  7. Pujiang He (3 papers)
  8. Shan Liu (94 papers)
Citations (4)
