Nix-TTS: Lightweight and End-to-End Text-to-Speech via Module-wise Distillation (2203.15643v2)

Published 29 Mar 2022 in cs.SD, cs.CL, cs.LG, cs.NE, and eess.AS

Abstract: Several solutions for lightweight TTS have shown promising results. Still, they either rely on hand-crafted designs that reach non-optimal sizes or use neural architecture search, which often incurs high training costs. We present Nix-TTS, a lightweight TTS model obtained via knowledge distillation from a high-quality yet large-sized, non-autoregressive, and end-to-end (vocoder-free) TTS teacher model. Specifically, we propose module-wise distillation, enabling flexible and independent distillation of the encoder and decoder modules. The resulting Nix-TTS inherits the advantageous non-autoregressive and end-to-end properties of the teacher yet is significantly smaller, with only 5.23M parameters, an up to 89.34% reduction from the teacher model's size; it also achieves over 3.04x and 8.36x inference speedup on an Intel i7 CPU and a Raspberry Pi 3B, respectively, while retaining fair voice naturalness and intelligibility relative to the teacher model. We provide pretrained models and audio samples of Nix-TTS.
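To make the idea of module-wise distillation concrete, below is a minimal PyTorch sketch of distilling one module (an encoder) from a frozen teacher into a smaller student. The module definitions, dimensions, and the MSE objective are illustrative assumptions for this sketch, not the exact Nix-TTS architecture or losses described in the paper.

```python
# Hedged sketch of module-wise knowledge distillation.
# All module shapes and the loss choice below are assumptions for illustration;
# the actual Nix-TTS encoder/decoder and objectives are defined in the paper.
import torch
import torch.nn as nn

# Stand-in teacher and student encoders: the teacher is large, the student small,
# but both map the same input features to the same output dimensionality.
teacher_encoder = nn.Sequential(nn.Linear(80, 512), nn.ReLU(), nn.Linear(512, 192))
student_encoder = nn.Sequential(nn.Linear(80, 128), nn.ReLU(), nn.Linear(128, 192))

opt = torch.optim.Adam(student_encoder.parameters(), lr=1e-4)
distill_loss = nn.MSELoss()  # illustrative distillation objective

for step in range(100):
    x = torch.randn(16, 80)          # stand-in batch of input features
    with torch.no_grad():
        target = teacher_encoder(x)  # teacher output, kept frozen
    pred = student_encoder(x)
    loss = distill_loss(pred, target)
    opt.zero_grad()
    loss.backward()
    opt.step()

# The decoder module would be distilled the same way, independently and with its
# own objective, which is what makes the scheme "module-wise" and flexible.
```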

Authors (5)
  1. Rendi Chevi (7 papers)
  2. Radityo Eko Prasojo (13 papers)
  3. Alham Fikri Aji (94 papers)
  4. Andros Tjandra (39 papers)
  5. Sakriani Sakti (41 papers)
Citations (3)
