
fairseq S^2: A Scalable and Integrable Speech Synthesis Toolkit (2109.06912v1)

Published 14 Sep 2021 in eess.AS, cs.CL, and cs.SD

Abstract: This paper presents fairseq S^2, a fairseq extension for speech synthesis. We implement a number of autoregressive (AR) and non-AR text-to-speech models, and their multi-speaker variants. To enable training speech synthesis models with less curated data, a number of preprocessing tools are built and their importance is shown empirically. To facilitate faster iteration of development and analysis, a suite of automatic metrics is included. Apart from the features added specifically for this extension, fairseq S^2 also benefits from the scalability offered by fairseq and can be easily integrated with other state-of-the-art systems provided in this framework. The code, documentation, and pre-trained models are available at https://github.com/pytorch/fairseq/tree/master/examples/speech_synthesis.

Authors (8)
  1. Changhan Wang (46 papers)
  2. Wei-Ning Hsu (76 papers)
  3. Yossi Adi (96 papers)
  4. Adam Polyak (29 papers)
  5. Ann Lee (29 papers)
  6. Peng-Jen Chen (26 papers)
  7. Jiatao Gu (84 papers)
  8. Juan Pino (51 papers)
Citations (30)

