Papers
Topics
Authors
Recent
Gemini 2.5 Flash
Gemini 2.5 Flash
101 tokens/sec
Gemini 2.5 Pro Premium
50 tokens/sec
GPT-5 Medium
28 tokens/sec
GPT-5 High Premium
27 tokens/sec
GPT-4o
101 tokens/sec
DeepSeek R1 via Azure Premium
90 tokens/sec
GPT OSS 120B via Groq Premium
515 tokens/sec
Kimi K2 via Groq Premium
220 tokens/sec
2000 character limit reached

Accelerating Flow-Matching-Based Text-to-Speech via Empirically Pruned Step Sampling (2505.19931v2)

Published 26 May 2025 in eess.AS and cs.SD

Abstract: Flow-matching-based text-to-speech (TTS) models, such as Voicebox, E2 TTS, and F5-TTS, have attracted significant attention in recent years. These models require multiple sampling steps to reconstruct speech from noise, making inference speed a key challenge. Reducing the number of sampling steps can greatly improve inference efficiency. To this end, we introduce Fast F5-TTS, a training-free approach to accelerate the inference of flow-matching-based TTS models. By inspecting the sampling trajectory of F5-TTS, we identify redundant steps and propose Empirically Pruned Step Sampling (EPSS), a non-uniform time-step sampling strategy that effectively reduces the number of sampling steps. Our approach achieves a 7-step generation with an inference RTF of 0.030 on an NVIDIA RTX 3090 GPU, making it 4 times faster than the original F5-TTS while maintaining comparable performance. Furthermore, EPSS performs well on E2 TTS models, demonstrating its strong generalization ability.

Summary

We haven't generated a summary for this paper yet.

Dice Question Streamline Icon: https://streamlinehq.com

Follow-up Questions

We haven't generated follow-up questions for this paper yet.