VoiceRestore: Flow-Matching Transformers for Speech Recording Quality Restoration (2501.00794v1)
Abstract: We present VoiceRestore, a novel approach to restoring the quality of speech recordings using flow-matching Transformers trained in a self-supervised manner on synthetic data. Our method tackles a wide range of degradations frequently found in both short and long-form speech recordings, including background noise, reverberation, compression artifacts, and bandwidth limitations, all within a single, unified model. Leveraging conditional flow matching and classifier-free guidance, the model learns to map degraded speech to high-quality recordings without requiring paired clean and degraded datasets. We describe the training process, the conditional flow matching framework, and the model's architecture. We also demonstrate the model's generalization to real-world speech restoration tasks, including both short utterances and extended monologues or dialogues. Qualitative and quantitative evaluations show that our approach provides a flexible and effective solution for enhancing the quality of speech recordings across varying lengths and degradation types.
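The abstract names conditional flow matching and classifier-free guidance without giving formulas. The sketch below illustrates the generic form of both ideas on plain Python lists. It is not the VoiceRestore implementation: the straight-line path from noise to target, the guidance convention, and every function name (`cfm_pair`, `cfm_loss`, `guided_velocity`, `euler_sample`) are assumptions made for illustration.

```python
def cfm_pair(x0, x1, t):
    """Assumed straight-line path: a point between noise x0 and target x1
    at time t in [0, 1], plus the constant velocity along that path."""
    x_t = [(1.0 - t) * a + t * b for a, b in zip(x0, x1)]
    u_t = [b - a for a, b in zip(x0, x1)]
    return x_t, u_t


def cfm_loss(v_pred, u_t):
    """Mean squared error between a predicted velocity and the target velocity."""
    return sum((p - u) ** 2 for p, u in zip(v_pred, u_t)) / len(u_t)


def guided_velocity(v_cond, v_uncond, w):
    """One common classifier-free guidance convention (assumed here):
    w = 0 gives the unconditional prediction, w = 1 the conditional one,
    and w > 1 extrapolates past the conditional prediction."""
    return [vu + w * (vc - vu) for vc, vu in zip(v_cond, v_uncond)]


def euler_sample(velocity_fn, x0, steps):
    """Integrate dx/dt = velocity_fn(x, t) from t = 0 to t = 1 with Euler steps."""
    x = list(x0)
    dt = 1.0 / steps
    for i in range(steps):
        v = velocity_fn(x, i * dt)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


# Toy usage: an oracle velocity field stands in for a trained model.
noise = [0.0, 2.0]
clean = [1.0, 0.0]
x_t, u_t = cfm_pair(noise, clean, 0.25)
restored = euler_sample(lambda x, t: u_t, noise, steps=4)
```

In a real system the velocity would come from a trained network that takes the degraded recording as its condition, and `guided_velocity` would combine that network's conditional and unconditional outputs at each integration step.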