AudioTurbo: Fast Text-to-Audio Generation with Rectified Diffusion (2505.22106v1)

Published 28 May 2025 in cs.SD, cs.AI, and eess.AS

Abstract: Diffusion models have significantly improved the quality and diversity of audio generation but are hindered by slow inference speed. Rectified flow enhances inference speed by learning straight-line ordinary differential equation (ODE) paths. However, this approach requires training a flow-matching model from scratch and tends to perform suboptimally, or even poorly, at low step counts. To address the limitations of rectified flow while leveraging the advantages of advanced pre-trained diffusion models, this study integrates pre-trained models with the rectified diffusion method to improve the efficiency of text-to-audio (TTA) generation. Specifically, we propose AudioTurbo, which learns first-order ODE paths from deterministic noise-sample pairs generated by a pre-trained TTA model. Experiments on the AudioCaps dataset demonstrate that our model, with only 10 sampling steps, outperforms prior models and, compared with a flow-matching-based acceleration model, reduces inference to as few as 3 steps.
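The abstract's core mechanism is learning first-order (straight-line) ODE paths from deterministic noise-sample pairs produced by a pre-trained TTA model. Below is a minimal PyTorch sketch of a rectified-flow-style training step and a few-step Euler sampler on such pairs; the `VelocityNet` module, the function names, and the plain velocity-regression loss are illustrative assumptions, not the paper's exact architecture or objective.

```python
import torch
import torch.nn as nn

class VelocityNet(nn.Module):
    """Toy stand-in for the backbone network (hypothetical architecture)."""
    def __init__(self, dim=64):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim + 1, 256), nn.SiLU(), nn.Linear(256, dim))

    def forward(self, x, t):
        # Concatenate the scalar time step to each latent before prediction.
        return self.net(torch.cat([x, t], dim=-1))

def rectified_training_step(model, optimizer, noise, sample):
    """One training step on a deterministic (noise, sample) pair.

    `noise` is the Gaussian latent z0 and `sample` is the latent x1 that the
    pre-trained TTA model produced from z0 with a deterministic ODE sampler.
    The network regresses the straight-line velocity x1 - z0, so that a few
    Euler steps along the learned ODE approximately recover the sample.
    """
    t = torch.rand(noise.shape[0], 1)        # random time in [0, 1]
    x_t = (1.0 - t) * noise + t * sample     # point on the straight path
    target_velocity = sample - noise         # first-order (straight-line) velocity
    pred_velocity = model(x_t, t)
    loss = torch.mean((pred_velocity - target_velocity) ** 2)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

@torch.no_grad()
def sample_few_steps(model, noise, num_steps=10):
    """Euler integration of the learned ODE with a small step budget."""
    x = noise
    dt = 1.0 / num_steps
    for i in range(num_steps):
        t = torch.full((x.shape[0], 1), i * dt)
        x = x + dt * model(x, t)
    return x
```

Because the target paths are straight by construction, the few-step Euler sampler above can use budgets as small as the 3-10 steps reported in the abstract; the specific step counts and conditioning inputs (e.g. text embeddings) are omitted here for brevity.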

Authors (8)
  1. Junqi Zhao (8 papers)
  2. Jinzheng Zhao (18 papers)
  3. Haohe Liu (59 papers)
  4. Yun Chen (134 papers)
  5. Lu Han (38 papers)
  6. Xubo Liu (66 papers)
  7. Mark Plumbley (5 papers)
  8. Wenwu Wang (148 papers)
