
PPPR: Portable Plug-in Prompt Refiner for Text to Audio Generation (2406.04683v1)

Published 7 Jun 2024 in cs.SD and eess.AS

Abstract: Text-to-Audio (TTA) aims to generate audio that corresponds to the given text description, playing a crucial role in media production. The text descriptions in TTA datasets lack rich variations and diversity, resulting in a drop in TTA model performance when faced with complex text. To address this issue, we propose a method called Portable Plug-in Prompt Refiner, which utilizes rich knowledge about textual descriptions inherent in LLMs to effectively enhance the robustness of TTA acoustic models without altering the acoustic training set. Furthermore, a Chain-of-Thought that mimics human verification is introduced to enhance the accuracy of audio descriptions, thereby improving the accuracy of generated content in practical applications. The experiments show that our method achieves a state-of-the-art Inception Score (IS) of 8.72, surpassing AudioGen, AudioLDM and Tango.

Authors (13)
  1. Shuchen Shi (14 papers)
  2. Ruibo Fu (54 papers)
  3. Zhengqi Wen (69 papers)
  4. Jianhua Tao (139 papers)
  5. Tao Wang (700 papers)
  6. Chunyu Qiang (21 papers)
  7. Yi Lu (145 papers)
  8. Xin Qi (36 papers)
  9. Xuefei Liu (24 papers)
  10. Yukun Liu (45 papers)
  11. Yongwei Li (12 papers)
  12. Zhiyong Wang (120 papers)
  13. Xiaopeng Wang (53 papers)