PPPR: Portable Plug-in Prompt Refiner for Text to Audio Generation (2406.04683v1)
Abstract: Text-to-Audio (TTA) aims to generate audio that corresponds to a given text description, and it plays a crucial role in media production. The text descriptions in TTA datasets lack variation and diversity, so TTA model performance drops when the model is faced with complex text. To address this issue, we propose the Portable Plug-in Prompt Refiner (PPPR), which uses the rich knowledge of textual descriptions inherent in LLMs to enhance the robustness of TTA acoustic models without altering the acoustic training set. Furthermore, a Chain-of-Thought that mimics human verification is introduced to enhance the accuracy of audio descriptions, thereby improving the accuracy of generated content in practical applications. Experiments show that our method achieves a state-of-the-art Inception Score (IS) of 8.72, surpassing AudioGen, AudioLDM and Tango.
- Shuchen Shi
- Ruibo Fu
- Zhengqi Wen
- Jianhua Tao
- Tao Wang
- Chunyu Qiang
- Yi Lu
- Xin Qi
- Xuefei Liu
- Yukun Liu
- Yongwei Li
- Zhiyong Wang
- Xiaopeng Wang