Feedback-driven Retrieval-augmented Audio Generation with Large Audio Language Models

Published 2 Nov 2025 in cs.SD (2511.01091v1)

Abstract: We propose a general feedback-driven retrieval-augmented generation (RAG) approach that leverages Large Audio Language Models (LALMs) to address missing or imperfectly synthesized sound events in text-to-audio (TTA) generation. Unlike previous RAG-based TTA methods, which typically train specialized models from scratch, we use LALMs to analyze audio generation outputs, retrieve from an external database the concepts that pre-trained models struggle to generate, and incorporate the retrieved information into the generation process. Experimental results show that our method not only enhances the ability of LALMs to identify missing sound events but also delivers improvements across different models, outperforming existing RAG-specialized approaches.
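The three steps named in the abstract (analyze the output, retrieve what is missing, regenerate with the retrieved material) can be sketched as a small control loop. This is only an illustrative sketch, not the paper's implementation: the function `feedback_rag_generate`, its parameters, the dictionary-based database, and the `max_rounds` cap are all hypothetical, and the generator and analyzer are passed in as plain callables standing in for a pre-trained TTA model and an LALM.

```python
from typing import Any, Callable, Dict, List, Sequence, Tuple


def feedback_rag_generate(
    prompt: str,
    target_events: Sequence[str],
    generate: Callable[[str, List[Any]], Any],
    find_missing: Callable[[Any, Sequence[str]], List[str]],
    database: Dict[str, Any],
    max_rounds: int = 3,
) -> Tuple[Any, List[Any]]:
    """Hypothetical feedback loop: generate, analyze, retrieve, regenerate.

    generate      stands in for a pre-trained TTA model (assumed interface).
    find_missing  stands in for an LALM that lists absent sound events.
    database      maps an event name to a reference item (assumed layout).
    max_rounds    is an assumed cap; the abstract does not specify one.
    """
    references: List[Any] = []
    audio = generate(prompt, references)
    for _ in range(max_rounds):
        # Feedback step: ask the analyzer which target events are missing.
        missing = find_missing(audio, target_events)
        # Retrieval step: look up references for events not yet supplied.
        retrieved = [
            database[event]
            for event in missing
            if event in database and database[event] not in references
        ]
        if not retrieved:
            break  # nothing missing, or nothing new to retrieve
        references.extend(retrieved)
        # Regenerate with the retrieved references as extra conditioning.
        audio = generate(prompt, references)
    return audio, references
```

Passing the models in as callables keeps the loop independent of any particular generator, which mirrors the abstract's claim that the approach applies across different pre-trained models rather than requiring a specialized one.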
