
Synth-AC: Enhancing Audio Captioning with Synthetic Supervision (2309.09705v1)

Published 18 Sep 2023 in cs.SD and eess.AS

Abstract: Data-driven approaches hold promise for audio captioning. However, the development of audio captioning methods can be biased due to the limited availability and quality of text-audio data. This paper proposes SynthAC, a framework that leverages recent advances in audio generative models and commonly available text corpora to create synthetic text-audio pairs, thereby enhancing text-audio representation. Specifically, a text-to-audio generation model, AudioLDM, is used to generate synthetic audio signals from the captions of an image captioning dataset. SynthAC thus extends the availability of well-annotated captions from the text-vision domain to audio captioning, enhancing text-audio representation by learning relations within synthetic text-audio pairs. Experiments demonstrate that the SynthAC framework can benefit audio captioning models by incorporating well-annotated text corpora from the text-vision domain, offering a promising solution to the challenge caused by data scarcity. Furthermore, SynthAC can be easily adapted to various state-of-the-art methods, leading to substantial performance improvements.
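
The pairing step described in the abstract can be illustrated with a minimal sketch: each caption taken from an image captioning dataset is passed to a text-to-audio model, and the caption plus the generated waveform form one synthetic training pair. Everything named here (`SyntheticPair`, `build_synthetic_pairs`, `placeholder_text_to_audio`) is hypothetical and not taken from the paper. The generator is a silent stand-in, not AudioLDM; a real model would be supplied as the `text_to_audio` callable.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Sequence


@dataclass
class SyntheticPair:
    """One synthetic text-audio pair (hypothetical container, not from the paper)."""
    caption: str
    audio: Sequence[float]  # waveform samples


def build_synthetic_pairs(
    captions: Iterable[str],
    text_to_audio: Callable[[str], Sequence[float]],
) -> List[SyntheticPair]:
    """Pair each non-empty caption with audio generated from that caption."""
    pairs = []
    for caption in captions:
        text = caption.strip()
        if not text:
            continue  # skip blank captions
        pairs.append(SyntheticPair(caption=text, audio=text_to_audio(text)))
    return pairs


def placeholder_text_to_audio(text: str, num_samples: int = 16) -> List[float]:
    """Assumed stand-in for a text-to-audio model: returns silence, NOT AudioLDM."""
    return [0.0] * num_samples


# Captions as they might appear in an image captioning dataset (made-up examples).
image_captions = ["A dog barks in a park.", "  ", "Rain falls on a roof."]
pairs = build_synthetic_pairs(image_captions, placeholder_text_to_audio)
```

Passing the generator as a callable keeps the pairing logic independent of any particular text-to-audio model, which matches the abstract's claim that the approach can be adapted to different captioning methods.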

Authors (7)
  1. Feiyang Xiao (17 papers)
  2. Qiaoxi Zhu (17 papers)
  3. Jian Guan (65 papers)
  4. Xubo Liu (66 papers)
  5. Haohe Liu (59 papers)
  6. Kejia Zhang (15 papers)
  7. Wenwu Wang (148 papers)
Citations (2)