
Audio Prompt Tuning for Universal Sound Separation (2311.18399v1)

Published 30 Nov 2023 in eess.AS and cs.SD

Abstract: Universal sound separation (USS) is a task to separate arbitrary sounds from an audio mixture. Existing USS systems are capable of separating arbitrary sources, given a few examples of the target sources as queries. However, separating arbitrary sounds with a single system is challenging, and the robustness is not always guaranteed. In this work, we propose audio prompt tuning (APT), a simple yet effective approach to enhance existing USS systems. Specifically, APT improves the separation performance of specific sources through training a small number of prompt parameters with limited audio samples, while maintaining the generalization of the USS model by keeping its parameters frozen. We evaluate the proposed method on MUSDB18 and ESC-50 datasets. Compared with the baseline model, APT can improve the signal-to-distortion ratio performance by 0.67 dB and 2.06 dB using the full training set of two datasets. Moreover, APT with only 5 audio samples even outperforms the baseline systems utilizing full training data on the ESC-50 dataset, indicating the great potential of few-shot APT.
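The core idea in the abstract — tune a small set of prompt parameters on a few audio samples while the separation model's weights stay frozen — can be sketched in a few lines. This is a minimal illustrative example, not the paper's implementation: the "model" is a hypothetical frozen linear map, and all names, shapes, and the learning rate are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical frozen "USS model": a linear map from [prompt; mixture]
# to a separated signal. Shapes are illustrative, not from the paper.
D_PROMPT, D_AUDIO = 4, 8
W = rng.normal(size=(D_AUDIO, D_PROMPT + D_AUDIO))  # frozen; never updated


def separate(prompt, mixture):
    """Apply the frozen model, conditioned on the (learnable) prompt."""
    return W @ np.concatenate([prompt, mixture])


# A handful of (mixture, target-source) pairs: the few-shot setting.
mixtures = rng.normal(size=(5, D_AUDIO))
targets = rng.normal(size=(5, D_AUDIO))

# Audio prompt tuning: gradient descent on the prompt vector only.
prompt = np.zeros(D_PROMPT)
lr = 0.01
losses = []
for _ in range(500):
    grad = np.zeros(D_PROMPT)
    total = 0.0
    for x, y in zip(mixtures, targets):
        err = separate(prompt, x) - y       # residual of the MSE loss
        total += float(err @ err)
        grad += W[:, :D_PROMPT].T @ err     # d(loss)/d(prompt); W untouched
    losses.append(total / len(mixtures))
    prompt -= lr * grad / len(mixtures)
```

Only `prompt` (4 parameters here) is updated, so the frozen model keeps its general separation ability while the prompt specializes it to the target source.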

Authors (7)
  1. Yuzhuo Liu (4 papers)
  2. Xubo Liu (66 papers)
  3. Yan Zhao (120 papers)
  4. Yuanyuan Wang (93 papers)
  5. Rui Xia (53 papers)
  6. Pingchuan Tian (1 paper)
  7. Yuxuan Wang (239 papers)
Citations (4)
