Prompting Strategy for CLAP-based Audio Quality Assessment

Determine an effective prompting strategy and computational setup for using CLAP (Contrastive Language-Audio Pretraining), an audio–language model with joint audio–text embeddings, to perform audio quality assessment from audio inputs and quality-related text prompts.

Background

Audio-language models such as CLAP are pretrained on large collections of audio–text pairs and can compute audio–text similarity in a shared embedding space. PAM, the metric proposed in the cited paper, leverages this capability by comparing an audio sample against two opposing quality prompts (e.g., "the sound is clear and clean" vs. "the sound is noisy and with artifacts").
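The comparison against two opposing prompts can be illustrated with a minimal sketch. It assumes the audio and text embeddings have already been produced by a CLAP-style model; the toy vectors, the function names, and the `logit_scale` parameter are hypothetical and chosen only for illustration, not taken from the paper. The score is the softmax probability assigned to the "good quality" prompt.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def two_prompt_quality_score(audio_emb, good_emb, bad_emb, logit_scale=1.0):
    """Softmax probability of the 'good' prompt versus the 'bad' prompt.

    logit_scale is an assumed temperature-like factor (hypothetical here);
    a real model may apply its own learned scaling to the similarities.
    """
    s_good = logit_scale * cosine_similarity(audio_emb, good_emb)
    s_bad = logit_scale * cosine_similarity(audio_emb, bad_emb)
    # Subtract the maximum before exponentiating for numerical stability.
    m = max(s_good, s_bad)
    e_good = math.exp(s_good - m)
    e_bad = math.exp(s_bad - m)
    return e_good / (e_good + e_bad)


# Toy stand-ins for embeddings (NOT real CLAP outputs).
audio = [0.9, 0.1, 0.0]   # embedding of the audio clip
good = [1.0, 0.0, 0.0]    # embedding of "the sound is clear and clean"
bad = [0.0, 1.0, 0.0]     # embedding of "the sound is noisy and with artifacts"

score = two_prompt_quality_score(audio, good, bad)
print(f"quality score: {score:.3f}")
```

A score above 0.5 means the audio embedding lies closer to the "good" prompt than to the "bad" one. Because the two probabilities sum to one, swapping the prompts yields the complementary score.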

The paper proposes a two-prompt antonym strategy and shows that it improves correlation with human judgments over a naive single-prompt approach. However, the authors explicitly note that choosing a prompting strategy and setup that reliably elicits audio quality information from CLAP remains unresolved, which motivates further research into prompt design and inference configurations for audio quality assessment.

References

"Determining the prompting strategy and setup to use CLAP for audio quality assessment is still an open question."

PAM: Prompting Audio-Language Models for Audio Quality Assessment (2402.00282 - Deshmukh et al., 1 Feb 2024) in Section 2.1 (Audio Quality Assessment)