PALM: Few-Shot Prompt Learning for Audio Language Models (2409.19806v1)

Published 29 Sep 2024 in cs.SD, cs.AI, and eess.AS

Abstract: Audio-Language Models (ALMs), inspired by advances in Vision-Language Models (VLMs), have recently achieved remarkable success in zero-shot audio recognition, in which features of audio waveforms are matched against class-specific text prompt features. Because zero-shot performance is sensitive to the choice of hand-crafted text prompts, many prompt learning techniques have been developed for VLMs. We explore the efficacy of these approaches in ALMs and propose a novel method, Prompt Learning in Audio Language Models (PALM), which optimizes the feature space of the text encoder branch. Unlike existing methods that work in the input space, our approach results in greater training efficiency. We demonstrate the effectiveness of our approach on 11 audio recognition datasets, encompassing a variety of speech-processing tasks, and compare the results with three baselines in a few-shot learning setup. Our method is either on par with or outperforms other approaches while being computationally less demanding. Code is available at https://asif-hanif.github.io/palm/
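
The matching step the abstract describes can be illustrated with a small sketch. This is not the authors' implementation: the function names, the toy 2-D features, and the additive `learned_offsets` are all assumptions made for illustration. It shows only the general idea of scoring an audio feature against per-class text features by cosine similarity, and of adjusting the text features directly in feature space rather than editing the prompt tokens at the input.

```python
import math

def l2_normalize(v):
    """Scale a vector to unit length (returned unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]

def cosine_scores(audio_feat, text_feats):
    """Cosine similarity between one audio feature and each class text feature."""
    a = l2_normalize(audio_feat)
    return [sum(x * y for x, y in zip(a, l2_normalize(t))) for t in text_feats]

def classify(audio_feat, text_feats, learned_offsets=None):
    """Return the index of the best-matching class.

    learned_offsets is a hypothetical per-class vector added to the text
    features, standing in for parameters that would be trained on a few
    labeled examples. The method in the paper may combine them differently.
    """
    if learned_offsets is not None:
        text_feats = [
            [t + o for t, o in zip(feat, off)]
            for feat, off in zip(text_feats, learned_offsets)
        ]
    scores = cosine_scores(audio_feat, text_feats)
    return max(range(len(scores)), key=scores.__getitem__)

# Toy stand-ins for encoder outputs (real encoders produce far larger vectors).
text_feats = [[1.0, 0.0], [0.0, 1.0]]   # one text feature per class
audio_feat = [0.6, 0.8]                 # feature of one audio clip

zero_shot_class = classify(audio_feat, text_feats)
adapted_class = classify(audio_feat, text_feats, [[0.0, 2.0], [2.0, 0.0]])
```

In this toy setup the offsets are chosen by hand only to show that changing the text features changes the prediction; in a few-shot setting they would be fitted by gradient descent on labeled clips while both encoders stay frozen.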
