Papers
Topics
Authors
Recent
Search
2000 character limit reached

MINT: Boosting Audio-Language Model via Multi-Target Pre-Training and Instruction Tuning

Published 12 Feb 2024 in cs.SD and eess.AS | (2402.07485v5)

Abstract: In the realm of audio-language pre-training (ALP), the challenge of achieving cross-modal alignment is significant. Moreover, the integration of audio inputs with diverse distributions and task variations poses challenges in developing generic audio-LLMs. In this study, we present MINT, a novel ALP framework boosting audio-LLMs through multi-target pre-training and instruction tuning. MINT leverages the strength of frozen pre-trained audio encoders and LLMs (LLM) to improve audio-language pre-training, enabling effective transferablility to both audio-text understanding and generation tasks. To address the modality gap, we introduce Bridge-Net, a trainable module that enhances cross-modality alignment and the model's ability to follow instructions for a variety of audio-text tasks. Bridge-Net is pivotal within MINT, initially enhancing audio-language representation learning through a multi-target pre-training approach. Subsequently, Bridge-Net further boosts audio-to-language generative learning by integrating a frozen LLM with instruction tuning. This integration empowers MINT to extract features in a flexible and effective manner, specifically tailored to the provided instructions for diverse tasks. Experimental results demonstrate that MINT attains superior performance across various audio-language understanding and generation tasks, highlighting its robust generalization capabilities even in zero-shot scenarios.

Citations (2)

Summary

Paper to Video (Beta)

Whiteboard

No one has generated a whiteboard explanation for this paper yet.

Open Problems

We haven't generated a list of open problems mentioned in this paper yet.

Continue Learning

We haven't generated follow-up questions for this paper yet.

Collections

Sign up for free to add this paper to one or more collections.

Tweets

Sign up for free to view the 2 tweets with 0 likes about this paper.