Papers
Topics
Authors
Recent
Gemini 2.5 Flash
Gemini 2.5 Flash 94 tok/s
Gemini 2.5 Pro 57 tok/s Pro
GPT-5 Medium 28 tok/s
GPT-5 High 38 tok/s Pro
GPT-4o 100 tok/s
GPT OSS 120B 461 tok/s Pro
Kimi K2 208 tok/s Pro
2000 character limit reached

Parameter-Efficient Transfer Learning for Audio-Visual-Language Tasks (2308.14274v1)

Published 28 Aug 2023 in cs.MM

Abstract: The pretrain-then-finetune paradigm has been widely used in various unimodal and multimodal tasks. However, finetuning all the parameters of a pre-trained model becomes prohibitive as the model size grows exponentially. To address this issue, the adapter mechanism that freezes the pre-trained model and only finetunes a few extra parameters is introduced and delivers promising results. Most studies on adapter architectures are dedicated to unimodal or bimodal tasks, while the adapter architectures for trimodal tasks have not been investigated yet. This paper introduces a novel Long Short-Term Trimodal Adapter (LSTTA) approach for video understanding tasks involving audio, visual, and language modalities. Based on the pre-trained from the three modalities, the designed adapter module is inserted between the sequential blocks to model the dense interactions across the three modalities. Specifically, LSTTA consists of two types of complementary adapter modules, namely the long-term semantic filtering module and the short-term semantic interaction module. The long-term semantic filtering aims to characterize the temporal importance of the video frames and the short-term semantic interaction module models local interactions within short periods. Compared to previous state-of-the-art trimodal learning methods pre-trained on a large-scale trimodal corpus, LSTTA is more flexible and can inherit any powerful unimodal or bimodal models. Experimental results on four typical trimodal learning tasks show the effectiveness of LSTTA over existing state-of-the-art methods.

List To Do Tasks Checklist Streamline Icon: https://streamlinehq.com

Collections

Sign up for free to add this paper to one or more collections.

Summary

We haven't generated a summary for this paper yet.

Ai Generate Text Spark Streamline Icon: https://streamlinehq.com

Paper Prompts

Sign up for free to create and run prompts on this paper using GPT-5.

Dice Question Streamline Icon: https://streamlinehq.com

Follow-up Questions

We haven't generated follow-up questions for this paper yet.