Cross-Modal Adapter for Text-Video Retrieval (2211.09623v1)

Published 17 Nov 2022 in cs.CV, cs.AI, and cs.CL

Abstract: Text-video retrieval is an important multi-modal learning task, where the goal is to retrieve the most relevant video for a given text query. Recently, pre-trained models, e.g., CLIP, have shown great potential on this task. However, as pre-trained models scale up, fully fine-tuning them on text-video retrieval datasets carries a high risk of overfitting. Moreover, in practice, it would be costly to train and store a separate large model for each task. To overcome these issues, we present a novel $\textbf{Cross-Modal Adapter}$ for parameter-efficient fine-tuning. Inspired by adapter-based methods, we adjust the pre-trained model with a few parameterized layers. However, there are two notable differences. First, our method is designed for the multi-modal domain. Second, it allows early cross-modal interactions between CLIP's two encoders. Although surprisingly simple, our approach has three notable benefits: (1) it reduces the number of fine-tuned parameters by $\textbf{99.6}\%$ and alleviates overfitting, (2) it saves approximately 30% of training time, and (3) it keeps all pre-trained parameters fixed, allowing the pre-trained model to be shared across datasets. Extensive experiments demonstrate that, without bells and whistles, it achieves performance superior or comparable to fully fine-tuned methods on the MSR-VTT, MSVD, VATEX, ActivityNet, and DiDeMo datasets. The code will be available at \url{https://github.com/LeapLabTHU/Cross-Modal-Adapter}.
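To make the adapter idea concrete, below is a minimal PyTorch sketch of a bottleneck adapter in which a small shared layer couples the text and video branches. The class name `CrossModalAdapter`, the dimensions (512/768/64), and the particular weight-sharing scheme are illustrative assumptions on my part, not the authors' exact design; the linked repository is the authoritative implementation.

```python
import torch
import torch.nn as nn

class CrossModalAdapter(nn.Module):
    """Sketch of a cross-modal bottleneck adapter (hypothetical sizes).

    Each modality has its own down-/up-projection into and out of a
    small bottleneck; the single shared bottleneck layer ties the two
    branches together, which is one simple way to realize early
    cross-modal interaction between CLIP's two encoders.
    """

    def __init__(self, text_dim=512, video_dim=768, bottleneck=64):
        super().__init__()
        self.text_down = nn.Linear(text_dim, bottleneck)
        self.video_down = nn.Linear(video_dim, bottleneck)
        self.shared = nn.Linear(bottleneck, bottleneck)  # shared across modalities
        self.act = nn.GELU()
        self.text_up = nn.Linear(bottleneck, text_dim)
        self.video_up = nn.Linear(bottleneck, video_dim)

    def forward(self, text_feats, video_feats):
        # Residual adapter updates: only these small layers are trained,
        # while the pre-trained CLIP weights stay frozen.
        t = self.text_up(self.act(self.shared(self.text_down(text_feats))))
        v = self.video_up(self.act(self.shared(self.video_down(video_feats))))
        return text_feats + t, video_feats + v

if __name__ == "__main__":
    adapter = CrossModalAdapter()
    t = torch.randn(4, 512)  # batch of CLIP text features (assumed dim)
    v = torch.randn(4, 768)  # batch of frame-pooled video features (assumed dim)
    t_out, v_out = adapter(t, v)
    print(t_out.shape, v_out.shape)  # torch.Size([4, 512]) torch.Size([4, 768])
```

During fine-tuning, only the adapter parameters would receive gradients; the CLIP backbone is kept frozen (e.g., via `clip_model.requires_grad_(False)`), which is what allows one pre-trained backbone to be shared across datasets and yields the 99.6% reduction in fine-tuned parameters the abstract reports.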

Authors (9)
  1. Haojun Jiang
  2. Jianke Zhang
  3. Rui Huang
  4. Chunjiang Ge
  5. Zanlin Ni
  6. Jiwen Lu
  7. Jie Zhou
  8. Shiji Song
  9. Gao Huang
Citations (32)