
Contrastive Alignment of Vision to Language Through Parameter-Efficient Transfer Learning (2303.11866v1)

Published 21 Mar 2023 in cs.CV

Abstract: Contrastive vision-language models (e.g. CLIP) are typically created by updating all the parameters of a vision model and a language model through contrastive training. Can such models be created by a small number of parameter updates to an already-trained language model and vision model? The literature describes techniques that can create vision-language models by updating a small number of parameters in a language model, but these require already aligned visual representations and are non-contrastive, hence unusable for latency-sensitive applications such as neural search. We explore the feasibility and benefits of parameter-efficient contrastive vision-language alignment through transfer learning: creating a model such as CLIP by minimally updating an already-trained vision and language model. We find that a minimal set of parameter updates (<7%) can achieve the same performance as full-model training, and that updating specific components (<1% of parameters) can match 75% of full-model training. We describe a series of experiments: we show that existing knowledge is conserved more strongly under parameter-efficient training and that the benefits of parameter-efficient training scale with model and dataset size. Where paired image-text data is scarce but strong multilingual language models exist (e.g. for low-resource languages), parameter-efficient training is even preferable to full-model training. Given a fixed compute budget, parameter-efficient training allows training larger models on the same hardware, achieving equivalent performance in less time. Parameter-efficient training hence constitutes an energy-efficient and effective training strategy for contrastive vision-language models that may be preferable to the full-model training paradigm for common use cases. Code and weights at https://github.com/codezakh/LilT.
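To make the training recipe concrete, below is a minimal PyTorch sketch of parameter-efficient contrastive alignment: two pretrained encoders are frozen, only a small subset of their parameters (here the LayerNorm affine terms, an illustrative sub-1% choice) plus newly initialized projection heads stay trainable, and the pair is tuned with a CLIP-style symmetric InfoNCE loss. The class name LiteAligner, the LayerNorm-only trainable subset, the embedding dimension, and the assumption that each encoder returns a pooled feature vector are all illustrative assumptions, not the authors' exact configuration; their implementation is in the linked LilT repository.

```python
import torch
import torch.nn.functional as F
from torch import nn


class LiteAligner(nn.Module):
    """Contrastively aligns two frozen pretrained encoders through a small
    trainable subset of parameters (a sketch, not the released LilT code)."""

    def __init__(self, vision_encoder, text_encoder, vis_dim, txt_dim, embed_dim=512):
        super().__init__()
        self.vision_encoder = vision_encoder  # assumed to return (B, vis_dim)
        self.text_encoder = text_encoder      # assumed to return (B, txt_dim)
        # Newly initialized projection heads into a shared space (always trained).
        self.vis_proj = nn.Linear(vis_dim, embed_dim, bias=False)
        self.txt_proj = nn.Linear(txt_dim, embed_dim, bias=False)
        # Learnable temperature, initialized to log(1/0.07) as in CLIP.
        self.logit_scale = nn.Parameter(torch.tensor(2.659))

        # Freeze both pretrained encoders entirely...
        for p in self.vision_encoder.parameters():
            p.requires_grad = False
        for p in self.text_encoder.parameters():
            p.requires_grad = False
        # ...then re-enable only the LayerNorm affine parameters, an
        # illustrative "<1% of parameters" trainable subset.
        for encoder in (self.vision_encoder, self.text_encoder):
            for module in encoder.modules():
                if isinstance(module, nn.LayerNorm):
                    for p in module.parameters():
                        p.requires_grad = True

    def forward(self, images, text_tokens):
        v = F.normalize(self.vis_proj(self.vision_encoder(images)), dim=-1)
        t = F.normalize(self.txt_proj(self.text_encoder(text_tokens)), dim=-1)
        # Symmetric InfoNCE over the in-batch image-text similarity matrix:
        # matched pairs sit on the diagonal.
        logits = self.logit_scale.exp() * v @ t.T
        targets = torch.arange(logits.size(0), device=logits.device)
        return (F.cross_entropy(logits, targets)
                + F.cross_entropy(logits.T, targets)) / 2
```

An optimizer would then be built over only the parameters with requires_grad=True, e.g. torch.optim.AdamW([p for p in model.parameters() if p.requires_grad], lr=1e-4); restricting updates to this small subset is what frees memory for larger models on the same hardware.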

Authors (2)
  1. Zaid Khan
  2. Yun Fu
Citations (10)