
An Empirical Study of Parameter Efficient Fine-tuning on Vision-Language Pre-train Model (2403.08433v2)

Published 13 Mar 2024 in cs.CV

Abstract: Recent studies have applied Parameter-Efficient Fine-Tuning techniques (PEFTs) to efficiently narrow the performance gap between pre-training and downstream tasks. Two factors matter for any PEFT: the accessible data size and the fine-tunable parameter size. A natural expectation is that PEFT performance is positively related to both. However, based on an evaluation of five PEFTs on two downstream vision-language (VL) tasks, we find that this intuition holds only when the downstream data and task are not consistent with pre-training. For downstream fine-tuning that is consistent with pre-training, data size no longer affects performance, and the influence of fine-tunable parameter size is not monotonic. We believe this observation can guide the choice of training strategy for various PEFTs.
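The "fine-tunable parameter size" axis discussed in the abstract is easiest to see in a concrete PEFT. Below is a minimal, illustrative sketch (assuming PyTorch) using LoRA-style low-rank adaptation as a representative PEFT; the abstract does not name the five PEFTs it evaluates, so this is an example of the general idea rather than the paper's exact setup. The rank `r` directly controls how many parameters are trainable while the pre-trained weights stay frozen.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen linear layer plus a trainable low-rank update (LoRA-style sketch)."""

    def __init__(self, base: nn.Linear, r: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():  # freeze the pre-trained weights
            p.requires_grad = False
        in_f, out_f = base.in_features, base.out_features
        # Only these two small matrices are fine-tuned; their size scales with r.
        self.lora_a = nn.Parameter(torch.randn(r, in_f) * 0.01)  # down-projection
        self.lora_b = nn.Parameter(torch.zeros(out_f, r))        # up-projection
        self.scaling = alpha / r

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Output = frozen pre-trained path + scaled low-rank trainable path.
        return self.base(x) + (x @ self.lora_a.T @ self.lora_b.T) * self.scaling

# Fine-tunable parameter count grows linearly with r: r * (in_features + out_features).
layer = LoRALinear(nn.Linear(768, 768), r=4)
print(sum(p.numel() for p in layer.parameters() if p.requires_grad))  # 6144
```

Sweeping `r` (and the amount of downstream data) in a setup like this is one way to probe the two factors the abstract identifies: the trainable parameter budget and the accessible data size.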
