
An Empirical Study of Parameter Efficient Fine-tuning on Vision-Language Pre-train Model (2403.08433v2)

Published 13 Mar 2024 in cs.CV

Abstract: Recent studies have applied Parameter-Efficient Fine-Tuning techniques (PEFTs) to efficiently narrow the performance gap between pre-training and downstream tasks. Two factors matter for any PEFT: the accessible data size and the fine-tunable parameter size. A natural expectation is that PEFT performance is positively related to both. However, based on an evaluation of five PEFTs on two downstream vision-language (VL) tasks, we find that this intuition holds only when the downstream data and task are not consistent with pre-training. When downstream fine-tuning is consistent with pre-training, data size no longer affects performance, and the influence of fine-tunable parameter size is not monotonic. We believe this observation can guide the choice of training strategy for various PEFTs.
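As a rough illustration of how "fine-tunable parameter size" is controlled in one common PEFT family, the sketch below wraps a frozen linear layer with a LoRA-style low-rank update. The `LoRALinear` class, layer sizes, and hyperparameters are illustrative placeholders, not the paper's actual experimental setup.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Hypothetical LoRA-style wrapper: frozen base layer plus a trainable low-rank update."""

    def __init__(self, base: nn.Linear, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False  # pre-trained weights stay frozen
        # Trainable low-rank factors: A (rank x in_features), B (out_features x rank)
        self.lora_A = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(base.out_features, rank))
        self.scaling = alpha / rank

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Frozen path plus scaled low-rank trainable path
        return self.base(x) + (x @ self.lora_A.T @ self.lora_B.T) * self.scaling

# The fine-tunable parameter count is set by `rank`:
# rank * (in_features + out_features) trainable parameters per wrapped layer.
layer = LoRALinear(nn.Linear(768, 768), rank=8)
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
print(f"trainable parameters: {trainable}")  # 8 * (768 + 768) = 12288
```

Varying `rank` changes the number of trainable parameters without touching the frozen backbone, which is the knob the abstract's "fine-tunable parameter size" corresponds to for LoRA-style methods.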

Authors (7)
  1. Yuxin Tian
  2. Mouxing Yang
  3. Yunfan Li
  4. Dayiheng Liu
  5. Xingzhang Ren
  6. Xi Peng
  7. Jiancheng Lv
