An Empirical Study of Parameter Efficient Fine-tuning on Vision-Language Pre-train Model (2403.08433v2)
Abstract: Recent studies have applied Parameter-Efficient Fine-Tuning (PEFT) techniques to efficiently narrow the performance gap between pre-training and downstream tasks. Two factors are important for any PEFT: the accessible data size and the fine-tunable parameter size. A natural expectation is that PEFT performance is positively related to both. However, based on an evaluation of five PEFTs on two downstream vision-language (VL) tasks, we find that this intuition holds only when the downstream data and task are not consistent with pre-training. When downstream fine-tuning is consistent with pre-training, data size no longer affects performance, while the influence of fine-tunable parameter size is not monotonic. We believe these observations can guide the choice of training strategy for various PEFTs.
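The notion of "fine-tunable parameter size" is easiest to see in code. Below is a minimal sketch of one representative PEFT, a LoRA-style low-rank adapter, assuming a PyTorch environment; the class name, dimensions, and hyperparameters are illustrative only and are not the paper's implementation or exact experimental setup. The pre-trained weight stays frozen, and the number of trainable parameters is controlled by the rank `r`.

```python
# Illustrative LoRA-style adapter (not the paper's code): the base weight is
# frozen and only the low-rank factors A and B are fine-tuned, so the
# fine-tunable parameter size scales with the rank r.
import torch
import torch.nn as nn


class LoRALinear(nn.Module):
    """Frozen linear layer plus a trainable low-rank update: W x + (alpha/r) * B A x."""

    def __init__(self, in_features: int, out_features: int, r: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = nn.Linear(in_features, out_features)
        self.base.weight.requires_grad_(False)  # pre-trained weight stays frozen
        self.base.bias.requires_grad_(False)
        # Only these two factors are trainable; their size grows with r.
        self.lora_A = nn.Parameter(torch.randn(r, in_features) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(out_features, r))
        self.scaling = alpha / r

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + self.scaling * (x @ self.lora_A.T @ self.lora_B.T)


if __name__ == "__main__":
    layer = LoRALinear(768, 768, r=8)
    trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
    total = sum(p.numel() for p in layer.parameters())
    print(f"trainable params: {trainable} / {total}")  # only the low-rank factors
```

Varying `r` (or, for other PEFTs, the prompt length or adapter bottleneck width) is what the study treats as changing the fine-tunable parameter size.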
- Yuxin Tian
- Mouxing Yang
- Yunfan Li
- Dayiheng Liu
- Xingzhang Ren
- Xi Peng
- Jiancheng Lv