Partial Fine-Tuning: A Successor to Full Fine-Tuning for Vision Transformers (2312.15681v1)
Abstract: Fine-tuning pre-trained foundation models has gained significant popularity in various research fields. Existing fine-tuning methods can be roughly divided into two categories, namely Parameter-Efficient Fine-Tuning and High-Performance Fine-Tuning. The former aims at improving efficiency, while the latter focuses on enhancing performance. Beyond these methods, we demonstrate that Partial Fine-Tuning can be an innovative and promising direction capable of concurrently enhancing both efficiency and accuracy. We first validate eight manually defined partial fine-tuning strategies across a variety of datasets and vision transformer architectures, and find that some partial fine-tuning strategies (e.g., ffn only or attention only) can achieve better performance with fewer tuned parameters than full fine-tuning, and that selecting appropriate layers is critical to partial fine-tuning. Thus, we propose a novel fine-tuned angle metric to guide the selection of appropriate layers, making partial fine-tuning flexible enough to adapt to various scenarios and more practical. Additionally, we show that partial fine-tuning can serve as a new dimension for Model Soups, improving both model performance and generalization with fewer tuned parameters. Comprehensive experiments on a wide range of datasets and models validate the great potential of partial fine-tuning.
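To make the two core ideas of the abstract concrete, below is a minimal PyTorch/timm sketch of (a) partial fine-tuning that unfreezes only the FFN sub-layers of a ViT, and (b) a per-block angle between pre-trained and fine-tuned weights in the spirit of the fine-tuned angle metric. The timm model name, the "blocks.N.mlp.*" parameter naming, and the flattened-weight angle computation are illustrative assumptions, not the paper's exact implementation.

```python
# Minimal sketch (not the paper's code): partial fine-tuning of a ViT by
# unfreezing only the FFN (mlp) blocks, plus an illustrative "fine-tuned
# angle" between pre-trained and fine-tuned weights of each block.
# Assumes the timm library and its ViT parameter naming ("blocks.N.mlp.*").
import copy
import torch
import timm

model = timm.create_model("vit_base_patch16_224", pretrained=True)
pretrained_state = copy.deepcopy(model.state_dict())

# Partial fine-tuning ("ffn only"): freeze everything except the FFN
# sub-layers and the classification head.
for name, param in model.named_parameters():
    param.requires_grad = (".mlp." in name) or name.startswith("head.")

trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
total = sum(p.numel() for p in model.parameters())
print(f"Tuned params: {trainable / total:.1%} of {total}")

# ... fine-tune `model` on the downstream dataset here ...

def finetuned_angle(pre: dict, post: dict, prefix: str) -> float:
    """Angle (radians) between the concatenated pre- and post-fine-tuning
    weights of all parameters whose names start with `prefix`."""
    w0 = torch.cat([v.flatten() for k, v in pre.items() if k.startswith(prefix)])
    w1 = torch.cat([v.flatten() for k, v in post.items() if k.startswith(prefix)])
    cos = torch.nn.functional.cosine_similarity(w0, w1, dim=0).clamp(-1.0, 1.0)
    return torch.acos(cos).item()

# Rank transformer blocks by how much fine-tuning rotated their weights;
# such a per-layer score could guide which layers to select for tuning.
finetuned_state = model.state_dict()
angles = {
    f"blocks.{i}": finetuned_angle(pretrained_state, finetuned_state, f"blocks.{i}.")
    for i in range(len(model.blocks))
}
for block, angle in sorted(angles.items(), key=lambda kv: kv[1], reverse=True):
    print(block, f"{angle:.4f}")
```

In this sketch the only design choice is which parameter-name patterns stay trainable; swapping `".mlp."` for `".attn."` would give an "attention only" strategy, and the angle ranking is one plausible way to pick a subset of layers instead of a fixed pattern.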
Authors: Peng Ye, Yongqi Huang, Chongjun Tu, Minglei Li, Tao Chen, Tong He, Wanli Ouyang