Partial Fine-Tuning: A Successor to Full Fine-Tuning for Vision Transformers (2312.15681v1)

Published 25 Dec 2023 in cs.CV and cs.AI

Abstract: Fine-tuning pre-trained foundation models has gained significant popularity in various research fields. Existing fine-tuning methods can be roughly divided into two categories, namely Parameter-Efficient Fine-Tuning and High-Performance Fine-Tuning. The former aims at improving efficiency, while the latter focuses on enhancing performance. Beyond these methods, we demonstrate that Partial Fine-Tuning can be an innovative and promising direction capable of concurrently enhancing both efficiency and accuracy. We first validate eight manually defined partial fine-tuning strategies across various datasets and vision transformer architectures, and find that some partial fine-tuning strategies (e.g., ffn-only or attention-only) can achieve better performance with fewer tuned parameters than full fine-tuning, and that selecting appropriate layers is critical to partial fine-tuning. Thus, we propose a novel fine-tuned angle metric to guide the selection of appropriate layers for partial fine-tuning, making it flexible to adapt to various scenarios for more practicable partial fine-tuning. Additionally, we show that partial fine-tuning can serve as a new dimension for Model Soups, improving both model performance and generalization with fewer tuned parameters. Comprehensive experiments on a wide range of datasets and models validate the great potential of partial fine-tuning.
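The abstract describes two core ideas: freezing most of a vision transformer and tuning only selected modules (e.g., FFN-only or attention-only), and ranking layers by how much their weights rotate during fine-tuning. The sketch below is a minimal illustration of both under stated assumptions: it uses a torchvision ViT-B/16 backbone (not necessarily the paper's exact setup) and defines the fine-tuned angle as the cosine angle between the flattened pre-trained and fine-tuned weights of each layer, which is an illustrative reading of the metric rather than the paper's precise formulation.

```python
import math
import torch
from torchvision.models import vit_b_16, ViT_B_16_Weights

# Load a pre-trained ViT-B/16; used here purely for illustration.
model = vit_b_16(weights=ViT_B_16_Weights.IMAGENET1K_V1)
pretrained_state = {k: v.clone() for k, v in model.state_dict().items()}

# --- Partial fine-tuning: train only the FFN (MLP) blocks and the head. ---
# In torchvision's ViT, MLP parameters contain "mlp" in their names and the
# classification head lives under "heads"; other backbones will differ.
for name, param in model.named_parameters():
    param.requires_grad = ("mlp" in name) or name.startswith("heads")

trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
total = sum(p.numel() for p in model.parameters())
print(f"Tuned parameters: {trainable / total:.1%} of {total:,}")

# ... fine-tune `model` on the downstream dataset here ...

# --- Fine-tuned angle metric (illustrative definition, an assumption). ---
# For each weight matrix, measure the angle between the flattened pre-trained
# and fine-tuned weight vectors; layers that rotate more are treated as more
# task-relevant and hence better candidates for partial fine-tuning.
def finetuned_angle(w_before: torch.Tensor, w_after: torch.Tensor) -> float:
    a, b = w_before.flatten(), w_after.flatten()
    cos = torch.dot(a, b) / (a.norm() * b.norm() + 1e-12)
    return math.degrees(torch.acos(cos.clamp(-1.0, 1.0)).item())

angles = {
    name: finetuned_angle(pretrained_state[name], param.detach())
    for name, param in model.named_parameters()
    if param.ndim >= 2  # weight matrices only
}
for name, angle in sorted(angles.items(), key=lambda kv: -kv[1])[:5]:
    print(f"{name}: {angle:.2f} deg")
```

Before any fine-tuning the angles are all zero; after fine-tuning, the ranking suggests which layers to leave trainable in a subsequent partial fine-tuning run.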

Authors (7)
  1. Peng Ye (142 papers)
  2. Yongqi Huang (6 papers)
  3. Chongjun Tu (8 papers)
  4. Minglei Li (19 papers)
  5. Tao Chen (397 papers)
  6. Tong He (124 papers)
  7. Wanli Ouyang (358 papers)
Citations (4)
