
Partial Fine-Tuning: A Successor to Full Fine-Tuning for Vision Transformers (2312.15681v1)

Published 25 Dec 2023 in cs.CV and cs.AI

Abstract: Fine-tuning pre-trained foundation models has gained significant popularity in various research fields. Existing fine-tuning methods can be roughly divided into two categories, namely Parameter-Efficient Fine-Tuning and High-Performance Fine-Tuning. The former aims at improving efficiency, while the latter focuses on enhancing performance. Beyond these methods, we demonstrate that Partial Fine-Tuning can be an innovative and promising direction capable of concurrently enhancing both efficiency and accuracy. We first validate eight manually defined partial fine-tuning strategies across a range of datasets and vision transformer architectures, and find that some partial fine-tuning strategies (e.g., ffn only or attention only) can achieve better performance with fewer tuned parameters than full fine-tuning, and that selecting appropriate layers is critical to partial fine-tuning. We therefore propose a novel fine-tuned angle metric to guide the selection of appropriate layers, making partial fine-tuning flexible enough to adapt to various practical scenarios. Additionally, we show that partial fine-tuning can serve as a new dimension for Model Soups, improving both model performance and generalization with fewer tuned parameters. Comprehensive experiments on a wide range of datasets and models validate the great potential of partial fine-tuning.
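
To make the two key ideas in the abstract concrete, the sketch below illustrates, under assumptions of ours rather than details taken from the paper, (a) how an "attention only" or "ffn only" partial fine-tuning strategy can be set up by freezing every other parameter of a timm-style vision transformer, and (b) one plausible reading of the fine-tuned angle metric as the angle between a layer's pre-trained and fine-tuned weights. The keyword names ("attn", "mlp"), the timm model identifier, and the exact form of the metric are illustrative assumptions, not the authors' implementation.

import math
import torch
import torch.nn as nn

def freeze_all_but(model: nn.Module, keywords=("attn",)):
    # Train only parameters whose name contains one of the keywords,
    # e.g. ("attn",) for attention-only or ("mlp",) for ffn-only
    # partial fine-tuning in a timm-style ViT; freeze everything else.
    for name, param in model.named_parameters():
        param.requires_grad = any(k in name for k in keywords)

def fine_tuned_angles(pretrained: nn.Module, finetuned: nn.Module):
    # Assumed form of the fine-tuned angle metric: the angle (in degrees)
    # between each parameter tensor before and after fine-tuning.
    # Layers whose weights rotated more are natural candidates for tuning.
    ft = dict(finetuned.named_parameters())
    angles = {}
    for name, w_pre in pretrained.named_parameters():
        cos = torch.nn.functional.cosine_similarity(
            w_pre.detach().flatten(), ft[name].detach().flatten(), dim=0)
        angles[name] = math.degrees(math.acos(float(cos.clamp(-1.0, 1.0))))
    return angles

# Usage (assumes the timm library and a ViT-B/16 checkpoint):
# import timm
# model = timm.create_model("vit_base_patch16_224", pretrained=True)
# freeze_all_but(model, keywords=("attn",))   # attention-only partial fine-tuning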
