One Step Learning, One Step Review (2401.10962v2)

Published 19 Jan 2024 in cs.CV and cs.LG

Abstract: Visual fine-tuning has garnered significant attention with the rise of pre-trained vision models. The current prevailing method, full fine-tuning, suffers from knowledge forgetting because it focuses solely on fitting the downstream training set. In this paper, we propose a novel weight rollback-based fine-tuning method called OLOR (One step Learning, One step Review). OLOR integrates fine-tuning with the optimizer by incorporating a weight rollback term into the weight update at each step. This keeps the weight ranges of the upstream and downstream models consistent, effectively mitigating knowledge forgetting and enhancing fine-tuning performance. In addition, a layer-wise penalty with penalty decay and diversified decay rates is introduced to adjust the weight rollback level of each layer, adapting the method to varying downstream tasks. Through extensive experiments on tasks including image classification, object detection, semantic segmentation, and instance segmentation, we demonstrate the general applicability and state-of-the-art performance of the proposed OLOR. Code is available at https://github.com/rainbow-xiao/OLOR-AAAI-2024.
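
As a rough illustration of the update described above (a gradient step combined with a weight rollback term, modulated by a layer-wise penalty), the following is a minimal sketch assuming a plain SGD base step and a linear penalty decay across layers. The function name rollback_step, the decay schedule, and the hyperparameters are illustrative assumptions rather than the authors' OLOR formulation; see the linked repository for the actual implementation.

```python
# Minimal sketch of the "one step learning, one step review" idea from the
# abstract: a standard gradient step followed by a pull back toward the
# pre-trained weights, scaled by a layer-wise penalty. This is an assumed
# update rule for illustration, not the authors' OLOR implementation; the
# exact rollback term and penalty schedule are given in the paper and repo.
import torch


def rollback_step(model, pretrained_state, lr=1e-3, max_penalty=1e-2):
    """One fine-tuning step: gradient update ("learning") plus a rollback
    toward the upstream weights ("review"), with a penalty that decays
    linearly across layers (an assumed schedule)."""
    params = [(n, p) for n, p in model.named_parameters() if p.requires_grad]
    depth = max(len(params), 1)
    with torch.no_grad():
        for i, (name, p) in enumerate(params):
            if p.grad is None:
                continue
            # Assumed layer-wise penalty decay: earlier layers are pulled
            # back more strongly than later, more task-specific layers.
            penalty = max_penalty * (1.0 - i / depth)
            p -= lr * p.grad                             # one step learning
            p -= penalty * (p - pretrained_state[name])  # one step review
```

A training loop would keep a detached copy of the upstream checkpoint before fine-tuning (e.g. pretrained_state = {k: v.clone() for k, v in model.state_dict().items()}), call loss.backward(), and then invoke rollback_step(model, pretrained_state) in place of a plain optimizer step.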
