
Revisiting Softmax Masking: Stop Gradient for Enhancing Stability in Replay-based Continual Learning

Published 26 Sep 2023 in cs.LG and cs.AI (arXiv:2309.14808v2)

Abstract: In replay-based methods for continual learning, replaying input samples stored in episodic memory has proven effective at alleviating catastrophic forgetting. However, the cross-entropy loss with softmax, a potential key factor in catastrophic forgetting, has been underexplored. In this paper, we analyze the effect of softmax and revisit softmax masking with negative infinity to shed light on its ability to mitigate catastrophic forgetting. Our analysis finds that negative-infinity masked softmax is not always compatible with dark knowledge. To improve this compatibility, we propose a general masked softmax that controls stability by adjusting the scale of the gradient flowing to old and new classes. We demonstrate that applying our method to other replay-based methods yields better performance, primarily by enhancing model stability, on continual learning benchmarks, even when the buffer size is set to an extremely small value.
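To make the mechanism concrete, below is a minimal PyTorch sketch of the two ideas the abstract contrasts: negative-infinity masking, which removes old-class logits from the softmax entirely, and a gradient-scaled masked softmax built from a stop gradient (`detach()`), which leaves the forward softmax, and hence dark knowledge, intact while scaling the gradient that reaches old-class logits. The function names, the `alpha` parameter, and the blending formulation are illustrative assumptions, not the paper's exact implementation.

```python
import torch
import torch.nn.functional as F

def neg_inf_masked_ce(logits, targets, old_class_mask):
    # Negative-infinity masking: old-class logits are dropped from the
    # softmax, so they receive no probability mass and no gradient --
    # but the model also loses dark knowledge about old classes.
    # Assumes `targets` contains only new-class labels; an old-class
    # target would make the loss infinite.
    masked = logits.masked_fill(old_class_mask, float("-inf"))
    return F.cross_entropy(masked, targets)

def grad_scaled_masked_ce(logits, targets, old_class_mask, alpha=0.0):
    # Stop-gradient variant (illustrative): the forward value is
    # unchanged (alpha * z + (1 - alpha) * z == z), so the softmax and
    # its dark knowledge are preserved, while the backward gradient
    # reaching old-class logits is scaled by alpha. alpha=1 recovers
    # the plain softmax; alpha=0 blocks all gradient to old classes.
    blended = torch.where(
        old_class_mask,
        alpha * logits + (1.0 - alpha) * logits.detach(),
        logits,
    )
    return F.cross_entropy(blended, targets)

# Toy usage: 10 classes, the first 5 belong to old tasks.
old_class_mask = torch.zeros(10, dtype=torch.bool)
old_class_mask[:5] = True
logits = torch.randn(4, 10, requires_grad=True)
targets = torch.randint(5, 10, (4,))  # new-class labels only
loss = grad_scaled_masked_ce(logits, targets, old_class_mask, alpha=0.1)
loss.backward()
```

Under this reading, alpha=0 reproduces the stability benefit of negative-infinity masking without discarding the forward-pass distribution, which is the trade-off with dark knowledge the abstract describes; again, this is a sketch of the idea rather than the paper's exact formulation.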
