Knowledge Accumulation in Continually Learned Representations and the Issue of Feature Forgetting (2304.00933v4)

Published 3 Apr 2023 in cs.LG and cs.CV

Abstract: Continual learning research has shown that neural networks suffer from catastrophic forgetting "at the output level", but it is debated whether this is also the case at the level of learned representations. Multiple recent studies ascribe to representations a certain level of innate robustness against forgetting -- that they forget only minimally compared with forgetting at the output level. We revisit and expand upon the experiments that revealed this difference in forgetting and illustrate the coexistence of two phenomena that affect the quality of continually learned representations: knowledge accumulation and feature forgetting. Taking both aspects into account, we show that, even though forgetting in the representation (i.e. feature forgetting) can be small in absolute terms, when measured relative to how much was learned during a task, forgetting in the representation tends to be just as catastrophic as forgetting at the output level. Next, we show that this feature forgetting is problematic, as it substantially slows down the incremental learning of good general representations (i.e. knowledge accumulation). Finally, we study how feature forgetting and knowledge accumulation are affected by different types of continual learning methods.
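
The measurement the abstract describes (how much the representation improves while learning a task, and how much of that improvement is later lost) is commonly operationalized with linear probes trained on frozen features. The sketch below is a minimal, hedged illustration of that idea in PyTorch; the function names, the probing protocol (here the probe is trained and evaluated on the same split), and the "relative forgetting" formula are illustrative assumptions, not the paper's exact method.

import torch
import torch.nn as nn
from torch.utils.data import DataLoader


def probe_accuracy(encoder: nn.Module, dataset, num_classes: int,
                   epochs: int = 10, lr: float = 1e-3, device: str = "cpu") -> float:
    """Train a linear classifier on frozen features and return its accuracy.
    Illustrative only: the probe is trained and evaluated on the same data;
    in practice separate train/test splits would be used."""
    encoder = encoder.to(device).eval()
    loader = DataLoader(dataset, batch_size=256, shuffle=True)

    # Infer the feature dimension from one batch of frozen features.
    with torch.no_grad():
        x0, _ = next(iter(loader))
        feat_dim = encoder(x0.to(device)).flatten(1).shape[1]

    probe = nn.Linear(feat_dim, num_classes).to(device)
    opt = torch.optim.Adam(probe.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()

    for _ in range(epochs):
        for x, y in loader:
            x, y = x.to(device), y.to(device)
            with torch.no_grad():
                z = encoder(x).flatten(1)  # frozen representation
            opt.zero_grad()
            loss_fn(probe(z), y).backward()
            opt.step()

    correct, total = 0, 0
    with torch.no_grad():
        for x, y in loader:
            z = encoder(x.to(device)).flatten(1)
            correct += (probe(z).argmax(1).cpu() == y).sum().item()
            total += y.numel()
    return correct / total


def relative_feature_forgetting(acc_before_task: float, acc_after_task: float,
                                acc_after_later_tasks: float) -> float:
    """Express the probe-accuracy drop after later tasks as a fraction of the
    probe-accuracy gain achieved while learning the task (assumed formula)."""
    gained = acc_after_task - acc_before_task
    dropped = acc_after_task - acc_after_later_tasks
    return dropped / max(gained, 1e-8)

Given probe accuracies measured just before a task, right after it, and again after training on subsequent tasks, a value near 1 would indicate that almost everything the task added to the representation was later lost, even if the absolute accuracy drop is small.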

Authors (4)
  1. Timm Hess (3 papers)
  2. Eli Verwimp (6 papers)
  3. Gido M. van de Ven (17 papers)
  4. Tinne Tuytelaars (150 papers)
Citations (5)
