Complementary Learning Subnetworks for Parameter-Efficient Class-Incremental Learning (2306.11967v1)

Published 21 Jun 2023 in cs.LG and cs.CV

Abstract: In class-incremental learning (CIL), deep neural networks must adapt their parameters to non-stationary data distributions, e.g., the emergence of new classes over time. However, CIL models are challenged by the well-known catastrophic forgetting phenomenon. Typical rehearsal-based methods rely on storing exemplars of old classes to mitigate forgetting, which limits real-world deployment due to memory budgets and privacy concerns. In this paper, we propose a novel rehearsal-free CIL approach that learns continually via the synergy between two complementary learning subnetworks: a plastic CNN feature extractor and an analytical feed-forward classifier, which are jointly optimized. The inaccessibility of historical data is tackled by holistically controlling the parameters of a well-trained model, ensuring that the learned decision boundary fits new classes while retaining recognition of previously learned ones. Specifically, the trainable CNN feature extractor provides task-dependent knowledge separately and without interference, while the final classifier integrates task-specific knowledge incrementally for decision-making without forgetting. In each CIL session, the model accommodates new tasks by attaching a tiny set of declarative parameters to its backbone: only one matrix per task, or one vector per class, is kept for knowledge retention. Extensive experiments on a variety of task sequences show that our method achieves competitive results against state-of-the-art methods, especially in accuracy, memory cost, training efficiency, and task-order robustness. Furthermore, to let the non-growing backbone (i.e., a model with limited network capacity) suffice for training on more incoming tasks, we empirically investigate a graceful-forgetting implementation on previously learned trivial tasks.
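
The pairing of a plastic backbone with an analytical (closed-form) classifier is the core mechanism described above. As a rough illustration of how such a classifier can absorb new classes without replaying old data, the sketch below refits a ridge-regression head from fixed-size per-task feature summaries. This is a minimal sketch under stated assumptions: the class name, the NumPy-only setup, and the exact update rule are illustrative, not the authors' implementation.

```python
import numpy as np

class AnalyticIncrementalClassifier:
    """Hypothetical sketch of a rehearsal-free analytical head: each task
    contributes only fixed-size feature summaries, never raw exemplars."""

    def __init__(self, feat_dim: int, reg: float = 1e-3):
        self.feat_dim = feat_dim
        self.reg = reg
        self.A = np.zeros((feat_dim, feat_dim))  # running X^T X over all tasks
        self.B = np.zeros((feat_dim, 0))         # running X^T Y, grows per class
        self.W = None                            # current closed-form weights

    def add_task(self, feats: np.ndarray, labels: np.ndarray, n_new_classes: int):
        """feats: (N, feat_dim) backbone features for the new task's data;
        labels: (N,) global class ids covering the n_new_classes new classes."""
        n_classes = self.B.shape[1] + n_new_classes
        # pad the cross-correlation buffer with zero columns for new classes
        self.B = np.hstack([self.B, np.zeros((self.feat_dim, n_new_classes))])
        # one-hot targets over all classes seen so far
        Y = np.eye(n_classes)[labels]
        # accumulate the only retained state: two summary matrices
        self.A += feats.T @ feats
        self.B += feats.T @ Y
        # closed-form ridge solution over every task seen so far
        self.W = np.linalg.solve(self.A + self.reg * np.eye(self.feat_dim), self.B)

    def predict(self, feats: np.ndarray) -> np.ndarray:
        return (feats @ self.W).argmax(axis=1)

# toy usage: two sessions of two classes each, with random "features"
rng = np.random.default_rng(0)
clf = AnalyticIncrementalClassifier(feat_dim=16)
clf.add_task(rng.normal(size=(100, 16)), rng.integers(0, 2, 100), n_new_classes=2)
clf.add_task(rng.normal(size=(100, 16)), rng.integers(2, 4, 100), n_new_classes=2)
print(clf.predict(rng.normal(size=(5, 16))))  # predicted ids in {0, 1, 2, 3}
```

Because the accumulated autocorrelation and cross-correlation buffers summarize all past features, solving the regularized normal equations after each session reproduces the batch ridge-regression solution on everything seen so far without retaining a single exemplar, loosely mirroring the paper's one-matrix-per-task / one-vector-per-class memory accounting.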

Authors (2)
  1. Depeng Li
  2. Zhigang Zeng