Divide and not forget: Ensemble of selectively trained experts in Continual Learning (2401.10191v3)

Published 18 Jan 2024 in cs.LG and cs.CV

Abstract: Class-incremental learning is becoming more popular as it helps models widen their applicability while not forgetting what they already know. A trend in this area is to use a mixture-of-expert technique, where different models work together to solve the task. However, the experts are usually trained all at once using whole task data, which makes them all prone to forgetting and increasing computational burden. To address this limitation, we introduce a novel approach named SEED. SEED selects only one, the most optimal expert for a considered task, and uses data from this task to fine-tune only this expert. For this purpose, each expert represents each class with a Gaussian distribution, and the optimal expert is selected based on the similarity of those distributions. Consequently, SEED increases diversity and heterogeneity within the experts while maintaining the high stability of this ensemble method. The extensive experiments demonstrate that SEED achieves state-of-the-art performance in exemplar-free settings across various scenarios, showing the potential of expert diversification through data in continual learning.

References (45)
  1. Expert gate: Lifelong learning with a network of experts. In Conference on Computer Vision and Pattern Recognition, CVPR, 2017.
  2. Deesil: Deep-shallow incremental learning. TaskCV Workshop @ ECCV 2018., 2018.
  3. Riemannian walk for incremental learning: Understanding forgetting and intransigence. In Computer Vision - ECCV 2018 - 15th European Conference, Munich, Germany, September 8-14, 2018, Proceedings, Part XI, Lecture Notes in Computer Science, pp.  556–572, 2018.
  4. Imagenet: A large-scale hierarchical image database. In 2009 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2009), 20-25 June 2009, Miami, Florida, USA, pp.  248–255, 2009.
  5. Continual learning beyond a single model. arXiv preprint arXiv:2202.09826, 2022.
  6. Robert M French. Catastrophic forgetting in connectionist networks. Trends in cognitive sciences, 3(4):128–135, 1999.
  7. Lifelong machine learning with deep streaming linear discriminant analysis. In The IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, June 2020a.
  8. Lifelong machine learning with deep streaming linear discriminant analysis. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition workshops, pp.  220–221, 2020b.
  9. Deep residual learning for image recognition. In Conference on Computer Vision and Pattern Recognition, CVPR, 2016.
  10. Augmix: A simple data processing method to improve robustness and uncertainty. In International Conference on Learning Representations, 2019.
  11. Significance of softmax-based features in comparison to distance metric learning-based features. IEEE transactions on pattern analysis and machine intelligence, 42(5):1279–1285, 2019.
  12. Learning a unified classifier incrementally via rebalancing. In IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2019, Long Beach, CA, USA, June 16-20, 2019, pp. 831–839, 2019.
  13. Optimizing reusable knowledge for continual learning via metalearning. Advances in Neural Information Processing Systems, 34:14150–14162, 2021.
  14. Overcoming catastrophic forgetting in neural networks. Proceedings of the national academy of sciences, 114(13):3521–3526, 2017.
  15. Alex Krizhevsky. Learning multiple layers of features from tiny images. Technical report, University of Toronto, 2009.
  16. Learning without forgetting. In Computer Vision - ECCV 2016 - 14th European Conference, Amsterdam, The Netherlands, October 11-14, 2016, Proceedings, Part IV, volume 9908 of Lecture Notes in Computer Science, pp.  614–629, 2016.
  17. More classifiers, less forgetting: A generic multi-classifier paradigm for incremental learning. In European Conference on Computer Vision, pp.  699–716. Springer, 2020.
  18. Class-incremental learning: Survey and performance evaluation on image classification. IEEE Transactions on Pattern Analysis and Machine Intelligence, pp.  1–20, 2022.
  19. Catastrophic interference in connectionist networks: The sequential learning problem. In Psychology of learning and motivation, volume 24, pp. 109–165. Elsevier, 1989.
  20. Pytorch: An imperative style, high-performance deep learning library. Advances in neural information processing systems, 32, 2019.
  21. Moment matching for multi-source domain adaptation. In Proceedings of the IEEE International Conference on Computer Vision, pp.  1406–1415, 2019.
  22. Fetril: Feature translation for exemplar-free class-incremental learning. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, pp.  3911–3920, 2023.
  23. Bns: Building network structures dynamically for continual learning. Advances in Neural Information Processing Systems, 34:20608–20620, 2021.
  24. Continual unsupervised representation learning. In Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp.  7645–7655, 2019.
  25. A tinyml platform for on-device continual learning with quantized latent replays. IEEE Journal on Emerging and Selected Topics in Circuits and Systems, 11(4):789–802, 2021.
  26. icarl: Incremental classifier and representation learning. In 2017 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017, Honolulu, HI, USA, July 21-26, 2017, pp. 5533–5542, 2017.
  27. Weighted ensemble self-supervised learning. In 11th International Conference on Learning Representations, ICLR 2023, Kigali, Rwanda, Conference Track Proceedings, 2023.
  28. Progressive neural networks. CoRR, abs/1606.04671, 2016.
  29. Overcoming catastrophic forgetting with hard attention to the task. In Proceedings of the 35th International Conference on Machine Learning, ICML 2018, Stockholm, Sweden, July 10-15, 2018, pp. 4555–4564, 2018.
  30. Always be dreaming: A new approach for data-free class-incremental learning. In IEEE/CVF International Conference on Computer Vision, ICCV 2021, pp.  9354–9364. IEEE, 2021.
  31. Gido M Van de Ven and Andreas S Tolias. Three scenarios for continual learning. arXiv preprint arXiv:1904.07734, 2019.
  32. Class-incremental learning with generative classifiers. In IEEE Conference on Computer Vision and Pattern Recognition Workshops, CVPR Workshops 2021, virtual, June 19-25, 2021, pp. 3611–3620, 2021.
  33. Efficient continual learning with modular networks and task-driven priors. arXiv preprint arXiv:2012.12631, 2020.
  34. FOSTER: feature boosting and compression for class-incremental learning. In Computer Vision - ECCV 2022 - 17th European Conference, Tel Aviv, Israel, October 23-27, 2022, Proceedings, Part XXV, pp. 398–414, 2022a.
  35. Coscl: Cooperation of small continual learners is stronger than a big one. In Computer Vision - ECCV 2022 - 17th European Conference, Tel Aviv, Israel, October 23-27, 2022, Proceedings, Part XXVI, pp. 254–271, 2022b.
  36. Learning to prompt for continual learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp.  139–149, 2022c.
  37. Large scale incremental learning. In IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2019, Long Beach, CA, USA, June 16-20, 2019, pp. 374–382, 2019.
  38. DER: dynamically expandable representation for class incremental learning. In IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2021, virtual, June 19-25, 2021, pp.  3014–3023, 2021.
  39. Continual learning with bayesian model based on a fixed pre-trained feature extractor. In Medical Image Computing and Computer Assisted Intervention - MICCAI 2021 Strasbourg, France, September, 27 - October 1, 2021, Proceedings, Part V, pp.  397–406, 2021.
  40. Semantic drift compensation for class-incremental learning. In 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2020, Seattle, WA, USA, June 13-19, 2020, pp. 6980–6989. IEEE, 2020.
  41. Maintaining discrimination and fairness in class incremental learning. In 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2020, Seattle, WA, USA, June 13-19, 2020, pp. 13205–13214. IEEE, 2020.
  42. Pycil: A python toolbox for class-incremental learning, 2021.
  43. Class-incremental learning via dual augmentation. Advances in Neural Information Processing Systems, 34, 2021a.
  44. Prototype augmentation and self-supervision for incremental learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp.  5871–5880, 2021b.
  45. Self-sustaining representation expansion for non-exemplar class-incremental learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp.  9296–9305, 2022.

Summary

  • The paper introduces SEED, an ensemble approach that employs selective training of experts to mitigate catastrophic forgetting in class-incremental learning (CIL).
  • It represents each class as a Gaussian distribution and uses the symmetrized Kullback–Leibler divergence (written out after this list) to select the expert whose latent space leaves those distributions least overlapping.
  • Extensive experiments demonstrate SEED's superior performance in both task-aware and task-agnostic settings, highlighting its adaptability.
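
For reference, the symmetrized Kullback–Leibler divergence between two Gaussian class distributions has the standard closed form sketched below; the exact weighting or normalization used in the paper may differ, so treat this as background rather than the authors' formula.

```latex
% KL divergence between N_1 = N(\mu_1, \Sigma_1) and N_2 = N(\mu_2, \Sigma_2) in d dimensions
D_{\mathrm{KL}}(\mathcal{N}_1 \,\|\, \mathcal{N}_2) =
  \tfrac{1}{2}\Big[\operatorname{tr}\big(\Sigma_2^{-1}\Sigma_1\big)
  + (\mu_2-\mu_1)^{\top}\Sigma_2^{-1}(\mu_2-\mu_1)
  - d + \ln\tfrac{\det\Sigma_2}{\det\Sigma_1}\Big]

% Symmetrized version used to compare two class distributions
D_{\mathrm{sym}}(\mathcal{N}_1, \mathcal{N}_2) =
  D_{\mathrm{KL}}(\mathcal{N}_1 \,\|\, \mathcal{N}_2)
  + D_{\mathrm{KL}}(\mathcal{N}_2 \,\|\, \mathcal{N}_1)
```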

Introduction

Continual Learning (CL) is characterized by a model's ability to learn from a stream of data in which tasks are presented sequentially. In Class-Incremental Learning (CIL), a specific CL scenario, models must incrementally adapt to new classes without forgetting previously learned ones, which raises the twin challenges of catastrophic forgetting and limited per-task data. While many approaches have been proposed to address these issues, the SEED (Selection of Experts for Ensemble Diversification) method introduced in this work offers a novel perspective on exemplar-free CIL. Unlike methods that rely on a strong feature extractor from the outset, SEED promotes expert diversification within an ensemble, increasing stability without significant computational overhead.
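
To make the exemplar-free CIL protocol concrete, the toy sketch below runs a nearest-class-mean classifier over a sequence of tasks with disjoint classes, storing only per-class statistics and no exemplars, and evaluates over all classes seen so far after each task. It illustrates the protocol only; the data, classifier, and names are illustrative and are not the SEED method.

```python
import numpy as np

# Toy, self-contained illustration of exemplar-free class-incremental learning:
# tasks arrive sequentially with disjoint classes, no past samples are stored,
# and evaluation covers every class seen so far.
rng = np.random.default_rng(0)
dim, classes_per_task, n_tasks = 16, 2, 3
class_means = {}  # the only state carried across tasks (no exemplars)

for task in range(n_tasks):
    new_classes = range(task * classes_per_task, (task + 1) * classes_per_task)
    for c in new_classes:
        center = rng.normal(size=dim) * 5.0           # toy class center
        samples = center + rng.normal(size=(100, dim))  # current task's data only
        class_means[c] = samples.mean(axis=0)

    # Task-agnostic evaluation over all classes observed so far.
    test = {c: class_means[c] + rng.normal(size=dim) for c in class_means}
    correct = sum(
        min(class_means, key=lambda k: np.linalg.norm(x - class_means[k])) == c
        for c, x in test.items()
    )
    print(f"after task {task}: {len(class_means)} classes, "
          f"accuracy {correct / len(test):.2f}")
```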

Related Work

Class-Incremental Learning has evolved with the advent of architecture-based and ensemble methods that grow the network, dynamically adjust its parameters, or apply masking to mitigate forgetting and improve plasticity. Earlier solutions, such as Expert Gate and CoSCL, either lead to an unsustainable increase in model parameters or require complex regularization that hinders adaptability. Gaussian models have also been employed in CL to counter the bias towards recently learned tasks; however, such techniques lack the plasticity to learn new information efficiently.

Method

SEED maintains an ensemble of experts, each of which represents every class as a Gaussian distribution in its own latent space. The crucial innovation lies in selectively training a single expert for each new task: the expert is chosen as the one in whose latent space the new task's class distributions overlap the least, as measured by the symmetrized Kullback–Leibler divergence, which helps reduce representational drift. During inference, Bayes classification over the per-class Gaussians is combined across the whole ensemble to obtain task-agnostic predictions. This design not only mitigates catastrophic forgetting but also exploits the diversity of the ensemble to retain plasticity under varied data distributions.
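
A minimal sketch of these two steps in Python, assuming each expert exposes an `encode` method and a `class_gaussians` dict of per-class (mean, covariance) pairs; these names, and the way per-class log-likelihoods are summed over experts, are illustrative assumptions rather than the authors' implementation, whose exact aggregation may differ.

```python
import numpy as np
from scipy.stats import multivariate_normal

def sym_kl(mu1, cov1, mu2, cov2):
    """Symmetrized KL divergence between two multivariate Gaussians (closed form)."""
    def kl(m1, c1, m2, c2):
        d = m1.shape[0]
        c2_inv = np.linalg.inv(c2)
        diff = m2 - m1
        return 0.5 * (np.trace(c2_inv @ c1) + diff @ c2_inv @ diff - d
                      + np.log(np.linalg.det(c2) / np.linalg.det(c1)))
    return kl(mu1, cov1, mu2, cov2) + kl(mu2, cov2, mu1, cov1)

def select_expert(experts, new_task_data):
    """Pick the expert whose latent space separates the new task's classes best,
    i.e. where pairwise symmetrized KL between class Gaussians is largest
    (equivalently, where the distributions overlap least).
    new_task_data: dict mapping class id -> array of raw samples (assumed interface)."""
    scores = []
    for expert in experts:
        feats = [expert.encode(x) for x in new_task_data.values()]  # per-class latents
        gaussians = [(f.mean(axis=0), np.cov(f, rowvar=False)) for f in feats]
        pairwise = [sym_kl(*gaussians[i], *gaussians[j])
                    for i in range(len(gaussians))
                    for j in range(i + 1, len(gaussians))]
        scores.append(np.mean(pairwise))
    return int(np.argmax(scores))  # only this expert is fine-tuned on the new task

def predict(experts, x):
    """Task-agnostic inference: accumulate per-class log-likelihoods over the ensemble
    and return the highest-scoring class (one simple combination rule)."""
    log_scores = {}
    for expert in experts:
        z = expert.encode(x[None])[0]
        for cls, (mu, cov) in expert.class_gaussians.items():
            log_scores[cls] = log_scores.get(cls, 0.0) + \
                multivariate_normal(mu, cov, allow_singular=True).logpdf(z)
    return max(log_scores, key=log_scores.get)
```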

Experiments and Discussion

Extensive experiments showcase SEED's performance across diverse CIL scenarios, from equal task splits to domain-shift cases such as DomainNet, where it shows notably better adaptability to new distributions. SEED consistently outperforms state-of-the-art approaches in both task-aware and task-agnostic settings.

An extensive ablation study underscores the significance of each design choice within SEED, demonstrating how the ensemble technique, the expert selection strategy, and the careful balance of stability and plasticity contribute to the model's effectiveness. Furthermore, an adjustable hyperparameter provides a robust trade-off between plasticity and stability, allowing the model to be tuned to the complexity of the task sequence.

Conclusions

SEED addresses the classic challenges of CIL and sets new state-of-the-art results without relying on significant additional computational resources. Its ability to preserve knowledge across a series of tasks while efficiently adapting to new ones offers a useful paradigm for researchers and practitioners in class-incremental learning. With its limitations acknowledged, SEED's versatility and strong results position it as a notable advancement in continual learning research.
