Divide and not forget: Ensemble of selectively trained experts in Continual Learning (2401.10191v3)

Published 18 Jan 2024 in cs.LG and cs.CV

Abstract: Class-incremental learning is becoming more popular as it helps models widen their applicability while not forgetting what they already know. A trend in this area is to use a mixture-of-expert technique, where different models work together to solve the task. However, the experts are usually trained all at once using whole task data, which makes them all prone to forgetting and increasing computational burden. To address this limitation, we introduce a novel approach named SEED. SEED selects only one, the most optimal expert for a considered task, and uses data from this task to fine-tune only this expert. For this purpose, each expert represents each class with a Gaussian distribution, and the optimal expert is selected based on the similarity of those distributions. Consequently, SEED increases diversity and heterogeneity within the experts while maintaining the high stability of this ensemble method. The extensive experiments demonstrate that SEED achieves state-of-the-art performance in exemplar-free settings across various scenarios, showing the potential of expert diversification through data in continual learning.

References (45)
  1. Expert gate: Lifelong learning with a network of experts. In Conference on Computer Vision and Pattern Recognition, CVPR, 2017.
  2. Deesil: Deep-shallow incremental learning. TaskCV Workshop @ ECCV 2018., 2018.
  3. Riemannian walk for incremental learning: Understanding forgetting and intransigence. In Computer Vision - ECCV 2018 - 15th European Conference, Munich, Germany, September 8-14, 2018, Proceedings, Part XI, Lecture Notes in Computer Science, pp.  556–572, 2018.
  4. Imagenet: A large-scale hierarchical image database. In 2009 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2009), 20-25 June 2009, Miami, Florida, USA, pp.  248–255, 2009.
  5. Continual learning beyond a single model. arXiv preprint arXiv:2202.09826, 2022.
  6. Robert M French. Catastrophic forgetting in connectionist networks. Trends in cognitive sciences, 3(4):128–135, 1999.
  7. Lifelong machine learning with deep streaming linear discriminant analysis. In The IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, June 2020a.
  8. Lifelong machine learning with deep streaming linear discriminant analysis. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition workshops, pp.  220–221, 2020b.
  9. Deep residual learning for image recognition. In Conference on Computer Vision and Pattern Recognition, CVPR, 2016.
  10. Augmix: A simple data processing method to improve robustness and uncertainty. In International Conference on Learning Representations, 2019.
  11. Significance of softmax-based features in comparison to distance metric learning-based features. IEEE transactions on pattern analysis and machine intelligence, 42(5):1279–1285, 2019.
  12. Learning a unified classifier incrementally via rebalancing. In IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2019, Long Beach, CA, USA, June 16-20, 2019, pp. 831–839, 2019.
  13. Optimizing reusable knowledge for continual learning via metalearning. Advances in Neural Information Processing Systems, 34:14150–14162, 2021.
  14. Overcoming catastrophic forgetting in neural networks. Proceedings of the national academy of sciences, 114(13):3521–3526, 2017.
  15. Alex Krizhevsky. Learning multiple layers of features from tiny images. Technical report, University of Toronto, 2009.
  16. Learning without forgetting. In Computer Vision - ECCV 2016 - 14th European Conference, Amsterdam, The Netherlands, October 11-14, 2016, Proceedings, Part IV, volume 9908 of Lecture Notes in Computer Science, pp.  614–629, 2016.
  17. More classifiers, less forgetting: A generic multi-classifier paradigm for incremental learning. In European Conference on Computer Vision, pp.  699–716. Springer, 2020.
  18. Class-incremental learning: Survey and performance evaluation on image classification. IEEE Transactions on Pattern Analysis and Machine Intelligence, pp.  1–20, 2022.
  19. Catastrophic interference in connectionist networks: The sequential learning problem. In Psychology of learning and motivation, volume 24, pp. 109–165. Elsevier, 1989.
  20. Pytorch: An imperative style, high-performance deep learning library. Advances in neural information processing systems, 32, 2019.
  21. Moment matching for multi-source domain adaptation. In Proceedings of the IEEE International Conference on Computer Vision, pp.  1406–1415, 2019.
  22. Fetril: Feature translation for exemplar-free class-incremental learning. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, pp.  3911–3920, 2023.
  23. Bns: Building network structures dynamically for continual learning. Advances in Neural Information Processing Systems, 34:20608–20620, 2021.
  24. Continual unsupervised representation learning. In Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp.  7645–7655, 2019.
  25. A tinyml platform for on-device continual learning with quantized latent replays. IEEE Journal on Emerging and Selected Topics in Circuits and Systems, 11(4):789–802, 2021.
  26. icarl: Incremental classifier and representation learning. In 2017 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017, Honolulu, HI, USA, July 21-26, 2017, pp. 5533–5542, 2017.
  27. Weighted ensemble self-supervised learning. In 11th International Conference on Learning Representations, ICLR 2023, Kigali, Rwanda, Conference Track Proceedings, 2023.
  28. Progressive neural networks. CoRR, abs/1606.04671, 2016.
  29. Overcoming catastrophic forgetting with hard attention to the task. In Proceedings of the 35th International Conference on Machine Learning, ICML 2018, Stockholm, Sweden, July 10-15, 2018, pp. 4555–4564, 2018.
  30. Always be dreaming: A new approach for data-free class-incremental learning. In IEEE/CVF International Conference on Computer Vision, ICCV 2021, pp.  9354–9364. IEEE, 2021.
  31. Gido M Van de Ven and Andreas S Tolias. Three scenarios for continual learning. arXiv preprint arXiv:1904.07734, 2019.
  32. Class-incremental learning with generative classifiers. In IEEE Conference on Computer Vision and Pattern Recognition Workshops, CVPR Workshops 2021, virtual, June 19-25, 2021, pp. 3611–3620, 2021.
  33. Efficient continual learning with modular networks and task-driven priors. arXiv preprint arXiv:2012.12631, 2020.
  34. FOSTER: feature boosting and compression for class-incremental learning. In Computer Vision - ECCV 2022 - 17th European Conference, Tel Aviv, Israel, October 23-27, 2022, Proceedings, Part XXV, pp. 398–414, 2022a.
  35. Coscl: Cooperation of small continual learners is stronger than a big one. In Computer Vision - ECCV 2022 - 17th European Conference, Tel Aviv, Israel, October 23-27, 2022, Proceedings, Part XXVI, pp. 254–271, 2022b.
  36. Learning to prompt for continual learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp.  139–149, 2022c.
  37. Large scale incremental learning. In IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2019, Long Beach, CA, USA, June 16-20, 2019, pp. 374–382, 2019.
  38. DER: dynamically expandable representation for class incremental learning. In IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2021, virtual, June 19-25, 2021, pp.  3014–3023, 2021.
  39. Continual learning with bayesian model based on a fixed pre-trained feature extractor. In Medical Image Computing and Computer Assisted Intervention - MICCAI 2021 Strasbourg, France, September, 27 - October 1, 2021, Proceedings, Part V, pp.  397–406, 2021.
  40. Semantic drift compensation for class-incremental learning. In 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2020, Seattle, WA, USA, June 13-19, 2020, pp. 6980–6989. IEEE, 2020.
  41. Maintaining discrimination and fairness in class incremental learning. In 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2020, Seattle, WA, USA, June 13-19, 2020, pp. 13205–13214. IEEE, 2020.
  42. Pycil: A python toolbox for class-incremental learning, 2021.
  43. Class-incremental learning via dual augmentation. Advances in Neural Information Processing Systems, 34, 2021a.
  44. Prototype augmentation and self-supervision for incremental learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp.  5871–5880, 2021b.
  45. Self-sustaining representation expansion for non-exemplar class-incremental learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp.  9296–9305, 2022.

Summary

  • The paper introduces SEED, an ensemble approach that employs selective training of experts to mitigate catastrophic forgetting in class-incremental learning (CIL).
  • It represents each class as a Gaussian distribution and uses the symmetrized Kullback–Leibler divergence (written out after this list) to select the expert whose latent space leaves those distributions least overlapping.
  • Extensive experiments demonstrate SEED's superior performance in both task-aware and task-agnostic settings, highlighting its adaptability.
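
For reference, the symmetrized Kullback–Leibler divergence between two Gaussian class distributions has the standard closed form sketched below; the exact weighting or normalization used in the paper may differ, so treat this as background rather than the authors' formula.

```latex
% KL divergence between N_1 = N(\mu_1, \Sigma_1) and N_2 = N(\mu_2, \Sigma_2) in d dimensions
D_{\mathrm{KL}}(\mathcal{N}_1 \,\|\, \mathcal{N}_2) =
  \tfrac{1}{2}\Big[\operatorname{tr}\big(\Sigma_2^{-1}\Sigma_1\big)
  + (\mu_2-\mu_1)^{\top}\Sigma_2^{-1}(\mu_2-\mu_1)
  - d + \ln\tfrac{\det\Sigma_2}{\det\Sigma_1}\Big]

% Symmetrized version used to compare two class distributions
D_{\mathrm{sym}}(\mathcal{N}_1, \mathcal{N}_2) =
  D_{\mathrm{KL}}(\mathcal{N}_1 \,\|\, \mathcal{N}_2)
  + D_{\mathrm{KL}}(\mathcal{N}_2 \,\|\, \mathcal{N}_1)
```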

Introduction

Continual Learning (CL) is characterized by a model's ability to learn from a stream of data in which tasks are presented sequentially. In Class-Incremental Learning (CIL), a specific CL scenario, models must incrementally adapt to new classes without forgetting previously learned ones, which raises the twin challenges of catastrophic forgetting and limited per-task data. While many approaches have been proposed to address these issues, the SEED (Selection of Experts for Ensemble Diversification) method introduced in this work offers a novel perspective on exemplar-free CIL. Unlike methods that rely on a strong feature extractor from the outset, SEED promotes expert diversification within an ensemble, increasing stability without significant computational overhead.
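
To make the exemplar-free CIL protocol concrete, the toy sketch below runs a nearest-class-mean classifier over a sequence of tasks with disjoint classes, storing only per-class statistics and no exemplars, and evaluates over all classes seen so far after each task. It illustrates the protocol only; the data, classifier, and names are illustrative and are not the SEED method.

```python
import numpy as np

# Toy, self-contained illustration of exemplar-free class-incremental learning:
# tasks arrive sequentially with disjoint classes, no past samples are stored,
# and evaluation covers every class seen so far.
rng = np.random.default_rng(0)
dim, classes_per_task, n_tasks = 16, 2, 3
class_means = {}  # the only state carried across tasks (no exemplars)

for task in range(n_tasks):
    new_classes = range(task * classes_per_task, (task + 1) * classes_per_task)
    for c in new_classes:
        center = rng.normal(size=dim) * 5.0           # toy class center
        samples = center + rng.normal(size=(100, dim))  # current task's data only
        class_means[c] = samples.mean(axis=0)

    # Task-agnostic evaluation over all classes observed so far.
    test = {c: class_means[c] + rng.normal(size=dim) for c in class_means}
    correct = sum(
        min(class_means, key=lambda k: np.linalg.norm(x - class_means[k])) == c
        for c, x in test.items()
    )
    print(f"after task {task}: {len(class_means)} classes, "
          f"accuracy {correct / len(test):.2f}")
```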

Related Work

Class-Incremental Learning has evolved with the advent of architecture-based and ensemble methods that grow the network, dynamically adjust its parameters, or apply masking to mitigate forgetting and improve plasticity. Earlier solutions, such as Expert Gate and CoSCL, either lead to an unsustainable increase in model parameters or require complex regularization that hinders adaptability. Gaussian models have also been employed in CL to counter the bias towards recently learned tasks; however, such techniques lack the plasticity to learn new information efficiently.

Method

SEED maintains an ensemble of experts, each of which represents every class as a Gaussian distribution in its own latent space. The crucial innovation lies in selectively training a single expert for each new task: the expert is chosen as the one in whose latent space the new task's class distributions overlap the least, as measured by the symmetrized Kullback–Leibler divergence, which helps reduce representational drift. During inference, Bayes classification over the per-class Gaussians is combined across the whole ensemble to obtain task-agnostic predictions. This design not only mitigates catastrophic forgetting but also exploits the diversity of the ensemble to retain plasticity under varied data distributions.
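
A minimal sketch of these two steps in Python, assuming each expert exposes an `encode` method and a `class_gaussians` dict of per-class (mean, covariance) pairs; these names, and the way per-class log-likelihoods are summed over experts, are illustrative assumptions rather than the authors' implementation, whose exact aggregation may differ.

```python
import numpy as np
from scipy.stats import multivariate_normal

def sym_kl(mu1, cov1, mu2, cov2):
    """Symmetrized KL divergence between two multivariate Gaussians (closed form)."""
    def kl(m1, c1, m2, c2):
        d = m1.shape[0]
        c2_inv = np.linalg.inv(c2)
        diff = m2 - m1
        return 0.5 * (np.trace(c2_inv @ c1) + diff @ c2_inv @ diff - d
                      + np.log(np.linalg.det(c2) / np.linalg.det(c1)))
    return kl(mu1, cov1, mu2, cov2) + kl(mu2, cov2, mu1, cov1)

def select_expert(experts, new_task_data):
    """Pick the expert whose latent space separates the new task's classes best,
    i.e. where pairwise symmetrized KL between class Gaussians is largest
    (equivalently, where the distributions overlap least).
    new_task_data: dict mapping class id -> array of raw samples (assumed interface)."""
    scores = []
    for expert in experts:
        feats = [expert.encode(x) for x in new_task_data.values()]  # per-class latents
        gaussians = [(f.mean(axis=0), np.cov(f, rowvar=False)) for f in feats]
        pairwise = [sym_kl(*gaussians[i], *gaussians[j])
                    for i in range(len(gaussians))
                    for j in range(i + 1, len(gaussians))]
        scores.append(np.mean(pairwise))
    return int(np.argmax(scores))  # only this expert is fine-tuned on the new task

def predict(experts, x):
    """Task-agnostic inference: accumulate per-class log-likelihoods over the ensemble
    and return the highest-scoring class (one simple combination rule)."""
    log_scores = {}
    for expert in experts:
        z = expert.encode(x[None])[0]
        for cls, (mu, cov) in expert.class_gaussians.items():
            log_scores[cls] = log_scores.get(cls, 0.0) + \
                multivariate_normal(mu, cov, allow_singular=True).logpdf(z)
    return max(log_scores, key=log_scores.get)
```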

Experiments and Discussion

Extensive experiments showcase SEED's performance across diverse CIL scenarios, from equal task splits to domain-shift cases such as DomainNet, where it shows notably better adaptability to new distributions. SEED consistently outperforms state-of-the-art approaches in both task-aware and task-agnostic settings.

An extensive ablation study underscores the significance of each design choice within SEED, demonstrating how the ensemble technique, the expert selection strategy, and the careful balance of stability and plasticity contribute to the model's effectiveness. Furthermore, an adjustable hyperparameter provides a robust trade-off between plasticity and stability, allowing the model to be tuned to the complexity of the task sequence.

Conclusions

SEED addresses the classic challenges of CIL and sets new state-of-the-art results without relying on significant additional computational resources. Its ability to preserve knowledge across a series of tasks while efficiently adapting to new ones offers a useful paradigm for researchers and practitioners in class-incremental learning. With its limitations acknowledged, SEED's versatility and strong results position it as a notable advancement in continual learning research.
