
A Model or 603 Exemplars: Towards Memory-Efficient Class-Incremental Learning (2205.13218v2)

Published 26 May 2022 in cs.LG and cs.CV

Abstract: Real-world applications require the classification model to adapt to new classes without forgetting old ones. Correspondingly, Class-Incremental Learning (CIL) aims to train a model with limited memory size to meet this requirement. Typical CIL methods tend to save representative exemplars from former classes to resist forgetting, while recent works find that storing models from history can substantially boost the performance. However, the stored models are not counted into the memory budget, which implicitly results in unfair comparisons. We find that when counting the model size into the total budget and comparing methods with aligned memory size, saving models do not consistently work, especially for the case with limited memory budgets. As a result, we need to holistically evaluate different CIL methods at different memory scales and simultaneously consider accuracy and memory size for measurement. On the other hand, we dive deeply into the construction of the memory buffer for memory efficiency. By analyzing the effect of different layers in the network, we find that shallow and deep layers have different characteristics in CIL. Motivated by this, we propose a simple yet effective baseline, denoted as MEMO for Memory-efficient Expandable MOdel. MEMO extends specialized layers based on the shared generalized representations, efficiently extracting diverse representations with modest cost and maintaining representative exemplars. Extensive experiments on benchmark datasets validate MEMO's competitive performance. Code is available at: https://github.com/wangkiw/ICLR23-MEMO

Citations (91)

Summary

  • The paper introduces MEMO, a memory-efficient method that selectively expands only deep layers to combat catastrophic forgetting in class-incremental learning.
  • It demonstrates significant performance gains on CIFAR100 and ImageNet with reduced memory use compared to exemplar- and model-based approaches.
  • The study highlights that freezing shallow layers while updating deep layers offers a scalable solution for managing limited memory resources in continual learning.

Memory-Efficient Class-Incremental Learning: An Examination of the MEMO Approach

The challenge of class-incremental learning (CIL) is to develop classification models that accommodate new classes in a continual learning setting without forgetting previously learned ones. The paper "A Model or 603 Exemplars: Towards Memory-Efficient Class-Incremental Learning" by Zhou et al. addresses this challenge by arguing that stored historical models must be counted toward the memory budget for fair comparison, and by proposing the Memory-efficient Expandable Model (MEMO), a simple baseline designed to balance accuracy against the total memory cost.

Overview of Methodological Insights

In traditional CIL, two primary strategies are employed: exemplar-based methods and model-based approaches. Exemplar-based methods, such as iCaRL and WA, store a fixed number of representative instances from previous classes to combat catastrophic forgetting, combined with techniques like knowledge distillation and weight alignment to mitigate the classifier's bias against old classes. Model-based methods, such as DER, create new model components for each learning task, which enhances representational diversity but significantly expands memory requirements.
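To make the exemplar-based recipe concrete, the sketch below pairs standard cross-entropy on the current batch with a knowledge-distillation term computed against the previous model's logits. It is a minimal PyTorch illustration, not the exact objective of iCaRL or WA; the temperature `T`, the equal weighting of the two terms, and the function name `cil_loss` are assumptions made for the example.

```python
import torch
import torch.nn.functional as F

def cil_loss(logits_new, logits_old, targets, num_old_classes, T=2.0):
    """Cross-entropy on all classes plus distillation on old-class logits.

    Minimal sketch of the exemplar-based recipe; temperature and weighting
    are illustrative assumptions, not the papers' exact formulations.
    """
    # Standard classification loss on the current batch
    # (new data plus stored exemplars from former classes).
    ce = F.cross_entropy(logits_new, targets)

    # Distill the old model's predictions on the first `num_old_classes`
    # outputs so the decision boundaries of former classes do not drift.
    old_probs = F.softmax(logits_old[:, :num_old_classes] / T, dim=1)
    new_log_probs = F.log_softmax(logits_new[:, :num_old_classes] / T, dim=1)
    kd = F.kl_div(new_log_probs, old_probs, reduction="batchmean") * (T * T)

    return ce + kd
```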

The MEMO approach builds on these ideas by critically analyzing the roles of different layers within the network. The analysis indicates that shallow layers capture broad, generalized features that remain relatively consistent across learning tasks. In contrast, deeper layers learn task-specific features and therefore need to be updated to accommodate new information without forgetting the old.

Consequently, MEMO introduces a structure that selectively extends only the deeper layers when new tasks are introduced, while reusing the shallow ones across all tasks. This selective extension reduces memory overhead by not repeatedly storing generalized features that remain useful across tasks. Additionally, the saved memory can be reallocated to store additional exemplars, further enhancing model performance without increasing the overall memory budget.
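The structural idea lends itself to a short sketch: keep one shared shallow backbone and append a new specialized deep block per task, concatenating the resulting features into a growing classifier head. The PyTorch code below is a hedged illustration under assumed module splits and dimensions, not the authors' implementation from the linked repository; `make_deep_block`, `feat_dim`, and the omitted weight-copying step are assumptions of the example.

```python
import torch
import torch.nn as nn

class MemoStyleNet(nn.Module):
    """Sketch of MEMO-style expansion: one shared shallow block plus one
    specialized deep block per task. Layer splits and sizes are
    illustrative assumptions, not the paper's exact architecture."""

    def __init__(self, shared_backbone, make_deep_block, feat_dim):
        super().__init__()
        self.shared = shared_backbone        # generalized layers, shared across tasks
        self.deep_blocks = nn.ModuleList()   # one specialized block per task
        self.make_deep_block = make_deep_block
        self.feat_dim = feat_dim
        self.classifier = None

    def add_task(self, num_new_classes):
        # Expand only the deep (specialized) layers for the new task.
        self.deep_blocks.append(self.make_deep_block())
        total_dim = self.feat_dim * len(self.deep_blocks)
        old_classes = self.classifier.out_features if self.classifier is not None else 0
        # New head covers old and new classes; copying the old head's weights
        # into the new one is omitted here for brevity.
        self.classifier = nn.Linear(total_dim, old_classes + num_new_classes)

    def forward(self, x):
        base = self.shared(x)                              # shared generalized representation
        feats = [block(base) for block in self.deep_blocks]  # task-specialized features
        return self.classifier(torch.cat(feats, dim=1))
```

Because only the deep blocks grow with each task, the per-task memory increment is smaller than duplicating the whole network, and the savings can be spent on extra exemplars as described above.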

Experimental Evaluation and Numerical Results

Comprehensive experiments on benchmark datasets such as CIFAR100 and ImageNet validate the effectiveness of MEMO. Under aligned memory budgets, MEMO achieves competitive or superior accuracy compared to both exemplar-based and model-based methods, and it is particularly strong when the memory budget is limited. For example, on CIFAR100 under the Base0 Inc5 setting, MEMO reaches a last-step accuracy of 73.16%.

Theoretical and Practical Implications

The implications of MEMO are significant: they underscore that not all layers in a deep network warrant equal treatment in CIL. By analyzing gradient norms and layer-wise representation shifts with Centered Kernel Alignment (CKA), the paper argues that only the specialized deep layers require expansion, while the generalized shallow layers can be shared across tasks. This insight can recalibrate existing practice in CIL and in broader applications of deep learning, encouraging researchers to consider differentiated layer treatment to improve memory efficiency and model performance.
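For readers who want to reproduce this kind of layer-similarity analysis, the snippet below computes linear CKA between two activation matrices (for instance, features from the same layer before and after learning a new task). It follows the standard linear CKA formulation; the evaluation protocol (which layers, which samples) is left open and is not taken from the paper.

```python
import torch

def linear_cka(X, Y):
    """Linear CKA between two feature matrices of shape (n_samples, dim).

    Standard formulation (centered features); used here only to illustrate
    how layer-wise similarity across tasks can be measured.
    """
    X = X - X.mean(dim=0, keepdim=True)
    Y = Y - Y.mean(dim=0, keepdim=True)
    # CKA(X, Y) = ||Y^T X||_F^2 / (||X^T X||_F * ||Y^T Y||_F)
    numerator = (Y.t() @ X).norm(p="fro") ** 2
    denominator = (X.t() @ X).norm(p="fro") * (Y.t() @ Y).norm(p="fro")
    return (numerator / denominator).item()
```

A CKA value near 1 for a shallow layer across tasks supports sharing it, while lower values for deep layers indicate the task-specific drift that motivates expanding them.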

Future Directions in AI

This work opens several avenues for future research. Firstly, exploring automatic systems or learning policies that can dynamically decide which layers to freeze or train could further enhance the adaptability and efficiency of CIL systems. Additionally, applying similar strategies to other neural network architectures, such as transformers in NLP tasks, could yield further improvements in managing limited computational resources. Lastly, integrating advances in memory compression techniques, such as quantization and distillation, with MEMO could present even more robust solutions to memory-related constraints across a variety of applications.

In conclusion, MEMO offers a promising direction for achieving memory-efficient class-incremental learning, marrying model performance and resource allocation effectively. It provokes wider discussions on the future of adaptable and scalable AI systems under real-world constraints, challenging traditional paradigms and encouraging innovation.