Rethinking Class-incremental Learning in the Era of Large Pre-trained Models via Test-Time Adaptation (2310.11482v2)
Abstract: Class-incremental learning (CIL) is a challenging task that involves sequentially learning to categorize classes from new tasks without forgetting previously learned information. The advent of large pre-trained models (PTMs) has fast-tracked progress in CIL thanks to the highly transferable PTM representations, where tuning a small set of parameters yields state-of-the-art performance compared with traditional CIL methods trained from scratch. However, repeated fine-tuning on each task destroys the rich representations of the PTM and further leads to forgetting previous tasks. To strike a balance between the stability and plasticity of PTMs for CIL, we propose a novel perspective: eliminate training on every new task, train the PTM only on the first task, and then refine its representation at inference time using test-time adaptation (TTA). Concretely, we propose Test-Time Adaptation for Class-Incremental Learning (TTACIL), which first fine-tunes the PTM with Adapters on the first task, then adjusts the Layer Norm parameters of the PTM on each test instance to learn task-specific features, and finally resets them back to the adapted model to preserve stability. As a consequence, TTACIL does not undergo any forgetting, while benefiting each task with the rich PTM features. By design, TTACIL is also robust to common data corruptions. Our method outperforms several state-of-the-art CIL methods on multiple CIL benchmarks under both clean and corrupted data. Code is available at: https://github.com/IemProg/TTACIL.
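The following is a minimal sketch (not the authors' implementation; see the repository above for that) of the episodic adaptation step described in the abstract: only the Layer Norm affine parameters of a first-task-adapted backbone are updated on each test instance, and they are reset to the adapted state before the next instance. The entropy-minimization objective (in the style of TENT), the helper names, and the hyperparameters are assumptions for illustration.

```python
# Sketch: per-instance Layer Norm adaptation with reset (PyTorch).
# Assumes `model` is a PTM already fine-tuned with Adapters on the first task.
import torch
import torch.nn as nn


def collect_ln_params(model: nn.Module):
    """Return the affine parameters of every LayerNorm module."""
    params = []
    for module in model.modules():
        if isinstance(module, nn.LayerNorm):
            params += [p for p in (module.weight, module.bias) if p is not None]
    return params


@torch.enable_grad()
def adapt_and_predict(model: nn.Module, x: torch.Tensor,
                      steps: int = 1, lr: float = 1e-3) -> torch.Tensor:
    """Adapt LayerNorm parameters on one test instance, predict, then reset."""
    model.eval()
    # Freeze everything except the LayerNorm affine parameters.
    for p in model.parameters():
        p.requires_grad_(False)
    ln_params = collect_ln_params(model)
    for p in ln_params:
        p.requires_grad_(True)

    # Snapshot the adapted state so it can be restored after this instance.
    snapshot = [p.detach().clone() for p in ln_params]
    optimizer = torch.optim.SGD(ln_params, lr=lr)

    for _ in range(steps):
        logits = model(x)
        # Entropy of the softmax predictions (a common TTA objective; assumed here).
        probs = logits.softmax(dim=-1)
        entropy = -(probs * probs.clamp_min(1e-8).log()).sum(dim=-1).mean()
        optimizer.zero_grad()
        entropy.backward()
        optimizer.step()

    with torch.no_grad():
        prediction = model(x).argmax(dim=-1)
        # Reset LayerNorm parameters to the first-task-adapted state (stability).
        for p, saved in zip(ln_params, snapshot):
            p.copy_(saved)
    return prediction
```

In this reading, the first-task-adapted model acts as a fixed anchor: each test instance briefly specializes the Layer Norm parameters, and the reset afterwards means no update ever accumulates across tasks, which is why no forgetting can occur.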
Authors: Imad Eddine Marouf, Subhankar Roy, Enzo Tartaglione, Stéphane Lathuilière