Orchestrate Latent Expertise: Advancing Online Continual Learning with Multi-Level Supervision and Reverse Self-Distillation (2404.00417v1)
Abstract: To accommodate real-world dynamics, artificial intelligence systems need to cope with sequentially arriving content in an online manner. Beyond regular Continual Learning (CL), which addresses catastrophic forgetting with offline training of each task, Online Continual Learning (OCL) is a more challenging yet realistic setting that performs CL on a one-pass data stream. Current OCL methods primarily rely on memory replay of old training samples. However, a notable gap from CL to OCL stems from the additional overfitting-underfitting dilemma associated with rehearsal buffers: inadequate learning of new training samples (underfitting) and repeated learning of a few old training samples (overfitting). To this end, we introduce a novel approach, Multi-level Online Sequential Experts (MOSE), which cultivates the model as stacked sub-experts, integrating multi-level supervision and reverse self-distillation. Supervision signals across multiple stages facilitate appropriate convergence on the new task, while aggregating the diverse strengths of the experts via knowledge distillation mitigates the performance decline on old tasks. MOSE demonstrates remarkable efficacy in learning new samples and preserving past knowledge through multi-level experts, thereby significantly advancing OCL performance over state-of-the-art baselines (e.g., by up to 7.3% on Split CIFAR-100 and 6.1% on Split Tiny-ImageNet).
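To make the two mechanisms described in the abstract concrete, the sketch below shows one plausible way to train stacked sub-experts with per-stage supervision and a distillation term that pulls the final expert toward its shallower counterparts. It is a minimal illustration under assumed design choices: the names (StackedExperts, mose_style_loss), the KL-based distillation, its shallow-to-deep direction, and the equal loss weights are illustrative assumptions, not the authors' released implementation.

```python
# Minimal sketch of multi-level supervision + reverse self-distillation.
# All names and design choices here are hypothetical; the actual MOSE
# implementation may differ (loss weighting, heads, buffer handling, etc.).
import torch
import torch.nn as nn
import torch.nn.functional as F

class StackedExperts(nn.Module):
    """Backbone split into sequential stages, each acting as a sub-expert
    with its own lightweight classification head."""
    def __init__(self, stages: nn.ModuleList, feat_dims, num_classes: int):
        super().__init__()
        self.stages = stages  # e.g., the residual blocks of a ResNet
        self.heads = nn.ModuleList(
            [nn.Linear(d, num_classes) for d in feat_dims]
        )

    def forward(self, x):
        logits_per_expert = []
        for stage, head in zip(self.stages, self.heads):
            x = stage(x)
            pooled = F.adaptive_avg_pool2d(x, 1).flatten(1)
            logits_per_expert.append(head(pooled))
        return logits_per_expert  # ordered shallow -> deep

def mose_style_loss(logits_per_expert, labels, distill_weight=1.0, tau=2.0):
    # Multi-level supervision: every expert receives the task labels,
    # so each stage gets its own convergence signal on the new data.
    sup = sum(F.cross_entropy(z, labels) for z in logits_per_expert)

    # Reverse self-distillation (assumed direction): the final expert
    # absorbs the softened predictions of the shallower experts.
    final = logits_per_expert[-1]
    rsd = sum(
        F.kl_div(
            F.log_softmax(final / tau, dim=1),
            F.softmax(z.detach() / tau, dim=1),
            reduction="batchmean",
        ) * tau ** 2
        for z in logits_per_expert[:-1]
    )
    return sup + distill_weight * rsd
```

In an OCL setting, one would presumably apply this loss to each incoming mini-batch together with samples drawn from the rehearsal buffer, so that the per-stage supervision curbs underfitting on new data while the distillation term consolidates what the experts retain about old tasks.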
Authors: HongWei Yan, Liyuan Wang, Kaisheng Ma, Yi Zhong