Orchestrate Latent Expertise: Advancing Online Continual Learning with Multi-Level Supervision and Reverse Self-Distillation (2404.00417v1)

Published 30 Mar 2024 in cs.LG, cs.AI, and cs.CV

Abstract: To accommodate real-world dynamics, artificial intelligence systems need to cope with sequentially arriving content in an online manner. Beyond regular Continual Learning (CL) attempting to address catastrophic forgetting with offline training of each task, Online Continual Learning (OCL) is a more challenging yet realistic setting that performs CL in a one-pass data stream. Current OCL methods primarily rely on memory replay of old training samples. However, a notable gap from CL to OCL stems from the additional overfitting-underfitting dilemma associated with the use of rehearsal buffers: the inadequate learning of new training samples (underfitting) and the repeated learning of a few old training samples (overfitting). To this end, we introduce a novel approach, Multi-level Online Sequential Experts (MOSE), which cultivates the model as stacked sub-experts, integrating multi-level supervision and reverse self-distillation. Supervision signals across multiple stages facilitate appropriate convergence of the new task while gathering various strengths from experts by knowledge distillation mitigates the performance decline of old tasks. MOSE demonstrates remarkable efficacy in learning new samples and preserving past knowledge through multi-level experts, thereby significantly advancing OCL performance over state-of-the-art baselines (e.g., up to 7.3% on Split CIFAR-100 and 6.1% on Split Tiny-ImageNet).

Orchestrate Latent Expertise: Advancing Online Continual Learning

The paper “Orchestrate Latent Expertise: Advancing Online Continual Learning with Multi-Level Supervision and Reverse Self-Distillation” addresses a central challenge in Online Continual Learning (OCL): the overfitting-underfitting dilemma introduced by rehearsal buffers. OCL demands learning from a continuously streaming dataset in which each data point is encountered only once. This setting poses challenges beyond traditional Continual Learning (CL), chiefly balancing adequate learning of new data against preserving knowledge of past data, which is commonly stored in and replayed from a memory buffer.
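
To make the setting concrete, the sketch below pairs a one-pass training step with a reservoir-sampled replay buffer, a standard rehearsal setup in the OCL literature. This is a minimal, generic illustration in PyTorch rather than the paper's implementation; the class and function names (ReservoirBuffer, ocl_step) and the replay batch size are assumptions made here for illustration.

```python
import random
import torch
import torch.nn.functional as F

class ReservoirBuffer:
    """Fixed-size rehearsal memory maintained by reservoir sampling,
    so every sample seen in the stream has an equal chance of being kept."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.data = []       # list of (x, y) tensors
        self.num_seen = 0    # total stream samples observed so far

    def add(self, x, y):
        self.num_seen += 1
        if len(self.data) < self.capacity:
            self.data.append((x, y))
        else:
            j = random.randrange(self.num_seen)
            if j < self.capacity:
                self.data[j] = (x, y)

    def sample(self, batch_size):
        batch = random.sample(self.data, min(batch_size, len(self.data)))
        xs, ys = zip(*batch)
        return torch.stack(xs), torch.stack(ys)

def ocl_step(model, optimizer, buffer, x_new, y_new, replay_size=32):
    """One online update: the incoming batch is seen exactly once and is
    mixed with a batch replayed from the buffer (when the buffer is non-empty)."""
    x, y = x_new, y_new
    if buffer.data:
        x_old, y_old = buffer.sample(replay_size)
        x = torch.cat([x_new, x_old])
        y = torch.cat([y_new, y_old])
    loss = F.cross_entropy(model(x), y)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    for xi, yi in zip(x_new, y_new):  # new samples enter the buffer after the update
        buffer.add(xi.detach(), yi)
    return loss.item()
```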

Key Contributions

The paper introduces Multi-level Online Sequential Experts (MOSE), an approach that combines multi-level supervision with reverse self-distillation to address these OCL challenges. The method aims to learn new tasks adequately while preventing performance on old tasks from deteriorating due to overfitting on the stored buffer samples.

  1. Multi-level Supervision: MOSE employs a hierarchical supervision mechanism across various network layers, akin to the multi-level processing seen in biological neural networks. Each network layer is treated as a latent expert, tasked with learning representations at varying abstraction levels. This concept draws inspiration from the mammalian visual processing system, which is adept at continual learning in dynamic environments.
  2. Reverse Self-Distillation: To address the challenge of aggregating expertise across these latent experts into a cohesive model, the paper introduces a novel reverse self-distillation process. Instead of distilling knowledge from a single teacher network into a student network, multiple intermediate network stages act as teachers that guide the final prediction. This integrates the diverse feature representations of each stage into the final output, enhancing the model's overall robustness and adaptability (an illustrative sketch of both mechanisms follows this list).
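
The sketch below conveys the general shape of these two mechanisms: every stage of the backbone carries its own supervised loss (multi-level supervision), and the final expert is additionally pulled toward the detached predictions of the shallower experts (reverse self-distillation). It is a simplified PyTorch approximation rather than the authors' code; the module structure, loss weighting, and the temperature-scaled KL form of the distillation term are assumptions made here for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiLevelNet(nn.Module):
    """A backbone split into sequential stages; each stage gets its own
    classification head so that it can act as a latent expert."""
    def __init__(self, dims, num_classes):
        super().__init__()
        # dims = [input_dim, d1, d2, ...]: one stage per consecutive pair of dims
        self.stages = nn.ModuleList(
            [nn.Sequential(nn.Linear(dims[i], dims[i + 1]), nn.ReLU())
             for i in range(len(dims) - 1)]
        )
        self.heads = nn.ModuleList(
            [nn.Linear(dims[i + 1], num_classes) for i in range(len(dims) - 1)]
        )

    def forward(self, x):
        logits = []
        for stage, head in zip(self.stages, self.heads):
            x = stage(x)
            logits.append(head(x))
        return logits  # one prediction per expert; logits[-1] is the final expert

def mose_style_loss(logits, targets, distill_weight=1.0, temperature=2.0):
    """Multi-level supervision: every expert is trained on the ground-truth labels.
    Reverse self-distillation: the final (deepest) expert is additionally pulled
    toward the detached predictions of the shallower experts acting as teachers."""
    supervision = sum(F.cross_entropy(l, targets) for l in logits)
    student = F.log_softmax(logits[-1] / temperature, dim=1)
    distill = sum(
        F.kl_div(student, F.softmax(t.detach() / temperature, dim=1),
                 reduction="batchmean")
        for t in logits[:-1]
    )
    return supervision + distill_weight * distill

# Toy usage (shapes and hyperparameters are placeholders):
net = MultiLevelNet(dims=[32, 64, 64, 64], num_classes=10)
x, y = torch.randn(8, 32), torch.randint(0, 10, (8,))
loss = mose_style_loss(net(x), y)
loss.backward()
```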

Empirical Evaluation

The effectiveness of MOSE is demonstrated on popular OCL benchmarks, namely Split CIFAR-100 and Split Tiny-ImageNet, where it outperforms state-of-the-art methods: up to a 7.3% improvement over competing methods on Split CIFAR-100 and 6.1% on Split Tiny-ImageNet. These results highlight the significant advance MOSE offers in learning new tasks while maintaining performance on previously learned tasks.

The paper also evaluates the balance between underfitting and overfitting directly. Through the introduction of the Buffer Overfitting Factor (BOF), it quantifies the extent of overfitting to buffered memories. MOSE manages this balance more effectively than the baselines, promoting new-task learning without overfitting to the buffered samples of old tasks.
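
The paper's exact formulation of BOF should be taken from the original text; purely to illustrate the kind of diagnostic involved, the snippet below computes one plausible overfitting indicator: the gap between accuracy on the buffered old-task samples and accuracy on held-out test data from the same tasks, where a large positive gap suggests memorization of the few stored samples. The function name and the gap-based form are assumptions made here, and model is assumed to map a batch to class logits.

```python
import torch

@torch.no_grad()
def buffer_overfitting_gap(model, buffer_loader, heldout_loader, device="cpu"):
    """Illustrative diagnostic (not the paper's BOF definition): compare
    accuracy on rehearsal-buffer samples with accuracy on held-out data
    from the same old tasks."""
    def accuracy(loader):
        correct, total = 0, 0
        for x, y in loader:
            pred = model(x.to(device)).argmax(dim=1)
            correct += (pred == y.to(device)).sum().item()
            total += y.numel()
        return correct / max(total, 1)

    return accuracy(buffer_loader) - accuracy(heldout_loader)
```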

Implications and Future Directions

Practically, the MOSE framework has profound implications for deploying AI systems in real-world scenarios where data arrives in a non-stationary stream and computational resources are limited. The layered supervision and reverse distillation strategies present a more refined mechanism for mitigating catastrophic forgetting, a significant hurdle in lifelong learning applications.

Theoretically, MOSE opens avenues for further exploration into the architecture of neural networks that emulate biological neural processing. It raises intriguing questions about the potential parallels between artificial and biological networks in handling continual learning and adaptability.

Looking forward, this approach could be extended to more sophisticated backbone architectures and adapted for various types of continual learning scenarios beyond image classification. By leveraging different forms of supervision and integrating with other promising learning paradigms, MOSE represents a foundational step toward truly autonomous and adaptive AI systems.

Authors (4)
  1. HongWei Yan
  2. Liyuan Wang
  3. Kaisheng Ma
  4. Yi Zhong