MGSER-SAM: Memory-Guided Soft Experience Replay with Sharpness-Aware Optimization for Enhanced Continual Learning (2405.09492v1)

Published 15 May 2024 in cs.LG

Abstract: Deep neural networks suffer from the catastrophic forgetting problem in the field of continual learning (CL). To address this challenge, we propose MGSER-SAM, a novel memory replay-based algorithm specifically engineered to enhance the generalization capabilities of CL models. We first integrate the SAM optimizer, a component designed to optimize for flatness, which fits seamlessly into well-known experience replay frameworks such as ER and DER++. MGSER-SAM then addresses the challenge of reconciling conflicts in weight perturbation directions between ongoing tasks and previously stored memories, an issue underexplored in the SAM optimizer. This is accomplished through the strategic integration of soft logits and the alignment of memory gradient directions, where the regularization terms facilitate the concurrent minimization of the various training loss terms integral to the CL process. Through rigorous experimental analysis across multiple benchmarks, MGSER-SAM consistently outperforms existing baselines in all three CL scenarios. Compared to the representative memory replay-based baselines ER and DER++, MGSER-SAM not only improves testing accuracy by $24.4\%$ and $17.6\%$ respectively, but also achieves the lowest forgetting on each benchmark.
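The abstract describes a SAM-style flat-minima update combined with memory replay and soft-logit regularization. As an illustration only, below is a minimal PyTorch sketch of that general recipe: a replay buffer supplying stored examples with their saved soft logits (DER++-style replay), a combined loss over the current batch and the memory batch, and SAM's perturb-then-descend two-step update. The buffer interface (`buffer.sample()`), the loss weights `alpha` and `beta`, and the omission of MGSER-SAM's specific memory-gradient alignment term are assumptions for illustration, not the authors' exact formulation.

```python
import torch
import torch.nn.functional as F

def sam_replay_step(model, optimizer, x, y, buffer, rho=0.05, alpha=0.5, beta=0.5):
    """One hedged SAM + experience-replay training step (sketch, not the paper's exact method)."""
    # Hypothetical buffer API: returns stored inputs, labels, and the soft logits saved with them.
    buf_x, buf_y, buf_logits = buffer.sample()

    def total_loss():
        loss = F.cross_entropy(model(x), y)                    # current-task loss
        loss = loss + alpha * F.mse_loss(model(buf_x), buf_logits)  # soft-logit replay term
        loss = loss + beta * F.cross_entropy(model(buf_x), buf_y)   # hard-label replay term
        return loss

    # SAM step 1: ascend to the (approximate) worst-case point within an L2 ball of radius rho.
    total_loss().backward()
    params = [p for p in model.parameters() if p.grad is not None]
    grad_norm = torch.norm(torch.stack([p.grad.norm(p=2) for p in params]), p=2) + 1e-12
    eps = []
    with torch.no_grad():
        for p in params:
            e = rho * p.grad / grad_norm
            p.add_(e)            # perturb weights
            eps.append(e)
    optimizer.zero_grad()

    # SAM step 2: compute gradients at the perturbed point, restore weights, then update.
    total_loss().backward()
    with torch.no_grad():
        for p, e in zip(params, eps):
            p.sub_(e)            # undo perturbation
    optimizer.step()
    optimizer.zero_grad()
```

In a training loop, each batch would go through `sam_replay_step` and then be pushed into the buffer together with the model's detached logits; how MGSER-SAM additionally aligns the memory gradient directions is not specified in the abstract and is therefore left out of this sketch.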
