Hyperparameter Selection in Continual Learning (2404.06466v2)

Published 9 Apr 2024 in cs.LG and stat.ML

Abstract: In continual learning (CL) -- where a learner trains on a stream of data -- standard hyperparameter optimisation (HPO) cannot be applied, as a learner does not have access to all of the data at the same time. This has prompted the development of CL-specific HPO frameworks. The most popular way to tune hyperparameters in CL is to repeatedly train over the whole data stream with different hyperparameter settings. However, this end-of-training HPO is unusable in practice since a learner can only see the stream once. Hence, there is an open question: what HPO framework should a practitioner use for a CL problem in reality? This paper looks at this question by comparing several realistic HPO frameworks. We find that none of the HPO frameworks considered, including end-of-training HPO, perform consistently better than the rest on popular CL benchmarks. We therefore arrive at a twofold conclusion: a) to be able to discriminate between HPO frameworks there is a need to move beyond the current most commonly used CL benchmarks, and b) on the popular CL benchmarks examined, a CL practitioner should use a realistic HPO framework and can select it based on factors separate from performance, for example compute efficiency.
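Below is a minimal sketch of the two kinds of protocol contrasted in the abstract: end-of-training HPO, which replays the entire stream once per hyperparameter setting, and a realistic single-pass alternative that tunes only on the first task and then fixes the setting. The `train_on_task` and `evaluate` helpers, the hyperparameter grid, and the first-task strategy are illustrative assumptions, not the paper's code or its specific choice of realistic framework.

```python
import itertools
import random

# Placeholder helpers: any continual-learning method and benchmark could sit
# behind these names; they are illustrative stand-ins, not the paper's code.
def train_on_task(state, task, lr, reg):
    # A real learner would update its parameters on `task` using `lr` and `reg`.
    return state

def evaluate(state, tasks):
    # A real benchmark would return average accuracy over `tasks`.
    return random.random()

# Toy search space: learning rate x regularisation strength.
GRID = list(itertools.product([0.1, 0.01, 0.001], [0.0, 0.1, 1.0]))

def end_of_training_hpo(stream):
    """End-of-training HPO: replay the whole stream once per setting (unrealistic)."""
    scores = {}
    for lr, reg in GRID:
        state = None
        for task in stream:            # requires seeing the stream more than once
            state = train_on_task(state, task, lr, reg)
        scores[(lr, reg)] = evaluate(state, stream)
    return max(scores, key=scores.get)

def first_task_hpo(stream):
    """One realistic single-pass alternative: tune on the first task, then fix the setting."""
    first = stream[0]
    scores = {(lr, reg): evaluate(train_on_task(None, first, lr, reg), [first])
              for lr, reg in GRID}
    return max(scores, key=scores.get)

# Usage: integers stand in for the tasks of a split benchmark.
print(end_of_training_hpo(list(range(5))))
print(first_task_hpo(list(range(5))))
```

The paper's finding is that, on popular CL benchmarks, no such framework (including the unrealistic end-of-training baseline) is consistently better than the rest, so a practitioner can pick among realistic ones on grounds such as compute cost.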

