Hyperparameter Selection in Continual Learning (2404.06466v2)
Abstract: In continual learning (CL), where a learner trains on a stream of data, standard hyperparameter optimisation (HPO) cannot be applied, as the learner never has access to all of the data at once. This has prompted the development of CL-specific HPO frameworks. The most popular way to tune hyperparameters in CL is to repeatedly train over the whole data stream with different hyperparameter settings, keeping the setting that performs best at the end. However, this end-of-training HPO is unusable in practice, since a real learner can only see the stream once. Hence, an open question remains: which HPO framework should a practitioner use for a CL problem in reality? This paper addresses that question by comparing several realistic HPO frameworks. We find that none of the HPO frameworks considered, including end-of-training HPO, performs consistently better than the rest on popular CL benchmarks. We therefore arrive at a twofold conclusion: a) to discriminate between HPO frameworks, the community needs to move beyond the current most commonly used CL benchmarks, and b) on the popular CL benchmarks examined, a CL practitioner should use a realistic HPO framework and can select it based on factors other than performance, for example compute efficiency.
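The abstract contrasts two tuning protocols, so a minimal sketch may help make the distinction concrete. The code below is not from the paper: `make_learner`, `train_on_task`, and `evaluate` are hypothetical callables the user would supply, and the single-pass alternative shown (tuning on the first task only) is just one illustrative realistic framework.

```python
def end_of_training_hpo(tasks, grid, make_learner, train_on_task, evaluate):
    """End-of-training HPO: replay the whole stream once per setting.

    This is the popular protocol the abstract calls unrealistic: a real
    continual learner sees the stream only once, so it cannot restart
    from scratch for every candidate hyperparameter setting.
    """
    best_score, best_config = float("-inf"), None
    for config in grid:
        learner = make_learner(config)
        for task in tasks:                    # full pass over the stream
            train_on_task(learner, task)
        score = evaluate(learner, tasks)      # end-of-training performance
        if score > best_score:
            best_score, best_config = score, config
    return best_config


def first_task_hpo(tasks, grid, make_learner, train_on_task, evaluate):
    """One realistic alternative (hypothetical sketch): tune on the first
    task only, then fix the chosen setting for the rest of the stream,
    so the stream as a whole is still seen only once."""
    first, rest = tasks[0], tasks[1:]
    best_score, best_config = float("-inf"), None
    for config in grid:                       # candidates see task 1 only
        learner = make_learner(config)
        train_on_task(learner, first)
        score = evaluate(learner, [first])
        if score > best_score:
            best_score, best_config = score, config
    learner = make_learner(best_config)       # commit to the chosen setting
    train_on_task(learner, first)
    for task in rest:                         # single pass over the rest
        train_on_task(learner, task)
    return learner
```

The contrast to note is the cost model: the end-of-training protocol consumes the full stream once per candidate setting, while the realistic sketch spends its tuning budget only on data the learner has already seen.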