Reshuffling Resampling Splits Can Improve Generalization of Hyperparameter Optimization (2405.15393v2)

Published 24 May 2024 in stat.ML and cs.LG

Abstract: Hyperparameter optimization is crucial for obtaining peak performance of machine learning models. The standard protocol evaluates various hyperparameter configurations using a resampling estimate of the generalization error to guide optimization and select a final hyperparameter configuration. Without much evidence, paired resampling splits, i.e., either a fixed train-validation split or a fixed cross-validation scheme, are often recommended. We show that, surprisingly, reshuffling the splits for every configuration often improves the final model's generalization performance on unseen data. Our theoretical analysis explains how reshuffling affects the asymptotic behavior of the validation loss surface and provides a bound on the expected regret in the limiting regime. This bound connects the potential benefits of reshuffling to the signal and noise characteristics of the underlying optimization problem. We confirm our theoretical results in a controlled simulation study and demonstrate the practical usefulness of reshuffling in a large-scale, realistic hyperparameter optimization experiment. While reshuffling leads to test performances that are competitive with using fixed splits, it drastically improves results for a single train-validation holdout protocol and can often make the holdout protocol competitive with standard CV while being computationally cheaper.

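As a rough illustration of the protocol difference described in the abstract, the sketch below runs a simple random search for a single hyperparameter and compares a fixed holdout split (the same train-validation split for every configuration) with reshuffled splits (a new split per configuration). This is a minimal, hypothetical example, not the authors' code: the synthetic dataset, logistic-regression model, search space, and scikit-learn usage are illustrative assumptions standing in for the paper's experimental setup.

```python
# Minimal sketch (assumed setup, not the paper's experiments): random-search HPO
# over logistic regression's C, comparing a fixed holdout split against
# reshuffling the holdout split for every configuration.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Illustrative synthetic data; an outer test set stands in for "unseen data".
X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_trval, X_test, y_trval, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

def holdout_score(C, split_seed):
    """Validation accuracy of one configuration on one train-validation split."""
    X_tr, X_val, y_tr, y_val = train_test_split(
        X_trval, y_trval, test_size=0.2, random_state=split_seed)
    model = LogisticRegression(C=C, max_iter=1000).fit(X_tr, y_tr)
    return accuracy_score(y_val, model.predict(X_val))

def random_search(n_iter=50, reshuffle=True):
    """Return the configuration with the best holdout estimate."""
    rng = np.random.default_rng(42)       # same candidate configs under both protocols
    best_C, best_score = None, -np.inf
    for i in range(n_iter):
        C = 10 ** rng.uniform(-4, 2)      # log-uniform search space for C
        seed = i if reshuffle else 0      # reshuffled: new split per config; fixed: always split 0
        score = holdout_score(C, seed)
        if score > best_score:
            best_C, best_score = C, score
    return best_C

for reshuffle in (False, True):
    C_star = random_search(reshuffle=reshuffle)
    final = LogisticRegression(C=C_star, max_iter=1000).fit(X_trval, y_trval)
    test_acc = accuracy_score(y_test, final.predict(X_test))
    print(f"reshuffle={reshuffle}: C*={C_star:.4g}, test accuracy={test_acc:.3f}")
```

Following the paper's findings, one would expect the reshuffled variant of the holdout protocol to select configurations that generalize at least as well as the fixed-split variant, although on a small synthetic problem like this the difference may not be visible in a single run.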