
On the Hyperparameter Loss Landscapes of Machine Learning Models: An Exploratory Study

Published 23 Nov 2023 in cs.LG (arXiv:2311.14014v2)

Abstract: Previous efforts on hyperparameter optimization (HPO) of ML models have predominantly focused on algorithmic advances, yet little is known about the topography of the underlying hyperparameter (HP) loss landscape, which plays a fundamental role in governing the search process of HPO. While several works have conducted fitness landscape analysis (FLA) on various ML systems, they are limited to properties of isolated landscapes and do not examine potential structural similarities among them. Exploring such similarities can provide a novel perspective for understanding the mechanisms behind modern HPO methods, but it has been missing, likely due to the high cost of large-scale landscape construction and the lack of effective analysis methods. In this paper, we mapped 1,500 HP loss landscapes of 6 representative ML models on 63 datasets across different fidelity levels, covering 11M+ configurations. By conducting exploratory analysis on these landscapes with fine-grained visualizations and dedicated FLA metrics, we observed a similar landscape topography across a wide range of models, datasets, and fidelities, and shed light on several central topics in HPO.
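
To make the landscape-construction idea concrete, below is a minimal sketch (not the authors' pipeline) of how an HP loss landscape can be enumerated for one model on one dataset and probed with a single FLA statistic: the number of local optima under a one-step neighbourhood on a discretized grid. The model, hyperparameter grid, and metric here are illustrative assumptions; the paper itself covers 6 models, 63 datasets, multiple fidelities, and dedicated FLA metrics.

```python
# Illustrative sketch only: a tiny HP loss landscape for one model on one
# dataset, with a simple FLA statistic (local-optima count). The model, grid,
# and metric are assumptions for demonstration, not the paper's protocol.
import itertools

from sklearn.datasets import load_digits
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = load_digits(return_X_y=True)

# Discretized 2-D hyperparameter grid (a small stand-in for the much larger
# grids used to build the 1,500 landscapes in the paper).
grid = {
    "max_depth": [2, 4, 8, 16],
    "min_samples_leaf": [1, 2, 4, 8],
}
keys = list(grid)
configs = list(itertools.product(*grid.values()))

# The landscape: each configuration maps to a cross-validated loss.
loss = {}
for cfg in configs:
    params = dict(zip(keys, cfg))
    model = RandomForestClassifier(n_estimators=50, random_state=0, **params)
    loss[cfg] = 1.0 - cross_val_score(model, X, y, cv=3).mean()

def neighbours(cfg):
    """Configurations differing in exactly one HP by one grid step."""
    for i, key in enumerate(keys):
        idx = grid[key].index(cfg[i])
        for j in (idx - 1, idx + 1):
            if 0 <= j < len(grid[key]):
                yield cfg[:i] + (grid[key][j],) + cfg[i + 1:]

# Simple FLA statistic: count (non-strict) local optima, i.e. configurations
# for which no neighbour achieves a strictly lower loss.
n_local_optima = sum(
    all(loss[cfg] <= loss[nb] for nb in neighbours(cfg)) for cfg in configs
)
print(f"local optima: {n_local_optima} of {len(configs)} configurations")
```

A small number of local optima relative to the grid size is one basic indicator of landscape smoothness; the paper's analysis instead relies on dedicated FLA metrics and fine-grained visualizations computed at far larger scale (11M+ evaluated configurations).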
