A Unified Gaussian Process for Branching and Nested Hyperparameter Optimization (2402.04885v1)

Published 19 Jan 2024 in stat.ML, cs.AI, and cs.LG

Abstract: Choosing appropriate hyperparameters plays a crucial role in the success of neural networks, as hyperparameters directly control the behavior and performance of the training algorithms. To obtain efficient tuning, Bayesian optimization methods based on Gaussian process (GP) models are widely used. Despite numerous applications of Bayesian optimization in deep learning, existing methodologies rest on the convenient but restrictive assumption that the tuning parameters are independent of each other. In practice, however, tuning parameters with conditional dependence are common. In this paper, we focus on two such types: branching and nested parameters. Nested parameters are tuning parameters that exist only within a particular setting of another tuning parameter, and a parameter within which other parameters are nested is called a branching parameter. To capture the conditional dependence between branching and nested parameters, a unified Bayesian optimization framework is proposed. Sufficient conditions are rigorously derived to guarantee the validity of the kernel function, and the asymptotic convergence of the proposed optimization framework is proven under the continuum-armed-bandit setting. Based on the new GP model, which accounts for the dependence structure among input variables through a new kernel function, higher prediction accuracy and better optimization efficiency are observed in a series of synthetic simulations and real-data applications to neural networks. Sensitivity analysis is also performed to provide insights into how changes in hyperparameter values affect prediction accuracy.
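
To make the branching/nested structure concrete, below is a minimal, hypothetical Python sketch of a covariance function that compares nested parameters only when two configurations share the same value of the branching parameter, and otherwise applies a constant cross-branch similarity. The function name, dictionary layout, length-scales, and the cross-branch constant are illustrative assumptions, not the paper's kernel; the paper derives the actual kernel and the sufficient conditions that guarantee such a construction is a valid (positive semidefinite) covariance.

```python
# Illustrative sketch only: a toy kernel for one branching parameter with
# nested continuous parameters. Not the paper's kernel; the paper derives
# sufficient conditions for validity of its proposed construction.
import numpy as np

def branching_nested_kernel(x1, x2, shared_ls=1.0, nested_ls=1.0, branch_sim=0.5):
    """Toy covariance between two hyperparameter configurations.

    Each configuration is a dict with (hypothetical) keys:
      - "shared": np.ndarray of hyperparameters used under every branch
      - "branch": the value of the branching (categorical) parameter,
                  e.g. the optimizer name
      - "nested": np.ndarray of parameters that exist only under that branch,
                  e.g. momentum when branch == "sgd"
    """
    # Squared-exponential factor over the always-active (shared) parameters.
    d_shared = np.sum((x1["shared"] - x2["shared"]) ** 2)
    k = np.exp(-d_shared / (2.0 * shared_ls ** 2))

    if x1["branch"] == x2["branch"]:
        # Same branch: the nested parameters live in the same space and
        # contribute another squared-exponential factor.
        d_nested = np.sum((x1["nested"] - x2["nested"]) ** 2)
        k *= np.exp(-d_nested / (2.0 * nested_ls ** 2))
    else:
        # Different branches: the nested parameters are not comparable, so
        # only a constant cross-branch similarity is applied.
        k *= branch_sim
    return k

# Example: two configurations sharing the learning rate, branching on the optimizer.
a = {"shared": np.array([1e-3]), "branch": "sgd",  "nested": np.array([0.9])}    # momentum
b = {"shared": np.array([1e-3]), "branch": "adam", "nested": np.array([0.999])}  # beta2
print(branching_nested_kernel(a, b))
```

The design choice this sketch illustrates is the one the abstract describes: dependence between a branching parameter and its nested parameters is encoded directly in the kernel, rather than treating all tuning parameters as independent inputs.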

Authors (4)
  1. Jiazhao Zhang (24 papers)
  2. Ying Hung (12 papers)
  3. Chung-Ching Lin (36 papers)
  4. Zicheng Liu (153 papers)