MALIBO: Meta-learning for Likelihood-free Bayesian Optimization (2307.03565v3)
Abstract: Bayesian optimization (BO) is a popular method for optimizing costly black-box functions. While traditional BO optimizes each new target task from scratch, meta-learning has emerged as a way to leverage knowledge from related tasks to optimize new tasks faster. However, existing meta-learning BO methods rely on surrogate models that suffer from scalability issues and are sensitive to observations with different scales and noise types across tasks. Moreover, they often overlook the uncertainty associated with task similarity, which leads to unreliable task adaptation when only limited observations are available or when the new tasks differ significantly from the related tasks. To address these limitations, we propose a novel meta-learning BO approach that bypasses the surrogate model and directly learns the utility of queries across tasks. Our method explicitly models task uncertainty and includes an auxiliary model to enable robust adaptation to new tasks. Extensive experiments show that our method achieves strong anytime performance and outperforms state-of-the-art meta-learning BO methods on various benchmarks.
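To make the "likelihood-free" idea concrete: instead of fitting a probabilistic surrogate to the objective values, this family of methods labels observations as "good" or "bad" and trains a classifier whose predicted probability of "good" serves directly as the acquisition function. The sketch below illustrates that classifier-based (BORE-style) acquisition in its simplest single-task form; it is not the authors' MALIBO implementation, and the meta-learning, task-uncertainty, and auxiliary-model components are omitted. The `propose_next` helper, the gradient-boosted classifier, and the toy objective are illustrative assumptions.

```python
# Minimal sketch of likelihood-free BO via a classifier-based acquisition.
# Observations in the top gamma-quantile (for minimization) are labeled "good";
# the classifier's P(good | x) plays the role of the acquisition function.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier


def propose_next(X, y, candidates, gamma=0.25):
    """Return the candidate the classifier rates most likely to be 'good'."""
    tau = np.quantile(y, gamma)                    # threshold splitting good/bad
    z = (y <= tau).astype(int)                     # 1 = good (minimization)
    clf = GradientBoostingClassifier().fit(X, z)
    scores = clf.predict_proba(candidates)[:, 1]   # acquisition = P(good | x)
    return candidates[np.argmax(scores)]


# Toy usage on a 1-D quadratic objective (hypothetical, for illustration only).
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(20, 1))
y = (X[:, 0] - 1.0) ** 2 + 0.1 * rng.normal(size=20)
candidates = np.linspace(-3, 3, 200).reshape(-1, 1)
print(propose_next(X, y, candidates))
```

MALIBO extends this basic scheme by meta-learning the classifier across related tasks and explicitly modeling how uncertain the task similarity is, which the plain single-task sketch above does not capture.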
Authors: Jiarong Pan, Stefan Falkner, Felix Berkenkamp, Joaquin Vanschoren