
Risk-Controlling Model Selection via Guided Bayesian Optimization (2312.01692v1)

Published 4 Dec 2023 in cs.LG, cs.AI, stat.ME, and stat.ML

Abstract: Adjustable hyperparameters of machine learning models typically impact various key trade-offs such as accuracy, fairness, robustness, or inference cost. Our goal in this paper is to find a configuration that adheres to user-specified limits on certain risks while being useful with respect to other conflicting metrics. We solve this by combining Bayesian Optimization (BO) with rigorous risk-controlling procedures, where our core idea is to steer BO towards an efficient testing strategy. Our BO method identifies a set of Pareto optimal configurations residing in a designated region of interest. The resulting candidates are statistically verified and the best-performing configuration is selected with guaranteed risk levels. We demonstrate the effectiveness of our approach on a range of tasks with multiple desiderata, including low error rates, equitable predictions, handling spurious correlations, managing rate and distortion in generative models, and reducing computational costs.


Summary

  • The paper presents a novel approach that integrates Bayesian Optimization with risk-controlling procedures to select model configurations under statistical constraints.
  • It defines a region of interest in the objective space to focus the search on Pareto optimal configurations that are likely to pass subsequent statistical tests.
  • Empirical evaluations demonstrate that the method efficiently balances objectives such as fairness, robustness, and cost-accuracy in various machine learning tasks.

Bayesian Optimization (BO) is a widely used method for optimizing expensive-to-evaluate black-box functions, a setting that includes hyperparameter selection for machine learning models. Hyperparameters can significantly influence accuracy, fairness, robustness, and computational cost. A fundamental challenge in model selection is finding a configuration that not only performs well on such metrics but also adheres to user-specified constraints, such as limits on error rates or fairness violations.
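The generic BO loop underlying this setup can be sketched minimally: a Gaussian-process surrogate is fit to the evaluations gathered so far, and an acquisition function selects the next configuration to evaluate. This is a single-objective illustration only, not the paper's multi-objective procedure; the RBF kernel, length-scale, and upper-confidence-bound coefficient are arbitrary choices made for the sketch.

```python
import numpy as np

def rbf_kernel(a, b, length=0.2):
    # Squared-exponential kernel between two sets of 1-D points.
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / length) ** 2)

def gp_posterior(x_train, y_train, x_query, noise=1e-6):
    # Standard Gaussian-process regression posterior mean and variance.
    K = rbf_kernel(x_train, x_train) + noise * np.eye(len(x_train))
    Ks = rbf_kernel(x_train, x_query)
    sol = np.linalg.solve(K, Ks)
    mean = sol.T @ y_train
    var = 1.0 - np.sum(Ks * sol, axis=0)  # prior variance is 1 under this kernel
    return mean, np.maximum(var, 0.0)

def bayes_opt(objective, n_iter=20, seed=0):
    # Maximize `objective` over [0, 1] with a GP surrogate and a UCB acquisition.
    rng = np.random.default_rng(seed)
    grid = np.linspace(0.0, 1.0, 201)      # candidate hyperparameter values
    x = rng.uniform(0.0, 1.0, size=3)      # small initial random design
    y = np.array([objective(v) for v in x])
    for _ in range(n_iter):
        mean, var = gp_posterior(x, y, grid)
        ucb = mean + 2.0 * np.sqrt(var)    # optimism in the face of uncertainty
        x_next = grid[np.argmax(ucb)]
        x = np.append(x, x_next)
        y = np.append(y, objective(x_next))
    best = np.argmax(y)
    return x[best], y[best]
```

For instance, maximizing `lambda v: -(v - 0.3) ** 2` with this loop converges to a point near 0.3 within a handful of evaluations, which is the behavior the surrogate-plus-acquisition design is meant to deliver.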

To address this challenge, the paper introduces an approach that combines Bayesian Optimization with rigorous risk-controlling procedures to steer the search towards efficient and statistically valid model configurations. The core idea is to define a "region of interest" in the objective space that reflects the user's constraints and preferences, and then adjust the BO process to focus on finding Pareto optimal configurations within this region.
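The region-of-interest idea can be illustrated with a small sketch for two objectives: a risk that must stay below a user limit and a cost to be minimized. The `slack` margin and the simple threshold rule below are illustrative assumptions, not the paper's exact construction; the point is that candidates are kept only if they are both Pareto optimal and plausibly able to pass a later statistical test.

```python
import numpy as np

def pareto_front(points):
    # points: (n, 2) array of (risk, cost), both minimized.
    # A point is Pareto optimal if no other point is <= in every
    # coordinate and strictly < in at least one.
    n = len(points)
    keep = np.ones(n, dtype=bool)
    for i in range(n):
        dominated = np.any(
            np.all(points <= points[i], axis=1)
            & np.any(points < points[i], axis=1)
        )
        keep[i] = not dominated
    return keep

def region_of_interest(points, alpha, slack=0.05):
    # Keep only configurations whose empirical risk is close enough to
    # the user limit `alpha` to have a realistic chance of passing a
    # statistical test, then take the Pareto front within that region.
    in_region = points[:, 0] <= alpha + slack
    return in_region & pareto_front(points)
```

A configuration with very low cost but risk far above the limit is Pareto optimal yet useless here, since it would fail verification; restricting the front to the region of interest avoids spending testing budget on such candidates.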

The adjusted BO procedure is tailored to recover a focused set of Pareto optimal configurations that are likely to pass subsequent statistical testing. These candidates are then verified by a statistical testing framework that checks, at a user-specified confidence level, whether they satisfy the desired risk constraints, and the best-performing configuration among those that pass is selected. Importantly, this enables model selection under multiple constraints while remaining computationally efficient.
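One standard way to build such a verification step, used in the Learn-then-Test line of work the paper draws on, is to turn each candidate's empirical risk on held-out data into a p-value via Hoeffding's inequality and apply a multiple-testing correction. The sketch below uses a plain Bonferroni correction for simplicity; the paper's procedure relies on a more efficient testing strategy, and the candidate format here is an assumption made for illustration.

```python
import numpy as np

def hoeffding_pvalue(emp_risk, alpha, n):
    # P-value for H0: true risk > alpha, given the empirical risk over
    # n losses bounded in [0, 1] (one-sided Hoeffding inequality).
    return float(np.exp(-2.0 * n * max(alpha - emp_risk, 0.0) ** 2))

def select_risk_controlled(candidates, alpha, delta, n):
    # candidates: list of (name, empirical_risk, utility); higher utility
    # is better. Bonferroni-correct over the candidate set, then pick the
    # highest-utility configuration among those that pass the test.
    m = len(candidates)
    passing = [c for c in candidates
               if hoeffding_pvalue(c[1], alpha, n) <= delta / m]
    if not passing:
        return None  # no configuration can be certified at level delta
    return max(passing, key=lambda c: c[2])
```

This is where guiding BO pays off: Bonferroni-style corrections grow more conservative as the candidate set grows, so proposing a small, focused set of configurations that are likely to pass makes the certified selection both more powerful and cheaper.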

The effectiveness of this methodology is demonstrated across a variety of tasks involving different objectives, such as preserving fairness in classification, ensuring robustness against spurious correlations, managing reconstruction quality and latent space complexity in Variational Autoencoders (VAEs), and optimizing cost-accuracy trade-offs in large transformer models. Through empirical evaluations, the authors show that their guided BO method selects efficient configurations that are both high-performing and meet statistical constraints, even when compared to various baselines.

Overall, this work provides a significant contribution by proposing a flexible and efficient framework for model selection under multiple constraints. It offers a pragmatic solution for practitioners to balance diverse objectives and control risk when selecting machine learning model configurations, especially in scenarios with limited computational budgets.
