Multi-Fidelity Multi-Armed Bandits Revisited (2306.07761v1)

Published 13 Jun 2023 in cs.LG and stat.ML

Abstract: We study the multi-fidelity multi-armed bandit (MF-MAB), an extension of the canonical multi-armed bandit (MAB) problem. MF-MAB allows each arm to be pulled with different costs (fidelities) and observation accuracies. We study both the best arm identification with fixed confidence (BAI) and the regret minimization objectives. For BAI, we present (a) a cost complexity lower bound, (b) an algorithmic framework with two alternative fidelity selection procedures, and (c) cost complexity upper bounds for both procedures. From both cost complexity bounds of MF-MAB, one can recover the standard sample complexity bounds of the classic (single-fidelity) MAB. For regret minimization of MF-MAB, we propose a new regret definition, prove its problem-independent regret lower bound $\Omega(K^{1/3}\Lambda^{2/3})$ and problem-dependent lower bound $\Omega(K\log \Lambda)$, where $K$ is the number of arms and $\Lambda$ is the decision budget in terms of cost, and devise an elimination-based algorithm whose worst-cost regret upper bound matches its corresponding lower bound up to logarithmic terms and whose problem-dependent bound matches its corresponding lower bound in terms of $\Lambda$.
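
The elimination-based approach described in the abstract can be made concrete with a small simulation. The Python sketch below is an illustration under assumed parameters, not the paper's algorithm: the number of arms, fidelity costs, noise scales, confidence radius, and the rule for escalating to a higher fidelity are all invented for this example. It only shows the general shape of multi-fidelity elimination under a cost budget $\Lambda$.

import numpy as np

# Illustrative sketch (NOT the paper's algorithm): successive elimination
# over a cost budget Lambda, where each pull is made at one of several
# fidelities. Lower fidelities cost less but return noisier observations.
# All parameters below are assumed for illustration.

rng = np.random.default_rng(0)

K = 5                                  # number of arms (assumed)
means = rng.uniform(0.2, 0.8, size=K)  # unknown true rewards (simulated)
costs = np.array([1.0, 4.0, 16.0])     # cost of a pull at each fidelity (assumed)
noise = np.array([0.50, 0.25, 0.10])   # observation noise per fidelity (assumed)
Lambda = 2000.0                        # total decision budget in cost units

active = list(range(K))
sums = np.zeros(K)
pulls = np.zeros(K)
spent = 0.0
fid = 0  # start at the cheapest fidelity

while spent + costs[fid] * len(active) <= Lambda and len(active) > 1:
    # Pull every active arm once at the current fidelity.
    for a in active:
        obs = means[a] + noise[fid] * rng.standard_normal()
        sums[a] += obs
        pulls[a] += 1
        spent += costs[fid]
    # Heuristic confidence radius (active arms share the same pull count).
    idx = np.array(active)
    est = sums[idx] / pulls[idx]
    rad = noise[fid] * np.sqrt(2 * np.log(max(spent, 2.0)) / pulls[active[0]])
    # Eliminate arms whose upper confidence bound falls below the
    # best arm's lower confidence bound.
    best_lcb = est.max() - rad
    active = [a for a, e in zip(active, est) if e + rad >= best_lcb]
    # Assumed escalation rule: refine with a costlier, more accurate
    # fidelity once enough arms have been screened out.
    if len(active) <= K // 2 and fid < len(costs) - 1:
        fid += 1

idx = np.array(active)
best = active[int(np.argmax(sums[idx] / pulls[idx]))]
print(f"budget spent: {spent:.0f}, surviving arms: {active}, pick: arm {best}")

Starting at the cheapest fidelity and escalating only after eliminations mirrors the intuition behind multi-fidelity methods: spend most of the budget on cheap, coarse screening and reserve expensive, accurate pulls for the few surviving candidates.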

Authors (4)
  1. Xuchuang Wang
  2. Qingyun Wu
  3. Wei Chen
  4. John C. S. Lui