Batched Stochastic Bandit for Nondegenerate Functions (2405.05733v2)
Abstract: This paper studies batched bandit learning for nondegenerate functions. We introduce an algorithm, called Geometric Narrowing (GN), that solves the batched bandit problem for nondegenerate functions near-optimally: its regret bound is of order $\widetilde{\mathcal{O}} ( A_{+}^{d} \sqrt{T} )$, and it needs only $\mathcal{O} ( \log \log T )$ batches to achieve this regret. We complement this with a lower bound analysis. More specifically, we prove that over some (compact) doubling metric space of doubling dimension $d$: 1. for any policy $\pi$, there exists a problem instance on which $\pi$ incurs a regret of order $\Omega ( A_{-}^{d} \sqrt{T} )$; 2. no policy can achieve a regret of order $A_{-}^{d} \sqrt{T}$ over all problem instances using fewer than $\Omega ( \log \log T )$ rounds of communication. Together, these bounds show that the GN algorithm achieves near-optimal regret with a minimal number of batches.
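To make the batched protocol concrete, below is a minimal Python sketch of a batched narrowing strategy in one dimension. The objective `f`, the Gaussian noise level, the grid size, the shrink factor, and the doubly exponential batch schedule are all illustrative assumptions; this is not the paper's exact Geometric Narrowing (GN) algorithm, only a toy instance of the same batched narrow-and-eliminate idea, with a batch count that scales as $\mathcal{O}(\log \log T)$.

```python
import random

# A minimal, hedged sketch of a batched "narrowing" strategy on [0, 1]:
# each batch plays a uniform grid of arms inside the current active
# interval, then shrinks the interval around the best empirical mean.
# The batch-length schedule T_i ~ T^(1 - 2^{-i}) uses O(log log T)
# batches in total. All constants here are illustrative assumptions,
# not the paper's exact Geometric Narrowing (GN) algorithm.

def batched_narrowing(f, T, num_arms=8, shrink=0.5, noise=0.1, seed=0):
    rng = random.Random(seed)
    lo, hi = 0.0, 1.0        # current active interval
    spent, batch = 0, 1      # queries used so far, batch index
    best_x = 0.5
    while spent < T:
        # doubly exponential batch lengths: T^(1/2), T^(3/4), T^(7/8), ...
        budget = min(T - spent, max(num_arms, int(T ** (1.0 - 2.0 ** -batch))))
        pulls = max(budget // num_arms, 1)
        arms = [lo + (hi - lo) * (k + 0.5) / num_arms for k in range(num_arms)]
        # pull every active arm equally often; rewards are f(x) plus noise
        means = [
            sum(f(x) + rng.gauss(0.0, noise) for _ in range(pulls)) / pulls
            for x in arms
        ]
        spent += pulls * num_arms
        best_x = arms[max(range(num_arms), key=means.__getitem__)]
        # narrow the active interval geometrically around the empirical best
        half = (hi - lo) * shrink / 2.0
        lo, hi = max(0.0, best_x - half), min(1.0, best_x + half)
        batch += 1
    return best_x

if __name__ == "__main__":
    # maximize a smooth objective with a unique maximizer at x = 0.3
    x_hat = batched_narrowing(lambda x: -(x - 0.3) ** 2, T=100_000)
    print(f"estimated maximizer: {x_hat:.3f}")
```

The key design point this sketch illustrates is that arm choices within a batch depend only on data from earlier batches, so the learner adapts only $\mathcal{O}(\log \log T)$ times over the horizon $T$.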