Tighter Confidence Bounds for Sequential Kernel Regression (2403.12732v2)
Abstract: Confidence bounds are an essential tool for rigorously quantifying the uncertainty of predictions. They are a core component in many sequential learning and decision-making algorithms, with tighter confidence bounds giving rise to algorithms with better empirical performance and better performance guarantees. In this work, we use martingale tail inequalities to establish new confidence bounds for sequential kernel regression. Our confidence bounds can be computed by solving a conic program, although solving this program directly quickly becomes impractical because the number of variables grows with the sample size. However, we show that the dual of this conic program allows us to efficiently compute tight confidence bounds. We prove that our new confidence bounds are always tighter than existing ones in this setting. We apply our confidence bounds to kernel bandit problems, and we find that when our confidence bounds replace existing ones, the KernelUCB (GP-UCB) algorithm has better empirical performance, a matching worst-case performance guarantee, and comparable computational cost.
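The abstract's tighter bounds come from a conic program, which is beyond a short snippet; as context, here is a minimal sketch of the classical KernelUCB acquisition step that those bounds would replace. The RBF kernel, the regularization parameter `lam`, the confidence width `beta`, and the function name `kernel_ucb_step` are illustrative assumptions, not the paper's construction.

```python
import numpy as np

def rbf_kernel(A, B, lengthscale=1.0):
    # Squared-exponential kernel matrix between the rows of A and B.
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * lengthscale ** 2))

def kernel_ucb_step(X, y, candidates, lam=1.0, beta=2.0):
    """One round of (classical) KernelUCB: score each candidate by
    posterior mean + beta * posterior std and pick the maximizer.
    X: (n, d) past inputs, y: (n,) observed rewards,
    candidates: (m, d) points to choose among."""
    K = rbf_kernel(X, X)
    Kinv = np.linalg.inv(K + lam * np.eye(len(X)))
    k_star = rbf_kernel(candidates, X)                    # (m, n)
    mean = k_star @ Kinv @ y
    # Regularized posterior variance; k(x, x) = 1 for the RBF kernel.
    var = 1.0 - np.einsum('ij,jk,ik->i', k_star, Kinv, k_star)
    ucb = mean + beta * np.sqrt(np.maximum(var, 0.0))
    return int(np.argmax(ucb)), ucb
```

The paper's contribution can be read as replacing the fixed-width term `beta * sqrt(var)` with a tighter, data-dependent width obtained from the dual conic program, while leaving the argmax structure of the algorithm unchanged.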