Improved Regret Bounds for Online Kernel Selection under Bandit Feedback (2303.05018v2)
Abstract: In this paper, we improve the regret bounds for online kernel selection under bandit feedback. The previous algorithm enjoys an $O((\Vert f\Vert^2_{\mathcal{H}_i}+1)K^{\frac{1}{3}}T^{\frac{2}{3}})$ expected bound for Lipschitz loss functions. We prove two types of regret bounds improving the previous bound. For smooth loss functions, we propose an algorithm with an $O(U^{\frac{2}{3}}K^{-\frac{1}{3}}(\sum^{K}_{i=1}L_T(f^\ast_i))^{\frac{2}{3}})$ expected bound, where $L_T(f^\ast_i)$ is the cumulative loss of the optimal hypothesis in $\mathbb{H}_{i}=\{f\in\mathcal{H}_i:\Vert f\Vert_{\mathcal{H}_i}\leq U\}$. This data-dependent bound retains the previous worst-case bound and is smaller if most of the candidate kernels match the data well. For Lipschitz loss functions, we propose an algorithm with an $O(U\sqrt{KT}\ln^{\frac{2}{3}}{T})$ expected bound, asymptotically improving the previous bound. We apply the two algorithms to online kernel selection with a time constraint and prove new regret bounds matching or improving the previous $O(\sqrt{T\ln{K}}+\Vert f\Vert^2_{\mathcal{H}_i}\max\{\sqrt{T},\frac{T}{\sqrt{\mathcal{R}}}\})$ expected bound, where $\mathcal{R}$ is the time budget. Finally, we empirically verify our algorithms on online regression and classification tasks.
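To see why the data-dependent bound retains the worst-case rate (a sanity check not spelled out in the abstract, assuming losses bounded by a constant so that $L_T(f^\ast_i)=O(T)$ for every $i$):

```latex
% Plugging the worst case L_T(f^\ast_i) = O(T) into the data-dependent bound:
\sum_{i=1}^{K} L_T(f^\ast_i) \le K T
\quad\Longrightarrow\quad
U^{\frac{2}{3}} K^{-\frac{1}{3}} \Big(\sum_{i=1}^{K} L_T(f^\ast_i)\Big)^{\frac{2}{3}}
\le U^{\frac{2}{3}} K^{-\frac{1}{3}} (K T)^{\frac{2}{3}}
= U^{\frac{2}{3}} K^{\frac{1}{3}} T^{\frac{2}{3}}.
```

When the candidate kernels fit the data well, each $L_T(f^\ast_i)$ can be much smaller than $T$, and the bound improves accordingly.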