Stochastic continuum armed bandit problem of few linear parameters in high dimensions (1312.0232v4)

Published 1 Dec 2013 in stat.ML, cs.LG, and math.OC

Abstract: We consider a stochastic continuum armed bandit problem where the arms are indexed by the $\ell_2$ ball $B_{d}(1+\nu)$ of radius $1+\nu$ in $\mathbb{R}^d$. The reward functions $r :B_{d}(1+\nu) \rightarrow \mathbb{R}$ are considered to intrinsically depend on $k \ll d$ unknown linear parameters so that $r(\mathbf{x}) = g(\mathbf{A} \mathbf{x})$ where $\mathbf{A}$ is a full rank $k \times d$ matrix. Assuming the mean reward function to be smooth we make use of results from low-rank matrix recovery literature and derive an efficient randomized algorithm which achieves a regret bound of $O(C(k,d) n^{{\frac{1+k}{2+k}}} (\log n)^{{\frac{1}{2+k}})$} with high probability. Here $C(k,d)$ is at most polynomial in $d$ and $k$ and $n$ is the number of rounds or the sampling budget which is assumed to be known beforehand.