Efficient Frameworks for Generalized Low-Rank Matrix Bandit Problems (2401.07298v1)
Abstract: In the stochastic contextual low-rank matrix bandit problem, the expected reward of an action is given by the inner product between the action's feature matrix and a fixed but initially unknown $d_1 \times d_2$ matrix $\Theta^*$ with rank $r \ll \min\{d_1, d_2\}$, and an agent sequentially takes actions based on past experience to maximize the cumulative reward. In this paper, we study the generalized low-rank matrix bandit problem, recently proposed in \cite{lu2021low} under the Generalized Linear Model (GLM) framework. To overcome the computational infeasibility and theoretical restrictions of existing algorithms for this problem, we first propose the G-ESTT framework, which adapts the idea of \cite{jun2019bilinear} by using Stein's method for subspace estimation and then leverages the estimated subspaces via a regularization scheme. Furthermore, we markedly improve the efficiency of G-ESTT by instead applying a novel exclusion idea to the estimated subspace, and propose the G-ESTS framework. We also show that G-ESTT achieves an $\tilde{O}(\sqrt{(d_1+d_2)MrT})$ regret bound while G-ESTS achieves an $\tilde{O}(\sqrt{(d_1+d_2)^{3/2}Mr^{3/2}T})$ regret bound under mild assumptions, up to logarithmic terms, where $M$ is a problem-dependent quantity. Under the reasonable assumption that $M = O((d_1+d_2)^2)$ in our problem setting, the regret of G-ESTT is consistent with the current best regret of $\tilde{O}((d_1+d_2)^{3/2} \sqrt{rT}/D_{rr})$~\citep{lu2021low} ($D_{rr}$ will be defined later). For completeness, we conduct experiments to illustrate that our proposed algorithms, especially G-ESTS, are computationally tractable and consistently outperform other state-of-the-art (generalized) linear matrix bandit methods in a suite of simulations.
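The reward model from the abstract can be sketched in a few lines: the expected reward of an arm with feature matrix $X$ is a link function applied to the Frobenius inner product $\langle X, \Theta^* \rangle$, where $\Theta^*$ has rank $r \ll \min\{d_1, d_2\}$. The dimensions, logistic link, and arm count below are illustrative assumptions, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative problem dimensions (not taken from the paper).
d1, d2, r = 8, 6, 2

# Hidden low-rank parameter Theta* = U V^T, so rank(Theta*) = r almost surely.
U = rng.standard_normal((d1, r))
V = rng.standard_normal((d2, r))
Theta_star = U @ V.T

def expected_reward(X, Theta=Theta_star,
                    link=lambda z: 1.0 / (1.0 + np.exp(-z))):
    """GLM expected reward mu(<X, Theta*>), here with a logistic link as an
    example; <., .> is the Frobenius inner product trace(X^T Theta)."""
    return link(np.sum(X * Theta))

# One round of the bandit: among candidate arms, a greedy oracle with full
# knowledge of Theta* would pick the arm maximizing the expected reward.
arms = [rng.standard_normal((d1, d2)) for _ in range(5)]
best = max(range(len(arms)), key=lambda i: expected_reward(arms[i]))
```

The learning problem is that $\Theta^*$ is unknown, so the agent must trade off exploring arms to estimate the low-rank structure against exploiting the current estimate; the sketch above only shows the reward model that both G-ESTT and G-ESTS operate under.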
- Improved algorithms for linear stochastic bandits. In NIPS, volume 11, pages 2312–2320, 2011.
- Online-to-confidence-set conversions and application to sparse stochastic bandits. In Artificial Intelligence and Statistics, pages 1–9. PMLR, 2012.
- Exploration–exploitation tradeoff using variance estimates in multi-armed bandits. Theoretical Computer Science, 410(19):1876–1902, 2009.
- Online decision making with high-dimensional covariates. Operations Research, 68(1):276–294, 2020.
- Exact matrix completion via convex optimization. Foundations of Computational Mathematics, 9(6):717–772, 2009.
- Strong consistency of maximum quasi-likelihood estimators in generalized linear models with fixed and adaptive designs. Annals of Statistics, 27(4):1155–1163, 1999.
- Normal approximation by Stein’s method. Springer Science & Business Media, 2010.
- Use of exchangeable pairs in the analysis of simulations. In Stein’s Method, pages 1–25. Institute of Mathematical Statistics, 2004.
- An efficient algorithm for generalized linear bandit: Online stochastic gradient descent and Thompson sampling. In International Conference on Artificial Intelligence and Statistics, pages 1585–1593. PMLR, 2021.
- Generalized high-dimensional trace regression via nuclear norm regularization. Journal of Econometrics, 212(1):177–202, 2019.
- Parametric bandits: The generalized linear case. In NIPS, volume 23, pages 586–594, 2010.
- Low-rank bandits with latent mixtures. arXiv preprint arXiv:1609.01508, 2016.
- Low-rank tensor bandits. arXiv preprint arXiv:2007.15788, 2020.
- Provable inductive matrix completion. arXiv preprint arXiv:1306.0626, 2013.
- Improved regret bounds of bilinear bandits using action space analysis. In International Conference on Machine Learning, pages 4744–4754. PMLR, 2021.
- Structured stochastic linear bandits. arXiv preprint arXiv:1606.05693, 2016.
- Bilinear bandits with low-rank structure. In International Conference on Machine Learning, pages 3163–3172. PMLR, 2019.
- Bernoulli rank-1 bandits for click feedback. arXiv preprint arXiv:1703.06513, 2017.
- Stochastic rank-1 bandits. In Artificial Intelligence and Statistics, pages 392–401. PMLR, 2017.
- Stochastic low-rank bandits. arXiv preprint arXiv:1712.04644, 2017.
- A contextual-bandit approach to personalized news article recommendation. In Proceedings of the 19th international conference on World wide web, pages 661–670, 2010.
- Provably optimal algorithms for generalized linear contextual bandits. In International Conference on Machine Learning, pages 2071–2080. PMLR, 2017.
- A simple unified framework for high dimensional bandit problems. arXiv preprint arXiv:2102.09626, 2021.
- Efficient online recommendation via low-rank ensemble sampling. In Proceedings of the 12th ACM Conference on Recommender Systems, pages 460–464, 2018.
- Low-rank generalized linear bandit problems. In International Conference on Artificial Intelligence and Statistics, pages 460–468. PMLR, 2021.
- Stanislav Minsker. Sub-gaussian estimators of the mean of a random matrix with heavy-tailed entries. The Annals of Statistics, 46(6A):2871–2903, 2018.
- A unified framework for high-dimensional analysis of M-estimators with decomposable regularizers. Statistical Science, 27(4):538–557, 2012.
- The generalized Lasso with non-linear observations. IEEE Transactions on Information Theory, 62(3):1528–1537, 2016.
- Restricted eigenvalue properties for correlated gaussian designs. The Journal of Machine Learning Research, 11:2241–2259, 2010.
- Group fairness in bandit arm selection. arXiv preprint arXiv:1912.03802, 2019.
- Gilbert W Stewart. Matrix perturbation theory. 1990.
- Solving Bernoulli rank-one bandits with unimodal Thompson sampling. In Algorithmic Learning Theory, pages 862–889. PMLR, 2020.
- Dimensionality reduction: a comparative review. Journal of Machine Learning Research, 10(66-71):13, 2009.
- Martin J Wainwright. High-dimensional statistics: A non-asymptotic viewpoint, volume 48. Cambridge University Press, 2019.
- Michael Woodroofe. A one-armed bandit problem with a concomitant variable. Journal of the American Statistical Association, 74(368):799–806, 1979.
- High-dimensional non-Gaussian single index models via thresholded score function estimation. In International Conference on Machine Learning, pages 3851–3860. PMLR, 2017.
- Efficient matrix sensing using rank-1 Gaussian measurements. In International Conference on Algorithmic Learning Theory, pages 3–18. Springer, 2015.
- Neural contextual bandits with UCB-based exploration. In International Conference on Machine Learning, pages 11492–11502. PMLR, 2020.
- Yue Kang
- Cho-Jui Hsieh
- Thomas C. M. Lee