
Efficient Frameworks for Generalized Low-Rank Matrix Bandit Problems (2401.07298v1)

Published 14 Jan 2024 in stat.ML and cs.LG

Abstract: In the stochastic contextual low-rank matrix bandit problem, the expected reward of an action is given by the inner product between the action's feature matrix and some fixed but initially unknown $d_1 \times d_2$ matrix $\Theta^*$ with rank $r \ll \min\{d_1, d_2\}$, and an agent sequentially takes actions based on past experience to maximize the cumulative reward. In this paper, we study the generalized low-rank matrix bandit problem, recently proposed in \cite{lu2021low} under the Generalized Linear Model (GLM) framework. To overcome the computational infeasibility and theoretical restraints of existing algorithms for this problem, we first propose the G-ESTT framework, which modifies the idea from \cite{jun2019bilinear} by using Stein's method for the subspace estimation and then leverages the estimated subspaces via a regularization idea. Furthermore, we remarkably improve the efficiency of G-ESTT by instead using a novel exclusion idea on the estimated subspace, and propose the G-ESTS framework. We also show that G-ESTT can achieve a $\tilde{O}(\sqrt{(d_1+d_2)MrT})$ regret bound while G-ESTS can achieve a $\tilde{O}(\sqrt{(d_1+d_2)^{3/2}Mr^{3/2}T})$ regret bound under mild assumptions, up to logarithmic terms, where $M$ is some problem-dependent value. Under a reasonable assumption that $M = O((d_1+d_2)^2)$ in our problem setting, the regret of G-ESTT is consistent with the current best regret of $\tilde{O}((d_1+d_2)^{3/2} \sqrt{rT}/D_{rr})$~\citep{lu2021low} ($D_{rr}$ will be defined later). For completeness, we conduct experiments to illustrate that our proposed algorithms, especially G-ESTS, are computationally tractable and consistently outperform other state-of-the-art (generalized) linear matrix bandit methods in a suite of simulations.
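The reward model described in the abstract can be sketched in a few lines of NumPy. This is a minimal illustration, not the paper's algorithm: the dimensions, the rank-$r$ factorization of $\Theta^*$, and the logistic link are all assumptions chosen for the sketch, and Bernoulli feedback is just one GLM instance.

```python
import numpy as np

# Sketch of the generalized low-rank matrix bandit reward model.
# Dimensions and the logistic link are illustrative assumptions.
rng = np.random.default_rng(0)
d1, d2, r = 8, 6, 2  # ambient dimensions and rank, r << min(d1, d2)

# Unknown low-rank parameter Theta* = U V^T with rank r
U = rng.standard_normal((d1, r))
V = rng.standard_normal((d2, r))
Theta_star = U @ V.T

def expected_reward(X, link=lambda z: 1.0 / (1.0 + np.exp(-z))):
    """GLM expected reward mu(<X, Theta*>), here with a logistic link."""
    # <X, Theta*> is the Frobenius (trace) inner product
    return link(np.sum(X * Theta_star))

# One round: the agent picks a feature matrix X and observes a noisy reward
X = rng.standard_normal((d1, d2))
p = expected_reward(X)
reward = rng.binomial(1, p)  # Bernoulli feedback under the logistic link
```

The agent never sees `Theta_star`; it only observes `reward` for the chosen `X`, which is what makes estimating the low-rank subspaces the central difficulty.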

References (38)
  1. Improved algorithms for linear stochastic bandits. In NIPS, volume 11, pages 2312–2320, 2011.
  2. Online-to-confidence-set conversions and application to sparse stochastic bandits. In Artificial Intelligence and Statistics, pages 1–9. PMLR, 2012.
  3. Exploration–exploitation tradeoff using variance estimates in multi-armed bandits. Theoretical Computer Science, 410(19):1876–1902, 2009.
  4. Online decision making with high-dimensional covariates. Operations Research, 68(1):276–294, 2020.
  5. Exact matrix completion via convex optimization. Foundations of Computational Mathematics, 9(6):717–772, 2009.
  6. Strong consistency of maximum quasi-likelihood estimators in generalized linear models with fixed and adaptive designs. Annals of Statistics, 27(4):1155–1163, 1999.
  7. Normal approximation by Stein’s method. Springer Science & Business Media, 2010.
  8. Use of exchangeable pairs in the analysis of simulations. In Stein’s Method, pages 1–25. Institute of Mathematical Statistics, 2004.
  9. An efficient algorithm for generalized linear bandit: Online stochastic gradient descent and Thompson sampling. In International Conference on Artificial Intelligence and Statistics, pages 1585–1593. PMLR, 2021.
  10. Generalized high-dimensional trace regression via nuclear norm regularization. Journal of Econometrics, 212(1):177–202, 2019.
  11. Parametric bandits: The generalized linear case. In NIPS, volume 23, pages 586–594, 2010.
  12. Low-rank bandits with latent mixtures. arXiv preprint arXiv:1609.01508, 2016.
  13. Low-rank tensor bandits. arXiv preprint arXiv:2007.15788, 2020.
  14. Provable inductive matrix completion. arXiv preprint arXiv:1306.0626, 2013.
  15. Improved regret bounds of bilinear bandits using action space analysis. In International Conference on Machine Learning, pages 4744–4754. PMLR, 2021.
  16. Structured stochastic linear bandits. arXiv preprint arXiv:1606.05693, 2016.
  17. Bilinear bandits with low-rank structure. In International Conference on Machine Learning, pages 3163–3172. PMLR, 2019.
  18. Bernoulli rank-1 bandits for click feedback. arXiv preprint arXiv:1703.06513, 2017.
  19. Stochastic rank-1 bandits. In Artificial Intelligence and Statistics, pages 392–401. PMLR, 2017.
  20. Stochastic low-rank bandits. arXiv preprint arXiv:1712.04644, 2017.
  21. A contextual-bandit approach to personalized news article recommendation. In Proceedings of the 19th international conference on World wide web, pages 661–670, 2010.
  22. Provably optimal algorithms for generalized linear contextual bandits. In International Conference on Machine Learning, pages 2071–2080. PMLR, 2017.
  23. A simple unified framework for high dimensional bandit problems. arXiv preprint arXiv:2102.09626, 2021.
  24. Efficient online recommendation via low-rank ensemble sampling. In Proceedings of the 12th ACM Conference on Recommender Systems, pages 460–464, 2018.
  25. Low-rank generalized linear bandit problems. In International Conference on Artificial Intelligence and Statistics, pages 460–468. PMLR, 2021.
  26. Stanislav Minsker. Sub-gaussian estimators of the mean of a random matrix with heavy-tailed entries. The Annals of Statistics, 46(6A):2871–2903, 2018.
  27. A unified framework for high-dimensional analysis of M-estimators with decomposable regularizers. Statistical Science, 27(4):538–557, 2012.
  28. The generalized lasso with non-linear observations. IEEE Transactions on Information Theory, 62(3):1528–1537, 2016.
  29. Restricted eigenvalue properties for correlated gaussian designs. The Journal of Machine Learning Research, 11:2241–2259, 2010.
  30. Group fairness in bandit arm selection. arXiv preprint arXiv:1912.03802, 2019.
  31. Gilbert W Stewart. Matrix perturbation theory. 1990.
  32. Solving Bernoulli rank-one bandits with unimodal Thompson sampling. In Algorithmic Learning Theory, pages 862–889. PMLR, 2020.
  33. Dimensionality reduction: a comparative review. Journal of Machine Learning Research, 10(66-71):13, 2009.
  34. Martin J Wainwright. High-dimensional statistics: A non-asymptotic viewpoint, volume 48. Cambridge University Press, 2019.
  35. Michael Woodroofe. A one-armed bandit problem with a concomitant variable. Journal of the American Statistical Association, 74(368):799–806, 1979.
  36. High-dimensional non-gaussian single index models via thresholded score function estimation. In International Conference on Machine Learning, pages 3851–3860. PMLR, 2017.
  37. Efficient matrix sensing using rank-1 gaussian measurements. In International conference on algorithmic learning theory, pages 3–18. Springer, 2015.
  38. Neural contextual bandits with UCB-based exploration. In International Conference on Machine Learning, pages 11492–11502. PMLR, 2020.
Authors (3)
  1. Yue Kang (12 papers)
  2. Cho-Jui Hsieh (211 papers)
  3. Thomas C. M. Lee (34 papers)
Citations (15)