
Efficient Generalized Low-Rank Tensor Contextual Bandits (2311.01771v3)

Published 3 Nov 2023 in cs.LG and stat.ML

Abstract: In this paper, we aim to build a novel bandit algorithm capable of fully harnessing the power of multi-dimensional data and the inherent non-linearity of reward functions to provide highly usable and accountable decision-making services. To this end, we introduce a generalized low-rank tensor contextual bandits model in which an action is formed from three feature vectors and can therefore be represented by a tensor. In this formulation, the reward is determined by a generalized linear function applied to the inner product of the action's feature tensor and a fixed but unknown parameter tensor with low tubal rank. To effectively trade off exploration and exploitation, we introduce a novel algorithm called "Generalized Low-Rank Tensor Exploration Subspace then Refine" (G-LowTESTR). This algorithm first collects raw data to explore the intrinsic low-rank tensor subspace information embedded in the decision-making scenario, and then converts the original problem into an almost equivalent, lower-dimensional generalized linear contextual bandits problem. Rigorous theoretical analysis shows that the regret bound of G-LowTESTR is superior to those obtained by vectorization and matricization. We conduct a series of simulations and real-data experiments to further highlight the effectiveness of G-LowTESTR, leveraging its ability to capitalize on the low-rank tensor structure for enhanced learning.
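The reward model described in the abstract can be sketched in a few lines of NumPy. Everything below is an illustrative assumption rather than the paper's exact construction: the dimensions, the sigmoid link, and the slice-wise low-rank parameter tensor (a simple stand-in for the low-tubal-rank structure) are all chosen here for demonstration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative dimensions (not from the paper): each action tensor is d1 x d2 x d3.
d1, d2, d3 = 4, 4, 3

def low_rank_tensor(d1, d2, d3, r, rng):
    """Build a parameter tensor whose every frontal slice has rank <= r.
    This is a simplified stand-in for the paper's low-tubal-rank assumption."""
    slices = [rng.normal(size=(d1, r)) @ rng.normal(size=(r, d2)) for _ in range(d3)]
    return np.stack(slices, axis=2)

theta = low_rank_tensor(d1, d2, d3, r=2, rng=rng)  # fixed but unknown parameter tensor

def sigmoid(z):
    # One possible generalized linear link; the model allows other links as well.
    return 1.0 / (1.0 + np.exp(-z))

def action_tensor(u, v, w):
    """An action formed from three feature vectors: the outer product u o v o w."""
    return np.einsum("i,j,k->ijk", u, v, w)

def expected_reward(x, theta):
    """Generalized linear reward: link applied to the tensor inner product <X, Theta>."""
    return sigmoid(np.sum(x * theta))

# Draw one action and one (Bernoulli) reward from the assumed model.
u, v, w = rng.normal(size=d1), rng.normal(size=d2), rng.normal(size=d3)
x = action_tensor(u, v, w)
p = expected_reward(x, theta)   # mean reward in (0, 1) under the sigmoid link
reward = rng.binomial(1, p)     # observed noisy reward
```

In this sketch, G-LowTESTR's two stages would correspond to first playing exploratory actions to estimate the low-rank subspace of `theta`, and then running a standard generalized linear contextual bandit in the resulting lower-dimensional representation.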

Authors (5)
  1. Qianxin Yi (3 papers)
  2. Yiyang Yang (3 papers)
  3. Shaojie Tang (99 papers)
  4. Jiapeng Liu (16 papers)
  5. Yao Wang (331 papers)