Efficient Generalized Low-Rank Tensor Contextual Bandits (2311.01771v3)
Abstract: In this paper, we aim to build a novel bandit algorithm that fully harnesses the power of multi-dimensional data and the inherent non-linearity of reward functions to provide highly usable and accountable decision-making services. To this end, we introduce a generalized low-rank tensor contextual bandits model in which an action is formed from three feature vectors and can therefore be represented by a tensor. In this formulation, the reward is determined by a generalized linear function applied to the inner product of the action's feature tensor and a fixed but unknown parameter tensor with low tubal rank. To effectively balance exploration and exploitation, we introduce a novel algorithm called "Generalized Low-Rank Tensor Exploration Subspace then Refine" (G-LowTESTR). This algorithm first collects raw data to explore the intrinsic low-rank tensor subspace information embedded in the decision-making scenario, and then converts the original problem into a lower-dimensional generalized linear contextual bandits problem. Rigorous theoretical analysis shows that the regret bound of G-LowTESTR is superior to those obtained by vectorization or matricization. A series of simulations and real-data experiments further highlights the effectiveness of G-LowTESTR in capitalizing on the low-rank tensor structure for enhanced learning.
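The reward model described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the paper's implementation: it builds a parameter tensor with low tubal rank via the t-product (slice-wise matrix products in the Fourier domain along the third mode), forms a rank-one action tensor from three feature vectors, and applies a logistic link to their inner product. All dimensions, names, and the choice of logistic link are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

d1, d2, d3, r = 5, 5, 3, 2  # illustrative dimensions and tubal rank

# Parameter tensor Theta with tubal rank <= r: under the t-product,
# each Fourier-domain frontal slice has matrix rank at most r.
A = rng.standard_normal((d1, r, d3))
B = rng.standard_normal((r, d2, d3))
A_f = np.fft.fft(A, axis=2)
B_f = np.fft.fft(B, axis=2)
Theta_f = np.einsum('irk,rjk->ijk', A_f, B_f)  # slice-wise products
Theta = np.real(np.fft.ifft(Theta_f, axis=2))

# An action formed from three feature vectors: a rank-one outer product.
x, y, z = rng.standard_normal(d1), rng.standard_normal(d2), rng.standard_normal(d3)
X = np.einsum('i,j,k->ijk', x, y, z)

# Generalized linear reward: a link function of the tensor inner product.
inner = np.sum(X * Theta)
mean_reward = 1.0 / (1.0 + np.exp(-inner))  # logistic link, one example
```

The low tubal rank of `Theta` is what G-LowTESTR's exploration phase estimates, so that the refine phase can work in the resulting low-dimensional subspace rather than over all `d1 * d2 * d3` entries.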
- Qianxin Yi
- Yiyang Yang
- Shaojie Tang
- Jiapeng Liu
- Yao Wang