Meta Learning in Bandits within Shared Affine Subspaces (2404.00688v1)
Abstract: We study the problem of meta-learning several contextual stochastic bandit tasks by leveraging their concentration around a low-dimensional affine subspace, which we learn via online principal component analysis to reduce the expected regret over the encountered bandits. We propose and theoretically analyze two strategies that solve the problem: one based on the principle of optimism in the face of uncertainty and the other on Thompson sampling. Our framework is generic and includes previously proposed approaches as special cases. In addition, empirical results show that our methods significantly reduce the regret on several bandit tasks.
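To make the idea of "concentration around a low-dimensional affine subspace" concrete, the following is a minimal sketch of how such a subspace could be tracked online from per-task parameter vectors. It is not the paper's algorithm: the class name `OnlineAffineSubspace`, its methods, and the choice of a running mean plus an eigendecomposition of the running scatter matrix are all assumptions made for illustration. The paper's online PCA procedure and the way it feeds into the optimism-based and Thompson-sampling strategies may differ.

```python
import numpy as np


class OnlineAffineSubspace:
    """Hypothetical helper: tracks an affine subspace mean + span(basis)
    from a stream of task parameter vectors (illustrative only)."""

    def __init__(self, dim, rank):
        self.dim = dim
        self.rank = rank
        self.count = 0
        self.mean = np.zeros(dim)
        # Running sum of centered outer products (unnormalized covariance).
        self.scatter = np.zeros((dim, dim))

    def update(self, theta):
        # Welford-style update of the running mean and scatter matrix.
        self.count += 1
        delta = theta - self.mean
        self.mean += delta / self.count
        self.scatter += np.outer(delta, theta - self.mean)

    def basis(self):
        # eigh returns eigenvalues in ascending order, so reverse the
        # columns and keep the `rank` leading eigenvectors.
        _, eigvecs = np.linalg.eigh(self.scatter)
        return eigvecs[:, ::-1][:, : self.rank]

    def project(self, theta):
        # Orthogonal projection onto the estimated affine subspace.
        B = self.basis()
        return self.mean + B @ (B.T @ (theta - self.mean))


# Synthetic example (assumed setup): task parameters lie near a
# 2-dimensional affine subspace of R^10, up to small Gaussian noise.
rng = np.random.default_rng(0)
d, p = 10, 2
B_true = np.linalg.qr(rng.normal(size=(d, p)))[0]
mu_true = rng.normal(size=d)

estimator = OnlineAffineSubspace(d, p)
samples = []
for _ in range(200):
    theta = mu_true + B_true @ rng.normal(size=p) + 0.01 * rng.normal(size=d)
    samples.append(theta)
    estimator.update(theta)

# A new task on the true subspace should be close to its projection.
theta_new = mu_true + B_true @ rng.normal(size=p)
residual = np.linalg.norm(theta_new - estimator.project(theta_new))
```

In a meta-learning loop, one would presumably call `update` with the parameter estimate obtained at the end of each task and use `project` (or the basis itself) to bias the estimator of the next task toward the learned subspace. Here the true parameters are fed in directly, which is a simplification.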