Contextual Combinatorial Bandits with Probabilistically Triggered Arms (2303.17110v3)
Abstract: We study contextual combinatorial bandits with probabilistically triggered arms (C$^2$MAB-T) under a variety of smoothness conditions that capture a wide range of applications, such as contextual cascading bandits and contextual influence maximization bandits. Under the triggering probability modulated (TPM) condition, we devise the C$^2$-UCB-T algorithm and propose a novel analysis that achieves an $\tilde{O}(d\sqrt{KT})$ regret bound, removing a potentially exponentially large factor $O(1/p_{\min})$, where $d$ is the dimension of contexts, $p_{\min}$ is the minimum positive probability that any arm can be triggered, and batch-size $K$ is the maximum number of arms that can be triggered per round. Under the variance modulated (VM) or triggering probability and variance modulated (TPVM) conditions, we propose a new variance-adaptive algorithm VAC$^2$-UCB and derive a regret bound $\tilde{O}(d\sqrt{T})$, which is independent of the batch-size $K$. As a valuable by-product, our analysis technique and variance-adaptive algorithm can be applied to the CMAB-T and C$^2$MAB settings, improving existing results there as well. We also include experiments that demonstrate the improved performance of our algorithms compared with benchmark algorithms on synthetic and real-world datasets.
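To make the C$^2$-UCB-T template concrete, below is a minimal sketch of one round of a LinUCB-style contextual combinatorial bandit with triggered-arm feedback. It assumes a linear outcome model $\bar{\mu}_{t,i} = \langle \theta^*, x_{t,i} \rangle$, a ridge-regression estimator, and a black-box offline `oracle` mapping per-arm optimistic scores to a feasible action; the class name, the fixed confidence radius `beta`, and the oracle interface are illustrative assumptions, not the paper's exact specification (in particular, the paper's confidence radius depends on $d$ and $T$, and its analysis handles the triggering process explicitly).

```python
import numpy as np

class C2UCBTSketch:
    """Sketch of a LinUCB-style round for C^2MAB-T (illustrative only).

    Assumptions (not from the paper verbatim): linear outcomes
    mu_i = <theta*, x_i>, ridge regression, a fixed confidence radius
    `beta`, and a generic `oracle`. Feedback arrives only for the arms
    that were probabilistically triggered this round.
    """

    def __init__(self, dim, beta=1.0, lam=1.0):
        self.beta = beta                # confidence radius (placeholder constant)
        self.V = lam * np.eye(dim)      # regularized Gram matrix V_t
        self.b = np.zeros(dim)          # running sum of outcome-weighted features

    def select(self, contexts, oracle):
        """contexts: (n_arms, dim) array of per-arm feature vectors."""
        theta_hat = np.linalg.solve(self.V, self.b)   # ridge estimate of theta*
        V_inv = np.linalg.inv(self.V)
        # Optimistic per-arm estimate: <theta_hat, x_i> + beta * ||x_i||_{V^{-1}}.
        widths = np.sqrt(np.einsum('nd,dk,nk->n', contexts, V_inv, contexts))
        ucb = contexts @ theta_hat + self.beta * widths
        # Hand clipped optimistic scores to the offline combinatorial oracle.
        return oracle(np.clip(ucb, 0.0, 1.0))

    def update(self, contexts, triggered):
        """triggered: iterable of (arm_index, outcome) pairs for the arms
        whose feedback was actually observed (i.e., triggered) this round."""
        for i, y in triggered:
            x = contexts[i]
            self.V += np.outer(x, x)
            self.b += y * x
```

For instance, in a contextual cascading-bandit instance the `oracle` could simply return the indices of the top-$K$ arms by optimistic score, and `triggered` would contain only the arms the user actually examined before clicking.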
Authors: Xutong Liu, Jinhang Zuo, Siwei Wang, John C. S. Lui, Mohammad Hajiesmaili, Adam Wierman, Wei Chen