Contextual Combinatorial Bandits with Probabilistically Triggered Arms (2303.17110v3)

Published 30 Mar 2023 in cs.LG, cs.AI, and stat.ML

Abstract: We study contextual combinatorial bandits with probabilistically triggered arms (C$^2$MAB-T) under a variety of smoothness conditions that capture a wide range of applications, such as contextual cascading bandits and contextual influence maximization bandits. Under the triggering probability modulated (TPM) condition, we devise the C$^2$-UCB-T algorithm and propose a novel analysis that achieves an $\tilde{O}(d\sqrt{KT})$ regret bound, removing a potentially exponentially large factor $O(1/p_{\min})$, where $d$ is the dimension of contexts, $p_{\min}$ is the minimum positive probability that any arm can be triggered, and batch-size $K$ is the maximum number of arms that can be triggered per round. Under the variance modulated (VM) or triggering probability and variance modulated (TPVM) conditions, we propose a new variance-adaptive algorithm VAC$^2$-UCB and derive a regret bound $\tilde{O}(d\sqrt{T})$, which is independent of the batch-size $K$. As a valuable by-product, our analysis technique and variance-adaptive algorithm can be applied to the CMAB-T and C$^2$MAB settings, improving existing results there as well. We also include experiments that demonstrate the improved performance of our algorithms compared with benchmark algorithms on synthetic and real-world datasets.
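As a rough illustration of the machinery the abstract describes, the Python sketch below shows the generic LinUCB-style loop that contextual combinatorial bandits with triggered arms build on: a ridge-regression estimate of a shared $d$-dimensional reward parameter, optimistic per-arm scores fed to an offline oracle, and updates only on the arms the environment actually triggers. The fixed confidence scale `beta`, the `oracle`, and the `env` interface are simplified placeholder assumptions, not the paper's exact C$^2$-UCB-T procedure.

```python
import numpy as np

class LinUCBEstimator:
    """Ridge-regression estimate of a shared linear reward parameter.

    A simplified stand-in for the estimator inside C^2-UCB-T-style
    algorithms; `beta` is a fixed confidence scale here, whereas the
    paper's algorithms use a carefully chosen, round-dependent radius.
    """

    def __init__(self, d, lam=1.0, beta=1.0):
        self.V = lam * np.eye(d)   # regularized Gram matrix
        self.b = np.zeros(d)       # running sum of context * reward
        self.beta = beta

    def ucb(self, x):
        # Optimistic score: x^T theta_hat + beta * ||x||_{V^{-1}}
        V_inv = np.linalg.inv(self.V)
        theta_hat = V_inv @ self.b
        return float(x @ theta_hat + self.beta * np.sqrt(x @ V_inv @ x))

    def update(self, x, reward):
        # Rank-one update from the feedback of one *triggered* arm
        self.V += np.outer(x, x)
        self.b += reward * x

def run_round(est, contexts, oracle, env):
    """One round: score arms, call the oracle, learn from triggered arms.

    `contexts` maps arm -> feature vector; `oracle` and `env` are
    hypothetical interfaces standing in for the offline optimization
    oracle and the probabilistically triggering environment.
    """
    scores = {arm: est.ucb(x) for arm, x in contexts.items()}
    action = oracle(scores)                # super-arm chosen optimistically
    for arm, reward in env.play(action):   # only triggered arms give feedback
        est.update(contexts[arm], reward)
```

A variance-adaptive algorithm in the spirit of VAC$^2$-UCB would, roughly, reweight each rank-one update by an inverse-variance estimate; that adaptivity is what the abstract credits with removing the batch-size factor $K$ from the regret bound.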

Authors (7)
  1. Xutong Liu
  2. Jinhang Zuo
  3. Siwei Wang
  4. John C. S. Lui
  5. Mohammad Hajiesmaili
  6. Adam Wierman
  7. Wei Chen
Citations (12)
