Causal Contextual Bandits with Adaptive Context (2405.18626v2)
Abstract: We study a variant of causal contextual bandits in which the context depends on an initial intervention chosen by the learner. At the beginning of each round, the learner selects an initial action, based on which the environment reveals a stochastic context. The learner then selects a final action and receives a reward. Given $T$ rounds of interaction with the environment, the learner's objective is to learn a policy (for selecting both the initial and the final action) that maximizes expected reward. We study the specific setting where every action corresponds to intervening on a node in a known causal graph. We extend prior work from the deterministic-context setting and obtain simple-regret minimization guarantees. Our upper bound is characterized by an instance-dependent causal parameter, $\lambda$. Furthermore, we prove that our simple-regret guarantee is essentially tight for a large class of instances. A key feature of our work is the use of convex optimization to address the bandit exploration problem. We also conduct experiments to validate our theoretical results, and we release our code at our project's GitHub repository: https://github.com/adaptiveContextualCausalBandits/aCCB.
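The following is a minimal, self-contained sketch of the two-stage interaction protocol the abstract describes, together with one plausible convex-optimization-based exploration design in CVXPY. All names (`P`, `mu`, `f_star`, the max-min objective, the use of the true context distribution when computing the design) are illustrative assumptions for exposition, not the paper's algorithm or API; the actual method would estimate the transition probabilities from data and optimize the paper's $\lambda$-based objective.

```python
# Hypothetical sketch of the two-stage causal contextual bandit protocol.
# Assumed/invented names: P, mu, f_star, n_init, n_ctx, n_final, T.
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(0)
n_init, n_ctx, n_final, T = 3, 4, 5, 10_000

# Unknown environment: P[a, c] = Pr(context c | initial action a),
# mu[c, b]  = mean Bernoulli reward of final action b in context c.
P = rng.dirichlet(np.ones(n_ctx), size=n_init)
mu = rng.uniform(size=(n_ctx, n_final))

# Exploration design via convex optimization (one plausible objective,
# not necessarily the paper's): pick a distribution f over initial
# actions maximizing the worst-case probability of observing a context.
# For brevity this sketch uses the true P; in practice P is estimated.
f = cp.Variable(n_init)
t = cp.Variable()
cp.Problem(cp.Maximize(t), [f >= 0, cp.sum(f) == 1, P.T @ f >= t]).solve()
f_star = np.clip(f.value, 0, None)
f_star /= f_star.sum()

# Explore for T rounds: sample the initial action from f_star, observe a
# context, play a uniformly random final action, record the reward.
counts = np.zeros((n_ctx, n_final))
sums = np.zeros((n_ctx, n_final))
for _ in range(T):
    a = rng.choice(n_init, p=f_star)
    c = rng.choice(n_ctx, p=P[a])
    b = rng.integers(n_final)
    counts[c, b] += 1
    sums[c, b] += rng.random() < mu[c, b]

# Recommend a policy: per context, the empirically best final action;
# for the initial action, the one whose induced contexts look best.
mu_hat = np.divide(sums, counts, out=np.zeros_like(sums), where=counts > 0)
best_final = mu_hat.argmax(axis=1)   # final action for each context
value = P @ mu_hat.max(axis=1)       # expected value of each initial action
best_init = int(value.argmax())
print("recommended policy:", best_init, best_final)
```

The max-min program above is convex (the objective is linear and each constraint is affine), which is what makes a solver like CVXPY applicable to the exploration-allocation step; the paper's actual instance-dependent objective may differ.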