Causal Bandits with General Causal Models and Interventions (2403.00233v1)
Abstract: This paper considers causal bandits (CBs) for the sequential design of interventions in a causal system. The objective is to optimize a reward function by minimizing a measure of cumulative regret with respect to the best sequence of interventions in hindsight. The paper advances the results on CBs in three directions. First, the structural causal models (SCMs) are assumed to be unknown and drawn arbitrarily from a general class $\mathcal{F}$ of Lipschitz-continuous functions; existing results, in contrast, often focus on (generalized) linear SCMs. Second, the interventions are assumed to be generalized soft with any desired level of granularity, resulting in an infinite number of possible interventions, whereas the existing literature generally adopts atomic and hard interventions. Third, general upper and lower bounds on regret are provided. The upper bounds subsume (and improve) known bounds for special cases, and the lower bounds are generally hitherto unknown. These bounds are characterized as functions of (i) graph parameters, (ii) the eluder dimension of the space of SCMs, denoted by $\operatorname{dim}(\mathcal{F})$, and (iii) the covering number of the function space, denoted by ${\rm cn}(\mathcal{F})$. Specifically, the achievable cumulative regret over horizon $T$ is $\mathcal{O}(K d^{L-1}\sqrt{T\operatorname{dim}(\mathcal{F}) \log({\rm cn}(\mathcal{F}))})$, where $K$ is related to the Lipschitz constants, $d$ is the graph's maximum in-degree, and $L$ is the length of the longest causal path. The upper bound is further refined for special classes of SCMs (neural network, polynomial, and linear), and their corresponding lower bounds are provided.
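To make the scaling of this bound concrete, the minimal sketch below evaluates the dominant term $K d^{L-1}\sqrt{T\operatorname{dim}(\mathcal{F}) \log({\rm cn}(\mathcal{F}))}$ for illustrative parameter values. All numbers used here (the Lipschitz-related constant, in-degree, path length, eluder dimension, and covering number) are hypothetical placeholders for exposition, not values taken from the paper.

```python
import math

def regret_upper_bound(K, d, L, T, eluder_dim, covering_number):
    """Dominant term of the cumulative-regret upper bound
    O(K * d^(L-1) * sqrt(T * dim(F) * log(cn(F)))) stated in the abstract.

    K               -- constant related to the Lipschitz constants of the SCMs
    d               -- maximum in-degree of the causal graph
    L               -- length of the longest causal path
    T               -- horizon (number of intervention rounds)
    eluder_dim      -- eluder dimension dim(F) of the SCM function class
    covering_number -- covering number cn(F) of the function class
    """
    return K * d ** (L - 1) * math.sqrt(T * eluder_dim * math.log(covering_number))

# Hypothetical example: a sparse graph with in-degree 2, a longest causal
# path of length 3, and a function class with eluder dimension 50 and
# covering number 1e6, over a horizon of T = 10,000 rounds.
print(regret_upper_bound(K=1.0, d=2, L=3, T=10_000, eluder_dim=50, covering_number=1e6))
```

Note how the bound is sublinear in $T$ (so per-round regret vanishes as $T$ grows) but exponential in the causal-path length $L$ through the $d^{L-1}$ factor, which is why the graph parameters appear alongside the complexity measures of $\mathcal{F}$.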