Why Online Reinforcement Learning is Causal (2403.04221v2)
Abstract: Reinforcement learning (RL) and causal modelling naturally complement each other. The goal of causal modelling is to predict the effects of interventions in an environment, while the goal of reinforcement learning is to select interventions that maximize the rewards the agent receives from the environment. Reinforcement learning includes the two most powerful sources of information for estimating causal relationships: temporal ordering and the ability to act on an environment. This paper examines which reinforcement learning settings we can expect to benefit from causal modelling, and how. In online learning, the agent can interact directly with its environment and learn by exploring it. Our main argument is that in online learning, conditional probabilities are already causal, so explicit causal modelling adds little there; offline RL is therefore the setting where causal learning has the most potential to make a difference. Essentially, the reason is that when an agent learns from its {\em own} experience, there are no unobserved confounders that influence both the agent's exploratory actions and the rewards it receives. Our paper formalizes this argument. For offline RL, where an agent may, and typically does, learn from the experience of {\em others}, we describe previous and new methods for leveraging a causal model, including support for counterfactual queries.
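The confounding argument in the abstract can be made concrete with a toy simulation (our own illustrative sketch, not from the paper; the reward probabilities, the hidden variable `u`, and the behaviour policy are all invented for illustration). An unobserved confounder influences both the logged behaviour policy's action and the reward, so the offline conditional P(R = 1 | A = a) is biased; when the agent's own actions are chosen independently of the confounder, as in online exploration, the same conditional estimates the interventional quantity P(R = 1 | do(A = a)).

```python
import random

random.seed(0)

def reward(action, u):
    # Action 1 is causally better than action 0 (+0.3 to the reward
    # probability); the hidden variable u also raises reward probability.
    p = 0.3 + 0.3 * action + 0.3 * u
    return 1 if random.random() < p else 0

def offline_episode():
    u = random.random() < 0.5        # unobserved confounder
    # Behaviour policy: picks the worse action exactly when u is high,
    # so u steers both the logged action and the reward.
    action = 0 if u else 1
    return action, reward(action, u)

def online_episode():
    u = random.random() < 0.5
    action = random.randrange(2)     # agent's own exploratory action,
    return action, reward(action, u) # chosen independently of u

def mean_reward(episodes, a):
    rs = [r for act, r in episodes if act == a]
    return sum(rs) / len(rs)

offline = [offline_episode() for _ in range(100_000)]
online = [online_episode() for _ in range(100_000)]

# Offline: both actions look equally good (~0.6), hiding the true effect.
print("offline:", mean_reward(offline, 0), mean_reward(offline, 1))
# Online: the causal advantage of action 1 is visible (~0.45 vs ~0.75).
print("online: ", mean_reward(online, 0), mean_reward(online, 1))
```

In the offline data, conditioning on the action also implicitly conditions on the confounder, masking the fact that action 1 is better; in the online data, the agent's randomized actions break that dependence, so simple conditional averages recover the interventional effect.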