Tackling Non-Stationarity in Reinforcement Learning via Causal-Origin Representation (2306.02747v3)
Abstract: In real-world scenarios, the application of reinforcement learning is significantly challenged by complex non-stationarity. Most existing methods attempt to model environmental changes explicitly, often requiring prior knowledge of the environment that is impractical to obtain. In this paper, we propose a new perspective, positing that non-stationarity can propagate and accumulate through complex causal relationships during state transitions, compounding its complexity and impairing policy learning. We argue that this challenge is more effectively addressed by implicitly tracing the causal origin of the non-stationarity. To this end, we introduce the Causal-Origin REPresentation (COREP) algorithm. COREP primarily employs a guided updating mechanism to learn a stable graph representation of the state, termed the causal-origin representation. By conditioning on this representation, the learned policy exhibits strong resilience to non-stationarity. We supplement our approach with a theoretical analysis grounded in a causal interpretation of non-stationary reinforcement learning, supporting the validity of the causal-origin representation. Experimental results further demonstrate that COREP outperforms existing methods on non-stationarity problems.
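The abstract only outlines the mechanism, so the following is a minimal, illustrative sketch of one plausible reading of "a guided updating mechanism that learns a stable graph representation for the state": a fast "general" graph encoder tracks the current dynamics, while a slowly updated "core" copy is guided toward it via an exponential moving average, so the representation the policy sees drifts smoothly under non-stationarity. The class names (`GraphStateEncoder`, `GuidedCoreEncoder`), the learned soft adjacency, the EMA rule, and all sizes are assumptions for illustration, not the authors' specification.

```python
# Illustrative sketch only; hypothetical names and update rule, not COREP's actual code.
import copy

import torch
import torch.nn as nn


class GraphStateEncoder(nn.Module):
    """Encodes a state vector as a graph over its dimensions.

    Each state dimension becomes a node; a learned soft adjacency matrix
    stands in for the (unknown) causal structure among dimensions.
    """

    def __init__(self, state_dim: int, embed_dim: int = 32):
        super().__init__()
        self.node_embed = nn.Linear(1, embed_dim)        # lift each scalar dim to a node feature
        self.adj_logits = nn.Parameter(torch.zeros(state_dim, state_dim))
        self.readout = nn.Linear(state_dim * embed_dim, embed_dim)

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        # state: (batch, state_dim) -> node features: (batch, state_dim, embed_dim)
        nodes = self.node_embed(state.unsqueeze(-1))
        adj = torch.sigmoid(self.adj_logits)             # soft adjacency in [0, 1]
        nodes = torch.einsum("ij,bjd->bid", adj, nodes)  # one round of message passing
        return self.readout(nodes.flatten(start_dim=1))  # graph-level representation


class GuidedCoreEncoder:
    """Keeps a slowly updated 'core' copy of a fast 'general' encoder."""

    def __init__(self, general: GraphStateEncoder, tau: float = 0.995):
        self.general = general
        self.core = copy.deepcopy(general)
        for p in self.core.parameters():
            p.requires_grad_(False)
        self.tau = tau

    @torch.no_grad()
    def guided_update(self) -> None:
        # EMA guidance: core <- tau * core + (1 - tau) * general
        for pc, pg in zip(self.core.parameters(), self.general.parameters()):
            pc.mul_(self.tau).add_(pg, alpha=1.0 - self.tau)

    def representation(self, state: torch.Tensor) -> torch.Tensor:
        # The policy conditions on the stable core representation.
        return self.core(state)


if __name__ == "__main__":
    enc = GuidedCoreEncoder(GraphStateEncoder(state_dim=8))
    s = torch.randn(4, 8)
    print(enc.representation(s).shape)  # torch.Size([4, 32])
    enc.guided_update()                 # called periodically during training
```

In this reading, `guided_update` would be invoked every few gradient steps on the general encoder, trading adaptation speed for representational stability; the paper's actual mechanism may differ.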