Act as You Learn: Adaptive Decision-Making in Non-Stationary Markov Decision Processes (2401.01841v3)
Abstract: A fundamental (and largely open) challenge in sequential decision-making is dealing with non-stationary environments, where exogenous environmental conditions change over time. Such problems are traditionally modeled as non-stationary Markov decision processes (NSMDPs). However, existing approaches for decision-making in NSMDPs have two major shortcomings: first, they assume that the updated environmental dynamics at the current time are known (even though future dynamics can change); and second, planning is largely pessimistic, i.e., the agent acts "safely" to account for the non-stationary evolution of the environment. We argue that both assumptions are invalid in practice: updated environmental conditions are rarely known, and as the agent interacts with the environment, it can learn the updated dynamics and avoid being pessimistic, at least in states whose dynamics it is confident about. We present a heuristic search algorithm called Adaptive Monte Carlo Tree Search (ADA-MCTS) that addresses these challenges. We show that the agent can learn the updated dynamics of the environment over time and then act as it learns, i.e., if the agent is in a region of the state space about which it has updated knowledge, it can avoid being pessimistic. To quantify "updated knowledge," we disentangle the aleatoric and epistemic uncertainty in the agent's updated belief and show how the agent can use these estimates for decision-making. We compare the proposed approach with multiple state-of-the-art decision-making approaches on multiple well-established open-source problems and empirically show that our approach is faster and more adaptive without sacrificing safety.
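
To make the uncertainty decomposition concrete, here is a minimal sketch (not the paper's implementation) of splitting total predictive uncertainty into aleatoric and epistemic components for a Dirichlet belief over the next-state distribution of a single (state, action) pair. The Dirichlet model, the function name, and the example pseudo-counts are illustrative assumptions; ADA-MCTS's actual belief representation may differ.

```python
# Hypothetical sketch: disentangling aleatoric and epistemic uncertainty
# under a Dirichlet posterior over a categorical next-state distribution.
# Total predictive entropy H(E[p]) decomposes into the expected entropy
# (aleatoric, irreducible stochasticity) plus the mutual information
# between the outcome and the model parameters (epistemic, reducible).
import numpy as np
from scipy.special import digamma

def uncertainty_decomposition(alpha):
    """Return (total, aleatoric, epistemic) entropy for Dirichlet(alpha).

    alpha: concentration parameters, i.e., pseudo-counts of observed
    transitions for one (state, action) pair.
    """
    alpha = np.asarray(alpha, dtype=float)
    a0 = alpha.sum()
    p_mean = alpha / a0  # posterior-mean transition probabilities
    total = -np.sum(p_mean * np.log(p_mean + 1e-12))
    # Closed form for E_{p ~ Dir(alpha)}[H(p)]: the aleatoric part.
    aleatoric = np.sum(p_mean * (digamma(a0 + 1.0) - digamma(alpha + 1.0)))
    epistemic = total - aleatoric  # mutual information; shrinks with data
    return total, aleatoric, epistemic

# Few observations: large epistemic term, so plan pessimistically here.
print(uncertainty_decomposition([1.5, 1.5, 1.0]))
# Many observations with the same mean: epistemic term collapses toward 0.
print(uncertainty_decomposition([150.0, 150.0, 100.0]))
```

The property the abstract relies on is visible in the two calls: with identical mean transition probabilities, more observations drive the epistemic term toward zero while the aleatoric term persists, which is precisely the signal an agent can use to stop acting pessimistically in well-explored regions of the state space.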