Contingency Planning Using Bi-level Markov Decision Processes for Space Missions (2402.16342v1)
Abstract: This work focuses on autonomous contingency planning for scientific missions, enabling rapid policy computation from any off-nominal point in the state space in the event of a delay or deviation from the nominal mission plan. Successful contingency planning requires managing the risks and rewards that are probabilistically associated with actions in stochastic scenarios. Markov Decision Processes (MDPs) are used to mathematically model decision-making in such scenarios. However, in the specific case of planetary rover traverse planning, the vast action space and long planning horizon pose computational challenges. A bi-level MDP framework is proposed to improve computational tractability while aligning with existing mission planning practices and enhancing the explainability and trustworthiness of AI-driven solutions. We discuss the conversion of a mission planning MDP into a bi-level MDP and test the framework on RoverGridWorld, a modified GridWorld environment for rover mission planning. We demonstrate the computational tractability and near-optimal policies achievable with the bi-level MDP approach, highlighting the trade-offs between compute time and policy optimality as the problem's complexity grows. This work facilitates more efficient and flexible contingency planning for scientific missions.
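To make the bi-level decomposition concrete, below is a minimal Python sketch under assumed, hypothetical parameters: a low-level value-iteration solver computes point-to-point navigation values on a small grid, and those solutions are abstracted into travel costs for a high-level decision over which science waypoints to visit. The grid size, waypoint locations, science rewards, and function names (`low_level_value_iteration`, `plan_high_level`) are illustrative assumptions, not the paper's RoverGridWorld specification or solver, and the high-level step is shown deterministically for brevity, whereas the paper's setting is stochastic.

```python
# Illustrative bi-level MDP sketch: the environment details below are
# hypothetical stand-ins, not the paper's RoverGridWorld specification.
import numpy as np

GRID = 10                                      # hypothetical grid side length
ACTIONS = [(0, 1), (0, -1), (1, 0), (-1, 0)]   # four cardinal low-level moves
GAMMA = 0.95

def low_level_value_iteration(goal, step_cost=-1.0, tol=1e-6):
    """Solve the navigation sub-MDP: reach `goal` from any cell.

    Returns the value function; the negative value at a cell approximates
    the expected cost of traversing from that cell to the goal."""
    V = np.zeros((GRID, GRID))
    while True:
        V_new = np.empty_like(V)
        for x in range(GRID):
            for y in range(GRID):
                if (x, y) == goal:
                    V_new[x, y] = 0.0
                    continue
                q = []
                for dx, dy in ACTIONS:
                    nx = min(max(x + dx, 0), GRID - 1)   # clamp to grid bounds
                    ny = min(max(y + dy, 0), GRID - 1)
                    q.append(step_cost + GAMMA * V[nx, ny])
                V_new[x, y] = max(q)
        if np.max(np.abs(V_new - V)) < tol:
            return V_new
        V = V_new

# Hypothetical science waypoints and their one-time science rewards.
WAYPOINTS = {"A": (2, 7), "B": (8, 1), "C": (9, 9)}
SCIENCE_REWARD = {"A": 30.0, "B": 50.0, "C": 20.0}
START = (0, 0)

# Abstract the low-level solutions into high-level travel costs:
# cost(u -> w) ~ -V_w(u), where V_w is the value function for goal w.
low_values = {name: low_level_value_iteration(cell) for name, cell in WAYPOINTS.items()}

def travel_cost(from_cell, to_name):
    return -low_values[to_name][from_cell]

def plan_high_level(current_cell, remaining):
    """High-level decision over which waypoint to visit next (deterministic
    here, so it reduces to a simple recursion)."""
    if not remaining:
        return 0.0, []
    best_val, best_seq = 0.0, []   # option: stop collecting science
    for name in remaining:
        val, seq = plan_high_level(WAYPOINTS[name], remaining - {name})
        val += SCIENCE_REWARD[name] - travel_cost(current_cell, name)
        if val > best_val:
            best_val, best_seq = val, [name] + seq
    return best_val, best_seq

value, visit_order = plan_high_level(START, set(WAYPOINTS))
print(f"expected value: {value:.1f}, visit order: {visit_order}")
```

The key design point mirrored here is the separation of concerns: the low-level solver is reused from any off-nominal cell, so a delay or deviation only requires re-evaluating the cheap high-level decision rather than replanning over the full state-action space.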