Robust Perception-Informed Navigation using PAC-NMPC with a Learned Value Function (2309.13171v2)
Abstract: Nonlinear model predictive control (NMPC) is typically restricted to short, finite horizons to limit the computational burden of online optimization. As a result, global planning frameworks are frequently necessary to avoid local minima when using NMPC for navigation in complex environments. By contrast, reinforcement learning (RL) can generate policies that minimize the expected cost over an infinite horizon and can often avoid local minima, even when operating only on current sensor measurements. However, these learned policies usually cannot provide performance guarantees (e.g., on collision avoidance), especially outside of the training distribution. In this paper, we augment Probably Approximately Correct NMPC (PAC-NMPC), a sampling-based stochastic NMPC algorithm capable of providing statistical guarantees of performance and safety, with an approximate perception-dependent value function trained via RL. We demonstrate in simulation that our algorithm can improve the long-term behavior of PAC-NMPC while outperforming other approaches with regard to safety for both planar car dynamics and more complex, high-dimensional fixed-wing aerial vehicle dynamics. We also demonstrate that, even when our value function is trained in simulation, our algorithm can achieve statistically safe navigation on hardware using a 1/10th-scale rally car in cluttered real-world environments using only current sensor information.
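The core idea of the abstract can be illustrated with a simplified sketch: a sampling-based finite-horizon NMPC loop whose trajectory cost is augmented with a learned value function evaluated at the terminal state, approximating the infinite-horizon cost-to-go. This is a minimal cross-entropy-style stand-in, not the paper's PAC-NMPC algorithm (which optimizes a distribution over feedback policies and computes PAC bounds on expected cost); all function names and parameters below are illustrative assumptions.

```python
import numpy as np

def rollout_cost(x0, controls, dynamics, stage_cost, value_fn):
    """Sum finite-horizon stage costs, then add the learned terminal
    value as an approximation of the remaining infinite-horizon cost."""
    x, total = x0, 0.0
    for u in controls:
        total += stage_cost(x, u)
        x = dynamics(x, u)
    return total + value_fn(x)

def sample_based_nmpc(x0, dynamics, stage_cost, value_fn, horizon=10,
                      n_samples=128, n_elite=16, iters=5, seed=0):
    """Cross-entropy-style optimization of an open-loop control sequence
    (a simplified surrogate for PAC-NMPC's policy-distribution update)."""
    rng = np.random.default_rng(seed)
    mean, std = np.zeros(horizon), np.ones(horizon)
    for _ in range(iters):
        samples = rng.normal(mean, std, size=(n_samples, horizon))
        costs = np.array([
            rollout_cost(x0, s, dynamics, stage_cost, value_fn)
            for s in samples
        ])
        elite = samples[np.argsort(costs)[:n_elite]]
        mean, std = elite.mean(axis=0), elite.std(axis=0) + 1e-6
    return mean  # first control is applied in receding-horizon fashion

if __name__ == "__main__":
    # Toy 1-D example: drive the state toward zero. The quadratic
    # value_fn here stands in for the RL-trained, perception-dependent
    # value function described in the paper.
    dynamics = lambda x, u: x + 0.1 * u
    stage_cost = lambda x, u: x * x + 0.01 * u * u
    value_fn = lambda x: 10.0 * x * x
    plan = sample_based_nmpc(2.0, dynamics, stage_cost, value_fn)
    print(rollout_cost(2.0, plan, dynamics, stage_cost, value_fn))
```

The terminal `value_fn` term is what lets a short horizon avoid the local minima the abstract describes: trajectories that end in poor states are penalized even if their stage costs look good within the horizon.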