Kernel-based diffusion approximated Markov decision processes for autonomous navigation and control on unstructured terrains (2111.08748v3)
Abstract: We propose a diffusion approximation method for continuous-state Markov Decision Processes (MDPs) that can be used to address autonomous navigation and control in unstructured off-road environments. In contrast to most decision-theoretic planning frameworks, which assume a fully known state transition model, we design a method that removes this strong assumption, one that is often extremely difficult to satisfy in practice. We first take the second-order Taylor expansion of the value function. The Bellman optimality equation is then approximated by a partial differential equation that relies only on the first and second moments of the transition model. Combining this with a kernel representation of the value function, we design an efficient policy iteration algorithm whose policy evaluation step can be written as a linear system of equations characterized by a finite set of supporting states. We first validate the proposed method through extensive simulations on 2D obstacle avoidance and 2.5D terrain navigation problems, where it substantially outperforms several baselines. We then develop a system that integrates our decision-making framework with onboard perception, and conduct real-world experiments in both cluttered indoor and unstructured outdoor environments. The results from the physical systems further demonstrate the applicability of our method in challenging real-world environments.
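The policy evaluation step described above — the Bellman equation approximated by a PDE in the first and second transition moments, with the value function in kernel form — can be sketched as a minimal 1D toy example. All concrete choices here (an RBF kernel, a drift-toward-origin policy with mean mu(s) = -0.5 s and variance 0.01, reward r(s) = -s^2, and 21 supporting states) are illustrative assumptions, not the paper's actual setup; the point is only to show how the second-order Taylor expansion turns policy evaluation into a linear system in the kernel coefficients.

```python
import numpy as np

def rbf(s, t, ell=0.3):
    """RBF kernel k(s, t) with its first and second derivatives in s."""
    d = s - t
    k = np.exp(-d**2 / (2 * ell**2))
    dk = -d / ell**2 * k                      # dk/ds
    d2k = (d**2 / ell**4 - 1 / ell**2) * k    # d^2k/ds^2
    return k, dk, d2k

# Illustrative 1D problem (hypothetical choices, not the paper's setup).
S = np.linspace(-1.0, 1.0, 21)   # supporting states
gamma = 0.95                     # discount factor
mu = -0.5 * S                    # first moment of the transition under a fixed policy
var = np.full_like(S, 0.01)      # second moment (variance) of the transition
r = -S**2                        # reward at each supporting state

# Value function in kernel form: v(s) = sum_j alpha_j * k(s, S_j).
K, dK, d2K = rbf(S[:, None], S[None, :])

# Second-order Taylor expansion of E[v(s')] around s gives
#   v(s) = r(s) + gamma * (v(s) + mu(s) v'(s) + 0.5 var(s) v''(s)),
# which is linear in the coefficients alpha at the supporting states:
A = K - gamma * (K + mu[:, None] * dK + 0.5 * var[:, None] * d2K)
alpha = np.linalg.lstsq(A, r, rcond=None)[0]  # solve the linear system

def value(s):
    """Evaluate the kernel value function at arbitrary query states."""
    k, _, _ = rbf(np.atleast_1d(s)[:, None], S[None, :])
    return k @ alpha
```

Because the reward peaks at the origin and the drift pulls the state toward it, the fitted value function should rank states near zero above states far from it, e.g. `value(0.0) > value(0.9)`. The same structure carries over to higher-dimensional states, with the drift vector and covariance matrix replacing `mu` and `var`.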