Reinforcement Learning for Robot Navigation with Adaptive Forward Simulation Time (AFST) in a Semi-Markov Model (2108.06161v4)
Abstract: Deep reinforcement learning (DRL) algorithms have proven effective for robot navigation, especially in unknown environments, by directly mapping perception inputs to robot control commands. However, most existing methods ignore the local minimum problem in navigation and thus cannot handle complex unknown environments. In this paper, we propose Adaptive Forward Simulation Time (AFST), the first DRL-based navigation method modeled as a semi-Markov decision process (SMDP) with a continuous action space, to overcome this problem. Specifically, we reduce the dimensionality of the action space and adapt the distributed proximal policy optimization (DPPO) algorithm to this SMDP setting by modifying its generalized advantage estimation (GAE) to better estimate the policy gradient in SMDPs. Experiments in various unknown environments demonstrate the effectiveness of AFST.
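The abstract only states that GAE is modified for the SMDP setting. A minimal sketch of one common way to do this is shown below, assuming that each action lasts a variable forward-simulation time tau_t and that the discount applied across a transition is gamma raised to that duration; the function name `smdp_gae` and its arguments are illustrative, not the paper's exact formulation.

```python
import numpy as np

def smdp_gae(rewards, values, durations, gamma=0.99, lam=0.95):
    """Duration-aware generalized advantage estimation (illustrative sketch).

    rewards:   [T]   reward accumulated over each variable-length step
    values:    [T+1] critic estimates V(s_0), ..., V(s_T)
    durations: [T]   forward-simulation time tau_t of each step
    """
    T = len(rewards)
    advantages = np.zeros(T)
    gae = 0.0
    for t in reversed(range(T)):
        # In an SMDP, a step that spans tau_t time units is discounted by gamma**tau_t
        # rather than by a fixed per-step gamma.
        discount = gamma ** durations[t]
        delta = rewards[t] + discount * values[t + 1] - values[t]
        gae = delta + discount * lam * gae   # backward GAE recursion
        advantages[t] = gae
    returns = advantages + np.asarray(values[:-1])  # targets for the critic update
    return advantages, returns
```

The duration-dependent discount keeps long and short forward-simulation steps comparable under a single gamma: an action that commits the robot for more simulated time discounts its successor value more heavily, which is what distinguishes the SMDP estimate from standard GAE.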
Authors: Yu'an Chen, Ruosong Ye, Ziyang Tao, Hongjian Liu, Guangda Chen, Jie Peng, Jun Ma, Yu Zhang, Jianmin Ji, Yanyong Zhang