PTDRL: Parameter Tuning using Deep Reinforcement Learning (2306.10833v1)
Abstract: A variety of autonomous navigation algorithms exist that allow robots to move around safely and quickly. However, many of these algorithms require parameter re-tuning when facing new environments. In this paper, we propose PTDRL, a parameter-tuning strategy that adaptively selects, from a fixed set of parameter sets, the one that maximizes the expected reward for a given navigation system. Our learning strategy can be used for different environments, different platforms, and different user preferences. Specifically, we address the problem of social navigation in indoor spaces, using a classical motion planning algorithm as our navigation system and tuning its parameters to optimize its behavior. Experimental results show that PTDRL can outperform other online parameter-tuning strategies.
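The core idea of the abstract — a learned policy that picks, at each decision step, one of a fixed set of planner parameter sets so as to maximize expected reward — can be sketched as an epsilon-greedy choice over Q-value estimates. This is a minimal illustration, not the paper's implementation: the parameter names (`max_vel_x`, `inflation_radius`) and the three example sets are hypothetical, and in PTDRL the Q-values would come from a trained deep Q-network conditioned on the robot's observation.

```python
import random

# Hypothetical discrete parameter sets for a local planner.
# The keys and values below are illustrative only.
PARAM_SETS = [
    {"max_vel_x": 0.3, "inflation_radius": 0.5},  # cautious
    {"max_vel_x": 0.6, "inflation_radius": 0.3},  # balanced
    {"max_vel_x": 1.0, "inflation_radius": 0.2},  # aggressive
]

def select_param_set(q_values, epsilon=0.1, rng=random):
    """Epsilon-greedy choice over the fixed set of parameter sets.

    q_values: one estimated expected return per parameter set,
    as a learned Q-network would produce for the current state.
    Returns the index of the chosen parameter set.
    """
    if rng.random() < epsilon:
        # Explore: pick a random parameter set.
        return rng.randrange(len(q_values))
    # Exploit: pick the parameter set with the highest estimated return.
    return max(range(len(q_values)), key=lambda i: q_values[i])

# Greedy selection (exploration disabled): index 1 has the highest Q-value.
idx = select_param_set([0.2, 0.8, 0.5], epsilon=0.0)
params = PARAM_SETS[idx]
```

At deployment, the chosen `params` would be pushed to the underlying planner (e.g. via dynamic reconfiguration in a ROS navigation stack), and the policy re-evaluates as the environment changes.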