Combining RL and IL using a dynamic, performance-based modulation over learning signals and its application to local planning (arXiv:2405.09760v1)

Published 16 May 2024 in cs.RO

Abstract: This paper proposes a method to combine reinforcement learning (RL) and imitation learning (IL) using a dynamic, performance-based modulation over learning signals. The proposed method combines RL and behavioral cloning (IL), or corrective feedback in the action space (interactive IL/IIL), by dynamically weighting the losses to be optimized, taking into account the backpropagated gradients used to update the policy and the agent's estimated performance. In this manner, RL and IL/IIL losses are combined by equalizing their impact on the policy's updates, while modulating that impact such that IL signals are prioritized at the beginning of the learning process and, as the agent's performance improves, the RL signals become progressively more relevant, allowing for a smooth transition from pure IL/IIL to pure RL. The proposed method is used to learn local planning policies for mobile robots, synthesizing IL/IIL signals online by means of a scripted policy. An extensive evaluation of the method on this task is performed in simulation, and it is empirically shown to outperform pure RL in terms of sample efficiency (reaching the same level of performance in the training environment with approximately four times fewer experiences), while consistently producing local planning policies with better performance metrics (achieving an average success rate of 0.959 in an evaluation environment, outperforming pure RL by 12.5% and pure IL by 13.9%). Furthermore, the obtained local planning policies are successfully deployed in the real world without any major fine-tuning. The proposed method can extend existing RL algorithms and is applicable to other problems for which generating IL/IIL signals online is feasible. A video summarizing some of the real-world experiments is available at https://youtu.be/mZlaXn9WGzw.
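
The weighting mechanism described above can be illustrated with a short sketch. The following is a minimal, hypothetical PyTorch implementation, not the paper's exact formulation: the gradient-norm equalization, the linear blend weight `w = perf`, and the performance estimate `perf` (e.g., a recent success rate in [0, 1]) are all assumptions made for illustration.

```python
import torch

def grad_norm(loss, params):
    # L2 norm of d(loss)/d(params); retain_graph so a later backward() still works.
    grads = torch.autograd.grad(loss, params, retain_graph=True, allow_unused=True)
    flat = [g.reshape(-1) for g in grads if g is not None]
    return torch.cat(flat).norm()

def combined_loss(rl_loss, il_loss, policy, perf):
    """Blend RL and IL losses: equalize their gradient magnitudes, then
    modulate by estimated performance (IL dominates early, RL late)."""
    params = [p for p in policy.parameters() if p.requires_grad]
    eps = 1e-8
    # Equalize: scale each loss so its backpropagated gradient has unit norm.
    # The norms are detached so the scale factors act as constants.
    rl_term = rl_loss / (grad_norm(rl_loss, params).detach() + eps)
    il_term = il_loss / (grad_norm(il_loss, params).detach() + eps)
    # Modulate: hypothetical linear schedule on the agent's estimated performance.
    w = min(max(float(perf), 0.0), 1.0)
    return (1.0 - w) * il_term + w * rl_term

# Illustrative usage inside a training step:
#   loss = combined_loss(rl_loss, il_loss, policy, perf=recent_success_rate)
#   loss.backward(); optimizer.step()
```

Detaching the gradient norms keeps the scale factors out of the differentiation, so the blending changes only the relative magnitude of the two signals, not their directions; whether the paper uses this particular normalization or schedule is not specified in the abstract.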
