
Reinforcement Learning in Robotic Motion Planning by Combined Experience-based Planning and Self-Imitation Learning (2306.06754v1)

Published 11 Jun 2023 in cs.RO and cs.AI

Abstract: High-quality and representative data is essential for both Imitation Learning (IL)- and Reinforcement Learning (RL)-based motion planning tasks. For real robots, it is challenging to collect enough qualified data, either as demonstrations for IL or as experiences for RL, due to safety considerations in environments with obstacles. We target this challenge by proposing the self-imitation learning by planning plus (SILP+) algorithm, which efficiently embeds experience-based planning into the learning architecture to mitigate the data-collection problem. The planner generates demonstrations based on successfully visited states from the current RL policy, and the policy improves by learning from these demonstrations. In this way, we relieve the demand for human expert operators to collect the demonstrations required by IL and improve RL performance as well. Various experimental results show that SILP+ achieves better training efficiency and a higher, more stable success rate in complex motion planning tasks compared to several other methods. Extensive tests on physical robots demonstrate the effectiveness of SILP+ in real-world settings.
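
The abstract describes a loop in which a planner converts the policy's own successful rollouts into demonstrations, which the policy then imitates. The sketch below is a minimal, hypothetical illustration of that loop, not the authors' implementation: the `env`, `policy`, and `planner` interfaces (`act`, `step`, `plan`, `rl_update`, `imitation_update`) are assumed placeholders inferred from the abstract.

```python
# Hypothetical sketch of the SILP+ training loop as described in the abstract.
# All class and method names are illustrative assumptions, not the paper's code.

import random


class ReplayBuffer:
    """Stores (state, action, reward, next_state, done) transitions."""

    def __init__(self):
        self.transitions = []

    def add(self, transition):
        self.transitions.append(transition)

    def sample(self, batch_size):
        return random.sample(self.transitions,
                             min(batch_size, len(self.transitions)))


def silp_plus_training_loop(env, policy, planner, episodes=1000, batch_size=64):
    """Alternate RL rollouts with planner-generated self-demonstrations."""
    rl_buffer = ReplayBuffer()    # ordinary RL experience from rollouts
    demo_buffer = ReplayBuffer()  # demonstrations generated by the planner

    for _ in range(episodes):
        # 1. Roll out the current policy and record the visited states.
        state = env.reset()
        visited_states, done = [state], False
        while not done:
            action = policy.act(state)
            next_state, reward, done = env.step(action)
            rl_buffer.add((state, action, reward, next_state, done))
            visited_states.append(next_state)
            state = next_state

        # 2. The planner connects successfully visited states into a path
        #    and relabels it as a demonstration, replacing the human
        #    operator that IL would otherwise require.
        demo_path = planner.plan(visited_states)
        if demo_path is not None:
            for s, a, s_next in demo_path:
                demo_buffer.add((s, a, 0.0, s_next, False))

        # 3. Update the policy from both buffers: an RL loss on its own
        #    experience plus an imitation loss on the self-demonstrations.
        policy.rl_update(rl_buffer.sample(batch_size))
        policy.imitation_update(demo_buffer.sample(batch_size))
```

The key design point suggested by the abstract is that the demonstration buffer is filled by the policy's own successes rather than by human teleoperation, so demonstration quality improves together with the policy.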
