Waypoint-Based Reinforcement Learning for Robot Manipulation Tasks (2403.13281v1)
Abstract: Robot arms should be able to learn new tasks. One such framework is reinforcement learning, where the robot is given a reward function that encodes the task and autonomously learns actions to maximize its reward. Existing approaches to reinforcement learning often frame this problem as a Markov decision process, and learn a policy (or a hierarchy of policies) to complete the task. These policies reason over hundreds of fine-grained actions that the robot arm needs to take: e.g., moving slightly to the right or rotating the end-effector a few degrees. But the manipulation tasks that we want robots to perform can often be broken down into a small number of high-level motions: e.g., reaching an object or turning a handle. In this paper we therefore propose a waypoint-based approach for model-free reinforcement learning. Instead of learning a low-level policy, the robot now learns a trajectory of waypoints, and then interpolates between those waypoints using existing controllers. Our key novelty is framing this waypoint-based setting as a sequence of multi-armed bandits: each bandit problem corresponds to one waypoint along the robot's motion. We theoretically show that an ideal solution to this reformulation has lower regret bounds than standard frameworks. We also introduce an approximate posterior sampling solution that builds the robot's motion one waypoint at a time. Results across benchmark simulations and two real-world experiments suggest that the proposed approach learns new tasks more quickly than state-of-the-art baselines. See videos here: https://youtu.be/MMEd-lYfq4Y
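To make the waypoint-as-bandit idea concrete, the minimal sketch below builds a motion one waypoint at a time: each waypoint is treated as its own multi-armed bandit solved with Thompson (posterior) sampling, and consecutive waypoints are connected by linear interpolation standing in for the low-level controllers the abstract mentions. Everything here is an assumption for illustration, not the paper's implementation: the discretized candidate waypoints, the independent Gaussian reward model per arm, and the `env.rollout` interface are all hypothetical.

```python
# Illustrative sketch only: the candidate-waypoint discretization, the Gaussian
# reward model, and env.rollout(...) are hypothetical stand-ins.
import numpy as np

rng = np.random.default_rng(0)

def interpolate(start, goal, steps=20):
    # Linear interpolation between two waypoints; a stand-in for the
    # "existing controllers" that connect consecutive waypoints.
    return [start + (goal - start) * t for t in np.linspace(0.0, 1.0, steps)]

class WaypointBandit:
    # One bandit per waypoint: each arm is a candidate waypoint, with an
    # independent Gaussian posterior over the reward it earns.
    def __init__(self, arms):
        self.arms = arms                  # (K, d) array of candidate waypoints
        self.mean = np.zeros(len(arms))   # posterior means, one per arm
        self.count = np.zeros(len(arms))  # observations per arm

    def sample_arm(self):
        # Thompson sampling: draw one sample from each arm's posterior
        # and act greedily with respect to the draws.
        sigma = 1.0 / np.sqrt(self.count + 1.0)
        return int(np.argmax(rng.normal(self.mean, sigma)))

    def update(self, k, reward):
        # Incremental update of arm k's posterior mean.
        self.count[k] += 1
        self.mean[k] += (reward - self.mean[k]) / self.count[k]

def learn_waypoints(env, num_waypoints=3, num_candidates=50, episodes=200, dim=3):
    # Build the motion one waypoint (one bandit problem) at a time.
    trajectory = []
    for _ in range(num_waypoints):
        bandit = WaypointBandit(rng.uniform(-1.0, 1.0, size=(num_candidates, dim)))
        for _ in range(episodes):
            k = bandit.sample_arm()
            # Roll out the motion so far plus the sampled candidate;
            # env.rollout(waypoints, controller) is a hypothetical API
            # returning the episode's total reward.
            reward = env.rollout(trajectory + [bandit.arms[k]], interpolate)
            bandit.update(k, reward)
        # Commit the best arm before moving on to the next waypoint.
        trajectory.append(bandit.arms[int(np.argmax(bandit.mean))])
    return trajectory
```

Committing each waypoint before optimizing the next mirrors the one-waypoint-at-a-time construction described in the abstract, and is what shrinks the search space from hundreds of fine-grained actions per episode to a handful of bandit problems.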