Enhancing Task Performance of Learned Simplified Models via Reinforcement Learning (2310.09714v2)
Abstract: In contact-rich tasks, the hybrid, multi-modal nature of contact dynamics poses great challenges for model representation, planning, and control. Recent efforts have attempted to address these challenges with data-driven methods that learn dynamics models and combine them with model predictive control (MPC). Those methods, while effective, train the model solely by minimizing forward prediction error, in the hope that a more accurate model will yield better task performance under the MPC controller. Because the correlation between prediction error and task performance is weak, this can lead to data inefficiency and limit overall performance. In response, we propose a novel strategy: using a policy-gradient algorithm to find a simplified dynamics model that explicitly maximizes task performance. Specifically, we parameterize the stochastic policy as the perturbed output of the MPC controller, so that the learned model representation is directly tied to the policy and thus to task performance. We apply the proposed method to contact-rich tasks in which a three-fingered robotic hand manipulates previously unknown objects. Compared to the existing method, our approach improves the task success rate by up to 15% across diverse objects while retaining data efficiency, and it solves some tasks with success rates of 70% or higher using under 30 minutes of data. All videos and code are available at https://sites.google.com/view/lcs-rl.
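To make the abstract's core mechanism concrete, the following is a minimal sketch (not the paper's implementation): the stochastic policy is the MPC controller's output, computed from a learned simplified model, plus Gaussian noise, and a REINFORCE-style policy gradient updates the model parameters to maximize task reward directly rather than prediction accuracy. The toy one-dimensional dynamics, the one-step MPC, the finite-difference gradient through the controller, and all names (`env_step`, `mpc_action`, `rollout`) are illustrative assumptions.

```python
# Hedged sketch of "policy = perturbed MPC output" with a policy-gradient update
# on the simplified-model parameters. Toy 1-D problem; names are illustrative.
import numpy as np

rng = np.random.default_rng(0)

def env_step(x, u):
    # True (unknown) dynamics, used only to simulate the environment.
    return 0.9 * x + 0.5 * u

def mpc_action(theta, x, r=0.1):
    # Simplified model x' ~= a_hat*x + b_hat*u; the "MPC" is a one-step controller
    # minimizing (a_hat*x + b_hat*u)^2 + r*u^2 under that model.
    a_hat, b_hat = theta
    return -(a_hat * b_hat * x) / (b_hat**2 + r)

def rollout(theta, sigma=0.2, horizon=20):
    """Run one episode; return states, sampled actions, and rewards."""
    x, xs, us, rews = 1.0, [], [], []
    for _ in range(horizon):
        u = mpc_action(theta, x) + sigma * rng.standard_normal()  # perturbed MPC output
        xs.append(x); us.append(u)
        x = env_step(x, u)
        rews.append(-(x**2))  # task reward: drive the state to zero
    return np.array(xs), np.array(us), np.array(rews)

def grad_mpc(theta, x, eps=1e-5):
    """Finite-difference d(mpc_action)/d(theta); a stand-in for analytic or
    implicit differentiation through the MPC solve."""
    g = np.zeros_like(theta)
    for i in range(len(theta)):
        tp, tm = theta.copy(), theta.copy()
        tp[i] += eps; tm[i] -= eps
        g[i] = (mpc_action(tp, x) - mpc_action(tm, x)) / (2 * eps)
    return g

theta, sigma, lr = np.array([0.5, 1.0]), 0.2, 1e-2
for _ in range(200):
    xs, us, rews = rollout(theta, sigma)
    returns = np.cumsum(rews[::-1])[::-1]  # reward-to-go
    grad = np.zeros_like(theta)
    for x, u, R in zip(xs, us, returns):
        # Score function of the Gaussian policy N(mpc_action(theta, x), sigma^2).
        grad += ((u - mpc_action(theta, x)) / sigma**2) * grad_mpc(theta, x) * R
    theta += lr * grad / len(xs)  # ascend task performance, not prediction accuracy
print("learned simplified-model parameters:", theta)
```

In this sketch the gradient of the controller output is taken by finite differences purely for brevity; the key point it illustrates is that the model parameters receive a task-performance gradient through the MPC policy, so the learned simplified model need not minimize forward prediction error to be useful for control.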