
SLIM: Skill Learning with Multiple Critics (2402.00823v2)

Published 1 Feb 2024 in cs.LG, cs.AI, and cs.RO

Abstract: Self-supervised skill learning aims to acquire useful behaviors that leverage the underlying dynamics of the environment. Latent-variable models based on mutual information maximization have been successful at this task but still struggle in the context of robotic manipulation. Because manipulation requires affecting a possibly large set of degrees of freedom in the environment, mutual information maximization alone fails to produce useful and safe manipulation behaviors. Moreover, addressing this by naively combining skill discovery rewards with additional rewards can also fail to produce the desired behaviors. To address this limitation, we introduce SLIM, a multi-critic learning approach for skill discovery with a particular focus on robotic manipulation. Our main insight is that using multiple critics in an actor-critic framework to gracefully combine multiple reward functions leads to a significant improvement in latent-variable skill discovery for robotic manipulation, while overcoming interference among rewards that hinders convergence to useful skills. Furthermore, in the context of tabletop manipulation, we demonstrate the applicability of our skill discovery approach for acquiring safe and efficient motor primitives in a hierarchical reinforcement learning fashion and leveraging them through planning, significantly surpassing baseline approaches for skill discovery.
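
The central mechanism described in the abstract, combining several reward signals through separate critics rather than summing them into one scalar reward, can be illustrated with a short sketch. The snippet below is a minimal, hypothetical illustration and not the authors' implementation: it assumes an actor-critic agent with one critic network per reward stream (for instance, a skill-discovery reward and a reaching/safety reward), where each critic is regressed against its own reward and the policy update aggregates the per-critic advantages.

# Minimal sketch (assumed names and shapes, not the paper's code) of a
# multi-critic actor-critic update: each reward stream gets its own critic,
# and the policy is trained on the aggregated per-critic advantages instead
# of a single summed reward.
import torch
import torch.nn as nn

obs_dim, act_dim, n_rewards = 8, 2, 2   # e.g. skill-discovery reward + reaching/safety reward

actor = nn.Sequential(nn.Linear(obs_dim, 64), nn.Tanh(), nn.Linear(64, act_dim))
critics = nn.ModuleList(
    nn.Sequential(nn.Linear(obs_dim, 64), nn.Tanh(), nn.Linear(64, 1))
    for _ in range(n_rewards)
)
actor_opt = torch.optim.Adam(actor.parameters(), lr=3e-4)
critic_opt = torch.optim.Adam(critics.parameters(), lr=1e-3)

def update(obs, actions, rewards, next_obs, gamma=0.99):
    """obs: [B, obs_dim], actions: [B, act_dim], rewards: [B, n_rewards] (one column per reward)."""
    # 1) Fit each critic against its own reward stream with one-step TD targets.
    td_targets = []
    for i, critic in enumerate(critics):
        with torch.no_grad():
            td_targets.append(rewards[:, i:i + 1] + gamma * critic(next_obs))
    critic_loss = sum(
        nn.functional.mse_loss(critic(obs), tgt)
        for critic, tgt in zip(critics, td_targets)
    )
    critic_opt.zero_grad(); critic_loss.backward(); critic_opt.step()

    # 2) Policy gradient using the sum of per-critic advantages.
    with torch.no_grad():
        advantages = sum(tgt - critic(obs) for critic, tgt in zip(critics, td_targets))
    mean = actor(obs)
    dist = torch.distributions.Normal(mean, torch.ones_like(mean))
    log_prob = dist.log_prob(actions).sum(-1, keepdim=True)
    actor_loss = -(log_prob * advantages).mean()
    actor_opt.zero_grad(); actor_loss.backward(); actor_opt.step()

Calling update(obs, actions, rewards, next_obs) on batched tensors performs one step; keeping the critics separate is what avoids the destructive interference that a naively summed reward can cause, while SLIM's actual architecture, reward definitions, and combination rule are those described in the paper rather than this sketch.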

Authors (4)
  1. David Emukpere (4 papers)
  2. Bingbing Wu (4 papers)
  3. Julien Perez (14 papers)
  4. Jean-Michel Renders (18 papers)