L2C2: Locally Lipschitz Continuous Constraint towards Stable and Smooth Reinforcement Learning

Published 15 Feb 2022 in cs.RO and cs.LG | (2202.07152v1)

Abstract: This paper proposes a new regularization technique for reinforcement learning (RL) that aims to make the policy and value functions smooth and stable. RL is known for the instability of its learning process and the sensitivity of the acquired policy to noise. Several methods have been proposed to resolve these problems; in summary, they indicate that smoothing the policy and value functions learned in RL helps to mitigate them. However, if these functions are made excessively smooth, their expressiveness is lost and the globally optimal solution may not be obtained. This paper therefore considers RL under a local Lipschitz continuity constraint, called L2C2. By designing a spatio-temporal locally compact space for L2C2 from the state transition at each time step, moderate smoothness can be achieved without loss of expressiveness. Numerical simulations with noise verified that the proposed L2C2 improves task performance while smoothing the robot actions generated by the learned policy.
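
The abstract does not state the loss itself, so the following is only a minimal sketch of how a local Lipschitz penalty on a function of the state might be written; it is not the paper's L2C2 formulation. The function name `local_lipschitz_penalty`, the bound `k_max`, the hinge form, and the choice to sample a nearby point on the segment between a state and its successor are all assumptions made for illustration.

```python
import numpy as np


def local_lipschitz_penalty(f, s, s_next, rng, k_max=1.0):
    """Hinge penalty on an empirical local Lipschitz ratio (illustrative only).

    A point s_tilde is drawn on the segment between s and s_next. This is an
    assumed stand-in for a "local" region built from the state transition.
    The ratio ||f(s) - f(s_tilde)|| / ||s - s_tilde|| is penalized only where
    it exceeds the hypothetical bound k_max.
    """
    s = np.atleast_2d(s)
    s_next = np.atleast_2d(s_next)
    # One interpolation coefficient per transition in the batch.
    u = rng.uniform(0.0, 1.0, size=(s.shape[0], 1))
    s_tilde = s + u * (s_next - s)
    out_gap = np.linalg.norm(f(s) - f(s_tilde), axis=1)
    # Small constant avoids division by zero when s_tilde coincides with s.
    in_gap = np.linalg.norm(s - s_tilde, axis=1) + 1e-8
    ratio = out_gap / in_gap
    return np.mean(np.maximum(ratio - k_max, 0.0))
```

In this sketch a map whose local slope stays below `k_max` incurs no penalty, so only changes that are sharp relative to the size of the state transition are discouraged. That is one way to read "moderate smoothness without loss of expressiveness", under the assumptions stated above.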

Authors (1)