
C-GAIL: Stabilizing Generative Adversarial Imitation Learning with Control Theory (2402.16349v2)

Published 26 Feb 2024 in cs.LG, cs.SY, and eess.SY

Abstract: Generative Adversarial Imitation Learning (GAIL) trains a generative policy to mimic a demonstrator. It uses on-policy Reinforcement Learning (RL) to optimize a reward signal derived from a GAN-like discriminator. A major drawback of GAIL is its training instability: it inherits both the complex training dynamics of GANs and the distribution shift introduced by on-policy RL. This can cause oscillations during training, harming sample efficiency and final policy performance. Recent work has shown that control theory can help with the convergence of a GAN's training. This paper extends this line of work, conducting a control-theoretic analysis of GAIL and deriving a novel controller that not only pushes GAIL to the desired equilibrium but also achieves asymptotic stability in a 'one-step' setting. Based on this, we propose a practical algorithm, 'Controlled-GAIL' (C-GAIL). On MuJoCo tasks, our controlled variant speeds up the rate of convergence, reduces the range of oscillation, and matches the expert's distribution more closely for both vanilla GAIL and GAIL-DAC.
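To make the GAIL mechanism in the abstract concrete, here is a minimal toy sketch (not the paper's implementation) of how a discriminator trained to separate expert from policy data yields a reward signal for the RL step. It assumes 1-D "states", a linear logistic discriminator, and the common GAIL surrogate reward r(s) = -log(1 - D(s)); all names and constants are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Toy 1-D "states": expert demonstrations cluster near 1.0,
# the current policy's visited states cluster near 0.0.
expert = rng.normal(1.0, 0.1, size=512)
policy = rng.normal(0.0, 0.1, size=512)

# Linear discriminator D(s) = sigmoid(w*s + b), trained with the usual
# GAN-style objective: output 1 on expert data, 0 on policy data.
w, b = 0.0, 0.0
lr = 0.5
for _ in range(500):
    for s, label in ((expert, 1.0), (policy, 0.0)):
        p = sigmoid(w * s + b)
        grad = p - label                  # d(binary cross-entropy)/d(logit)
        w -= lr * np.mean(grad * s)
        b -= lr * np.mean(grad)

# GAIL's surrogate reward: states the discriminator deems expert-like
# receive high reward, steering the on-policy RL update toward them.
def reward(s):
    return -np.log(1.0 - sigmoid(w * s + b) + 1e-8)

# Expert-like states earn more reward than policy-like ones.
print(reward(np.array([1.0]))[0] > reward(np.array([0.0]))[0])  # True
```

Because the policy's state distribution shifts after every RL step, the discriminator's training target keeps moving; this coupled, non-stationary loop is the source of the oscillations that the paper's controller is designed to damp.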

Authors (5)
  1. Tianjiao Luo
  2. Tim Pearce
  3. Huayu Chen
  4. Jianfei Chen
  5. Jun Zhu
Citations (2)