Diffusion-Reward Adversarial Imitation Learning (2405.16194v4)
Abstract: Imitation learning aims to learn a policy from expert demonstrations without access to reward signals from the environment. Generative adversarial imitation learning (GAIL) formulates imitation learning as adversarial learning: a generator policy learns to imitate expert behaviors while a discriminator learns to distinguish expert demonstrations from agent trajectories. Despite its encouraging results, GAIL training is often brittle and unstable. Inspired by the recent dominance of diffusion models in generative modeling, we propose Diffusion-Reward Adversarial Imitation Learning (DRAIL), which integrates a diffusion model into GAIL with the aim of yielding more robust and smoother rewards for policy learning. Specifically, we propose a diffusion discriminative classifier to construct an enhanced discriminator and design diffusion rewards based on the classifier's output for policy learning. Extensive experiments in navigation, manipulation, and locomotion verify DRAIL's effectiveness compared to prior imitation learning methods. Additional experimental results demonstrate the generalizability and data efficiency of DRAIL, and visualizations of the reward functions learned by GAIL and DRAIL suggest that DRAIL produces more robust and smoother rewards. Project page: https://nturobotlearninglab.github.io/DRAIL/
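The adversarial setup described in the abstract can be illustrated with a minimal sketch of a generic GAIL-style discriminator objective and reward. This is not DRAIL's diffusion discriminative classifier, whose exact form the abstract does not give. It assumes a discriminator that outputs a scalar logit per state-action pair, and the function names `discriminator_loss` and `adversarial_reward` are hypothetical. The reward shown, -log(1 - D), is one common choice among several used in the adversarial imitation literature.

```python
import math


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def discriminator_loss(expert_logits, agent_logits):
    """Binary cross-entropy with expert pairs labelled 1 and agent pairs labelled 0.

    With D = sigmoid(logit): -log D = softplus(-logit), -log(1 - D) = softplus(logit).
    """
    expert_term = sum(softplus(-z) for z in expert_logits) / len(expert_logits)
    agent_term = sum(softplus(z) for z in agent_logits) / len(agent_logits)
    return expert_term + agent_term


def adversarial_reward(logit: float) -> float:
    """One common GAIL-style reward, r = -log(1 - D), assumed here for illustration."""
    return softplus(logit)


if __name__ == "__main__":
    # Hypothetical logits: the discriminator rates expert pairs high, agent pairs low.
    print(discriminator_loss([2.0, 3.0], [-2.0, -1.0]))
    print(adversarial_reward(2.0), adversarial_reward(-2.0))
```

In this sketch the policy receives a larger reward for state-action pairs the discriminator judges to be expert-like, and a policy-gradient method would then maximize that reward.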
Authors: Chun-Mao Lai, Hsiang-Chun Wang, Ping-Chun Hsieh, Yu-Chiang Frank Wang, Min-Hung Chen, Shao-Hua Sun