DiffAIL: Diffusion Adversarial Imitation Learning (2312.06348v2)
Abstract: Imitation learning aims to circumvent the difficulty of defining reward functions in real-world decision-making tasks. A currently popular approach is the Adversarial Imitation Learning (AIL) framework, which matches the expert's state-action occupancy measure to obtain a surrogate reward for forward reinforcement learning. However, the traditional discriminator is a simple binary classifier that does not learn an accurate distribution, so it may fail to identify expert-level state-action pairs that the policy induces through interaction with the environment. To address this issue, we propose a method named Diffusion Adversarial Imitation Learning (DiffAIL), which introduces the diffusion model into the AIL framework. Specifically, DiffAIL models state-action pairs with an unconditional diffusion model and uses the diffusion loss as part of the discriminator's learning objective, enabling the discriminator to better capture expert demonstrations and improve generalization. Experiments show that our method achieves state-of-the-art performance and significantly surpasses the expert demonstrations in two benchmark settings: the standard state-action setting and the state-only setting. Our code is available at https://github.com/ML-Group-SDU/DiffAIL.
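To make the core idea concrete, below is a minimal sketch of a diffusion-based discriminator in the spirit of the abstract: state-action pairs are scored by an unconditional diffusion (denoising) loss, which is then plugged into a standard GAIL-style binary objective. The specifics here are assumptions for illustration only, not the paper's implementation: the MLP noise predictor, the linear beta schedule, the choice D(s, a) = exp(-diffusion loss), and names such as `DiffusionDiscriminator` and `discriminator_step` are all hypothetical.

```python
# Hypothetical sketch of a diffusion-loss discriminator for AIL (not the official DiffAIL code).
import torch
import torch.nn as nn


class DiffusionDiscriminator(nn.Module):
    def __init__(self, state_dim, action_dim, hidden=256, num_timesteps=100):
        super().__init__()
        self.x_dim = state_dim + action_dim
        self.num_timesteps = num_timesteps
        # Epsilon (noise) predictor conditioned on the diffusion timestep.
        self.eps_net = nn.Sequential(
            nn.Linear(self.x_dim + 1, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, self.x_dim),
        )
        # Assumed linear beta schedule; alphas_bar = cumulative product of (1 - beta).
        betas = torch.linspace(1e-4, 2e-2, num_timesteps)
        self.register_buffer("alphas_bar", torch.cumprod(1.0 - betas, dim=0))

    def diffusion_loss(self, x):
        """Per-sample denoising loss on state-action pairs x = (s, a)."""
        b = x.shape[0]
        t = torch.randint(0, self.num_timesteps, (b,), device=x.device)
        a_bar = self.alphas_bar[t].unsqueeze(-1)
        noise = torch.randn_like(x)
        x_t = a_bar.sqrt() * x + (1.0 - a_bar).sqrt() * noise  # forward diffusion
        t_emb = t.float().unsqueeze(-1) / self.num_timesteps
        eps_pred = self.eps_net(torch.cat([x_t, t_emb], dim=-1))
        return ((eps_pred - noise) ** 2).mean(dim=-1)  # shape (b,)

    def forward(self, state, action):
        """Discriminator output in (0, 1): low diffusion loss -> close to 1."""
        loss = self.diffusion_loss(torch.cat([state, action], dim=-1))
        return torch.exp(-loss)


def discriminator_step(disc, expert_sa, policy_sa, optimizer):
    """One GAIL-style update: expert pairs labeled real, policy pairs fake."""
    d_expert = disc(*expert_sa)   # expert_sa = (states, actions) tensors
    d_policy = disc(*policy_sa)
    eps = 1e-8
    loss = -(torch.log(d_expert + eps).mean()
             + torch.log(1.0 - d_policy + eps).mean())
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

In this sketch the surrogate reward for the forward RL step would be derived from the discriminator output, e.g. -log(1 - D(s, a)), as is common in AIL-style methods; the exact reward form used by DiffAIL is described in the paper.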
- Bingzheng Wang
- Yan Zhang
- Teng Pang
- Guoqiang Wu
- Yilong Yin