Imitation Learning from Purified Demonstrations (2310.07143v2)
Abstract: Imitation learning has emerged as a promising approach to sequential decision-making problems, under the assumption that expert demonstrations are optimal. In real-world scenarios, however, most demonstrations are imperfect, which undermines the effectiveness of imitation learning. While existing research has focused on learning from imperfect demonstrations, training typically still requires a certain proportion of optimal demonstrations to guarantee performance. To tackle this problem, we propose to first purify the potential noise in imperfect demonstrations and then perform imitation learning on the purified demonstrations. Motivated by the success of diffusion models, we introduce a two-step purification via a diffusion process. In the first step, a forward diffusion process smooths the potential noise in imperfect demonstrations by injecting additional noise. A reverse generative process is then used to recover the optimal demonstrations from the diffused ones. We provide theoretical evidence supporting our approach, showing that the distance between the purified and optimal demonstrations can be bounded. Empirical results on MuJoCo and RoboSuite demonstrate the effectiveness of our method from multiple aspects.
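The two-step purification described in the abstract can be sketched as a standard DDPM-style process: a closed-form forward diffusion to a truncation step, followed by iterative reverse denoising. The noise schedule, the truncation step `t_star`, and the denoiser `eps_model` below are illustrative assumptions for a minimal sketch, not the paper's exact implementation (which would use a trained score network over demonstration state-action pairs).

```python
import numpy as np

def make_schedule(T=100, beta_start=1e-4, beta_end=0.02):
    """Linear variance schedule, as in standard DDPMs."""
    betas = np.linspace(beta_start, beta_end, T)
    alphas = 1.0 - betas
    alpha_bars = np.cumprod(alphas)
    return betas, alphas, alpha_bars

def forward_diffuse(x0, t, alpha_bars, rng):
    """Step 1: sample q(x_t | x_0) in closed form, smoothing
    demonstration noise by adding Gaussian noise."""
    eps = rng.standard_normal(x0.shape)
    return np.sqrt(alpha_bars[t]) * x0 + np.sqrt(1.0 - alpha_bars[t]) * eps

def reverse_denoise(x_t, t_star, betas, alphas, alpha_bars, eps_model, rng):
    """Step 2: run the reverse generative process from t_star down to 0
    to recover an estimate of the clean demonstration."""
    x = x_t
    for t in range(t_star, -1, -1):
        eps_hat = eps_model(x, t)  # predicted noise (hypothetical denoiser)
        mean = (x - betas[t] / np.sqrt(1.0 - alpha_bars[t]) * eps_hat) \
               / np.sqrt(alphas[t])
        if t > 0:
            x = mean + np.sqrt(betas[t]) * rng.standard_normal(x.shape)
        else:
            x = mean
    return x

def purify(demo, t_star, eps_model, T=100, seed=0):
    """Purify one demonstration vector (e.g. a flattened state-action pair)."""
    rng = np.random.default_rng(seed)
    betas, alphas, alpha_bars = make_schedule(T)
    x_t = forward_diffuse(demo, t_star, alpha_bars, rng)
    return reverse_denoise(x_t, t_star, betas, alphas, alpha_bars,
                           eps_model, rng)
```

A small truncation step `t_star` keeps the forward diffusion from destroying the demonstration's signal while still washing out its noise, which is the trade-off the paper's theoretical bound concerns. Example usage with a placeholder denoiser: `purify(demo, t_star=10, eps_model=lambda x, t: np.zeros_like(x))`.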
Authors: Yunke Wang, Minjing Dong, Bo Du, Chang Xu, Yukun Zhao