How to Leverage Diverse Demonstrations in Offline Imitation Learning (2405.17476v3)
Abstract: Offline Imitation Learning (IL) with imperfect demonstrations has garnered increasing attention owing to the scarcity of expert data in many real-world domains. A fundamental problem in this scenario is how to extract positive behaviors from noisy data. In general, current approaches select data based on state-action similarity to the given expert demonstrations, neglecting valuable information in (potentially abundant) $\textit{diverse}$ state-actions that deviate from expert ones. In this paper, we introduce a simple yet effective data selection method that identifies positive behaviors based on their resultant states -- a more informative criterion that enables explicit utilization of dynamics information and effective extraction of both expert and beneficial diverse behaviors. Further, we devise a lightweight behavior cloning algorithm capable of leveraging the expert and selected data correctly. In the experiments, we evaluate our method on a suite of complex and high-dimensional offline IL benchmarks, including continuous-control and vision-based tasks. The results demonstrate that our method achieves state-of-the-art performance, outperforming existing methods on $\textbf{20/21}$ benchmarks, typically by $\textbf{2-5x}$, while maintaining a comparable runtime to Behavior Cloning ($\texttt{BC}$).
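To make the two steps described in the abstract concrete, below is a minimal sketch under illustrative assumptions: each transition in the noisy dataset is scored by the distance of its resultant (next) state to the nearest expert state, the best-scoring fraction is kept, and a plain behavior-cloning policy is then fit on the union of the expert data and the selected transitions. The nearest-neighbor scoring rule, the `keep_ratio`, and the network and optimizer settings are placeholders chosen for readability, not the paper's exact selection criterion or algorithm.

```python
# Sketch: (1) select noisy-data transitions by resultant-state similarity to
# expert states, (2) behavior-clone on expert data + selected data.
# The scoring rule and hyperparameters below are illustrative assumptions.
import numpy as np
import torch
import torch.nn as nn


def select_by_resultant_state(noisy_next_states, expert_states, keep_ratio=0.2):
    """Keep the transitions whose next state is closest to any expert state."""
    # Pairwise Euclidean distances, shape (N_noisy, N_expert).
    dists = np.linalg.norm(
        noisy_next_states[:, None, :] - expert_states[None, :, :], axis=-1
    )
    score = dists.min(axis=1)                # distance to the nearest expert state
    k = max(1, int(keep_ratio * len(score)))
    return np.argsort(score)[:k]             # indices of the k best transitions


def behavior_cloning(states, actions, epochs=100, lr=1e-3):
    """Fit a small MLP policy on (state, action) pairs with an MSE loss."""
    s = torch.as_tensor(states, dtype=torch.float32)
    a = torch.as_tensor(actions, dtype=torch.float32)
    policy = nn.Sequential(
        nn.Linear(s.shape[1], 256), nn.ReLU(),
        nn.Linear(256, 256), nn.ReLU(),
        nn.Linear(256, a.shape[1]),
    )
    opt = torch.optim.Adam(policy.parameters(), lr=lr)
    for _ in range(epochs):
        opt.zero_grad()
        loss = ((policy(s) - a) ** 2).mean()
        loss.backward()
        opt.step()
    return policy


# Toy usage with random arrays standing in for D4RL-style datasets.
rng = np.random.default_rng(0)
expert = {"obs": rng.normal(size=(100, 11)), "act": rng.normal(size=(100, 3))}
noisy = {"obs": rng.normal(size=(1000, 11)),
         "act": rng.normal(size=(1000, 3)),
         "next_obs": rng.normal(size=(1000, 11))}

idx = select_by_resultant_state(noisy["next_obs"], expert["obs"])
states = np.concatenate([expert["obs"], noisy["obs"][idx]])
actions = np.concatenate([expert["act"], noisy["act"][idx]])
policy = behavior_cloning(states, actions)
```

Scoring on the resultant state rather than on the state-action pair itself is what lets dynamics information enter the selection: a transition is kept if it leads somewhere the expert goes, even when the action itself does not resemble any expert action.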
Authors: Sheng Yue, Jiani Liu, Xingyuan Hua, Ju Ren, Sen Lin, Junshan Zhang, Yaoxue Zhang