How to Leverage Diverse Demonstrations in Offline Imitation Learning (2405.17476v3)

Published 24 May 2024 in cs.LG and cs.AI

Abstract: Offline Imitation Learning (IL) with imperfect demonstrations has garnered increasing attention owing to the scarcity of expert data in many real-world domains. A fundamental problem in this scenario is how to extract positive behaviors from noisy data. In general, current approaches to the problem select data building on state-action similarity to given expert demonstrations, neglecting precious information in (potentially abundant) $\textit{diverse}$ state-actions that deviate from expert ones. In this paper, we introduce a simple yet effective data selection method that identifies positive behaviors based on their resultant states -- a more informative criterion enabling explicit utilization of dynamics information and effective extraction of both expert and beneficial diverse behaviors. Further, we devise a lightweight behavior cloning algorithm capable of leveraging the expert and selected data correctly. In the experiments, we evaluate our method on a suite of complex and high-dimensional offline IL benchmarks, including continuous-control and vision-based tasks. The results demonstrate that our method achieves state-of-the-art performance, outperforming existing methods on $\textbf{20/21}$ benchmarks, typically by $\textbf{2-5x}$, while maintaining a comparable runtime to Behavior Cloning ($\texttt{BC}$).
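The abstract describes the method only at a high level. As a rough illustration of the general idea (not the authors' actual algorithm), the sketch below selects transitions from imperfect data whose resultant next states land near expert-visited states, then runs plain behavior cloning on the expert data plus the selected subset. The function names, the nearest-neighbor distance criterion, and the threshold are illustrative assumptions.

```python
import numpy as np
import torch
import torch.nn as nn

def select_by_resultant_state(offline_next_states, expert_states, threshold=0.5):
    # Keep offline transitions whose resultant (next) state lies close to the
    # expert state distribution. Nearest-neighbor Euclidean distance and the
    # fixed threshold are illustrative stand-ins for the paper's criterion.
    dists = np.linalg.norm(
        offline_next_states[:, None, :] - expert_states[None, :, :], axis=-1
    ).min(axis=1)                # distance from each offline next-state to its nearest expert state
    return dists < threshold     # boolean mask over offline transitions

def behavior_cloning(policy, states, actions, epochs=200, lr=3e-4):
    # Plain BC: regress actions from states over expert data plus the selected subset.
    opt = torch.optim.Adam(policy.parameters(), lr=lr)
    s = torch.as_tensor(states, dtype=torch.float32)
    a = torch.as_tensor(actions, dtype=torch.float32)
    for _ in range(epochs):
        loss = nn.functional.mse_loss(policy(s), a)
        opt.zero_grad()
        loss.backward()
        opt.step()
    return policy

# Usage sketch (shapes and dict keys assumed):
# mask = select_by_resultant_state(offline["next_states"], expert["states"])
# states = np.concatenate([expert["states"], offline["states"][mask]])
# actions = np.concatenate([expert["actions"], offline["actions"][mask]])
# policy = behavior_cloning(
#     nn.Sequential(nn.Linear(obs_dim, 256), nn.ReLU(), nn.Linear(256, act_dim)),
#     states, actions)
```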

Authors (7)
  1. Sheng Yue (13 papers)
  2. Jiani Liu (17 papers)
  3. Xingyuan Hua (4 papers)
  4. Ju Ren (33 papers)
  5. Sen Lin (54 papers)
  6. Junshan Zhang (75 papers)
  7. Yaoxue Zhang (27 papers)
Citations (1)
