A Pragmatic Look at Deep Imitation Learning (2108.01867v2)

Published 4 Aug 2021 in cs.LG, cs.NE, and stat.ML

Abstract: The introduction of the generative adversarial imitation learning (GAIL) algorithm has spurred the development of scalable imitation learning (IL) approaches using deep neural networks. Many of the algorithms that followed used a similar procedure, combining on-policy actor-critic algorithms with inverse reinforcement learning. More recently there has been an even larger breadth of approaches, most of which use off-policy algorithms. However, with this breadth of algorithms, everything from datasets to base reinforcement learning algorithms to evaluation settings can vary, making it difficult to compare them fairly. In this work we re-implement 6 different IL algorithms, updating 3 of them to be off-policy, base them on a common off-policy algorithm (SAC), and evaluate them on a widely-used expert trajectory dataset (D4RL) for the most common benchmark (MuJoCo). After giving all algorithms the same hyperparameter optimisation budget, we compare their results for a range of expert trajectories. In summary, GAIL, with all of its improvements, consistently performs well across a range of sample sizes; AdRIL is a simple contender that performs well, with one important hyperparameter to tune; and behavioural cloning remains a strong baseline when data is more plentiful.
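
The comparison centres on adversarial IL methods rebased onto a common off-policy learner (SAC). As an illustration of the shared GAIL-style ingredient, below is a minimal PyTorch sketch of a discriminator trained to separate expert transitions from policy transitions, whose output is then used as a surrogate reward for the off-policy RL update. This is not the authors' implementation: the names (`Discriminator`, `discriminator_step`, `surrogate_reward`) and the particular reward form are assumptions made for illustration.

```python
# Hypothetical sketch of a GAIL-style discriminator paired with an off-policy
# learner such as SAC. The SAC update itself is assumed to come from an
# existing implementation and is not shown here.
import torch
import torch.nn as nn
import torch.nn.functional as F


class Discriminator(nn.Module):
    """Classifies (state, action) pairs as expert (label 1) or policy (label 0)."""

    def __init__(self, state_dim: int, action_dim: int, hidden: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, state: torch.Tensor, action: torch.Tensor) -> torch.Tensor:
        # Returns unnormalised logits for the "expert" class.
        return self.net(torch.cat([state, action], dim=-1))


def discriminator_step(disc, opt, expert_batch, policy_batch):
    """One binary-classification update on a batch of expert vs. policy transitions."""
    exp_logits = disc(*expert_batch)   # expert_batch = (states, actions)
    pol_logits = disc(*policy_batch)   # policy_batch = (states, actions)
    loss = (
        F.binary_cross_entropy_with_logits(exp_logits, torch.ones_like(exp_logits))
        + F.binary_cross_entropy_with_logits(pol_logits, torch.zeros_like(pol_logits))
    )
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()


def surrogate_reward(disc, state, action):
    """Learned reward that replaces the environment reward, e.g. -log(1 - D(s, a))."""
    with torch.no_grad():
        d = torch.sigmoid(disc(state, action))
        return -torch.log(1.0 - d + 1e-8)
```

In a full training loop, transitions sampled from the agent's replay buffer would be relabelled with `surrogate_reward` before each SAC update. The exact reward transformation, regularisers, and other implementation details are among the design choices that differ between the adversarial IL variants the paper compares.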

Authors (2)
  1. Kai Arulkumaran (23 papers)
  2. Dan Ogawa Lillrank (1 paper)
Citations (8)
