Learning from Guided Play: Improving Exploration for Adversarial Imitation Learning with Simple Auxiliary Tasks (2301.00051v2)

Published 30 Dec 2022 in cs.LG, cs.AI, and cs.RO

Abstract: Adversarial imitation learning (AIL) has become a popular alternative to supervised imitation learning that reduces the distribution shift suffered by the latter. However, AIL requires effective exploration during an online reinforcement learning phase. In this work, we show that the standard, naive approach to exploration can manifest as a suboptimal local maximum if a policy learned with AIL sufficiently matches the expert distribution without fully learning the desired task. This can be particularly catastrophic for manipulation tasks, where the difference between an expert and a non-expert state-action pair is often subtle. We present Learning from Guided Play (LfGP), a framework in which we leverage expert demonstrations of multiple exploratory, auxiliary tasks in addition to a main task. The addition of these auxiliary tasks forces the agent to explore states and actions that standard AIL may learn to ignore. Additionally, this particular formulation allows for the reusability of expert data between main tasks. Our experimental results in a challenging multitask robotic manipulation domain indicate that LfGP significantly outperforms both AIL and behaviour cloning, while also being more expert sample efficient than these baselines. To explain this performance gap, we provide further analysis of a toy problem that highlights the coupling between a local maximum and poor exploration, and also visualize the differences between the learned models from AIL and LfGP.
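
The abstract describes LfGP only at a high level: expert demonstrations for several exploratory auxiliary tasks are used alongside the main task, and the agent is driven to practice those auxiliary tasks during the adversarial imitation learning loop so it explores states and actions that plain AIL would ignore. The sketch below is a minimal structural illustration of that idea, not the authors' implementation: it assumes a uniform scheduler over tasks, stub rollouts, a stub per-task discriminator, and a placeholder off-policy RL update (names such as collect_rollout and rl_update are hypothetical).

```python
# Minimal structural sketch of the LfGP idea from the abstract:
# a scheduler alternates between a main task and auxiliary tasks, each task
# has its own discriminator trained against that task's expert demonstrations,
# and the discriminator score is used as a learned reward for an RL update.
import random
import numpy as np

TASKS = ["main", "reach", "grasp", "lift"]  # one main task plus auxiliary tasks (illustrative)

class Discriminator:
    """Stub per-task discriminator D(s, a) -> probability that (s, a) is expert."""
    def __init__(self, dim):
        self.w = np.zeros(dim)

    def score(self, sa):
        return 1.0 / (1.0 + np.exp(-self.w @ sa))

    def update(self, expert_sa, policy_sa):
        # Placeholder adversarial step: push expert scores up, policy scores down.
        grad = expert_sa.mean(axis=0) - policy_sa.mean(axis=0)
        self.w += 1e-2 * grad

def collect_rollout(task, horizon=32, dim=8):
    """Stub rollout under the task-conditioned policy; returns state-action features."""
    return np.random.randn(horizon, dim)

def rl_update(task, batch_sa, rewards):
    """Placeholder for an off-policy RL update (e.g. SAC-style) on the task-conditioned policy."""
    pass

expert_data = {t: np.random.randn(256, 8) for t in TASKS}   # stand-in expert demos per task
discriminators = {t: Discriminator(8) for t in TASKS}

for iteration in range(1000):
    task = random.choice(TASKS)                # simple uniform scheduler over tasks
    policy_sa = collect_rollout(task)          # explore under the scheduled task
    disc = discriminators[task]
    disc.update(expert_data[task], policy_sa)  # adversarial discriminator step for this task
    rewards = np.array([disc.score(sa) for sa in policy_sa])  # learned reward signal
    rl_update(task, policy_sa, rewards)        # policy/critic update with the learned reward
```

In the actual method the scheduler, discriminators, and policy are learned, task-conditioned neural components trained off-policy; the skeleton above only mirrors the control flow suggested by the abstract, in which auxiliary-task demonstrations can also be reused across different main tasks.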

Authors (3)
  1. Trevor Ablett
  2. Bryan Chan
  3. Jonathan Kelly
Citations (10)
