Inverse Reinforcement Learning without Reinforcement Learning (2303.14623v4)

Published 26 Mar 2023 in cs.LG

Abstract: Inverse Reinforcement Learning (IRL) is a powerful set of techniques for imitation learning that aims to learn a reward function that rationalizes expert demonstrations. Unfortunately, traditional IRL methods suffer from a computational weakness: they require repeatedly solving a hard reinforcement learning (RL) problem as a subroutine. This is counter-intuitive from the viewpoint of reductions: we have reduced the easier problem of imitation learning to repeatedly solving the harder problem of RL. Another thread of work has proved that access to the side-information of the distribution of states where a strong policy spends time can dramatically reduce the sample and computational complexities of solving an RL problem. In this work, we demonstrate for the first time a more informed imitation learning reduction where we utilize the state distribution of the expert to alleviate the global exploration component of the RL subroutine, providing an exponential speedup in theory. In practice, we find that we are able to significantly speed up the prior art on continuous control tasks.
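
The core idea in the abstract is a more informed reduction: inside each IRL iteration, the inner policy-optimization (RL) step is reset to states drawn from the expert's state distribution, so the learner no longer has to solve a global-exploration problem. The sketch below is a loose toy illustration of that structure only, assuming a tiny chain MDP, a moment-matching reward player, and hyperparameters chosen for readability; it is not the authors' implementation and makes no claim about the paper's guarantees.

```python
import numpy as np

rng = np.random.default_rng(0)
S, A, H = 8, 2, 8                       # chain-MDP states, actions, horizon

def step(s, a):
    """Action 1 moves right, action 0 moves left (clipped to the chain)."""
    return min(s + 1, S - 1) if a == 1 else max(s - 1, 0)

# Expert: always moves right from state 0; tabulate its state-action visitation.
expert_sa = np.zeros((S, A))
s = 0
for _ in range(H):
    expert_sa[s, 1] += 1.0
    s = step(s, 1)
expert_sa /= expert_sa.sum()
expert_state_dist = expert_sa.sum(axis=1)    # used as the reset distribution

reward = np.zeros((S, A))                    # reward player's iterate

for _ in range(200):
    # Policy player: finite-horizon DP under the current reward. In this tabular
    # toy the step is exact; in continuous control this inner RL step is where
    # expert-state resets replace global exploration.
    Q = np.zeros((H + 1, S, A))
    for h in reversed(range(H)):
        for s_ in range(S):
            for a_ in range(A):
                Q[h, s_, a_] = reward[s_, a_] + Q[h + 1, step(s_, a_)].max()
    pi = Q[:H].argmax(axis=2)

    # Roll the policy out from resets drawn from the expert's state distribution.
    visits = np.zeros((S, A))
    for _ in range(20):
        s_ = rng.choice(S, p=expert_state_dist)
        for h in range(H):
            a_ = pi[h, s_]
            visits[s_, a_] += 1.0
            s_ = step(s_, a_)
    visits /= visits.sum()

    # Reward player: gradient step on the expert-vs-learner moment-matching game.
    reward = np.clip(reward + 0.1 * (expert_sa - visits), -1.0, 1.0)

print("expert state-action moments:\n", np.round(expert_sa, 2))
print("learner visitation (last iterate):\n", np.round(visits, 2))
```

Run as a plain script, the learner's visitation drifts toward the expert's right-moving moments over the outer iterations; the point of the sketch is only where the expert-state reset distribution plugs into an otherwise standard adversarial IRL loop.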
