
Rethinking Adversarial Inverse Reinforcement Learning: Policy Imitation, Transferable Reward Recovery and Algebraic Equilibrium Proof (2403.14593v4)

Published 21 Mar 2024 in cs.LG and stat.ML

Abstract: Adversarial inverse reinforcement learning (AIRL) stands as a cornerstone approach in imitation learning, yet it has drawn criticism in prior studies. In this paper, we rethink AIRL and respond to these criticisms. Criticism 1: inadequate policy imitation. We show that substituting the built-in algorithm with soft actor-critic (SAC) during policy updating (which requires multiple iterations) significantly enhances the efficiency of policy imitation. Criticism 2: limited performance in transferable reward recovery despite SAC integration. While SAC indeed yields a significant improvement in policy imitation, it introduces drawbacks for transferable reward recovery. We prove that the SAC algorithm itself cannot comprehensively disentangle the reward function during AIRL training, and we propose a hybrid framework, PPO-AIRL + SAC, that achieves a satisfactory transfer effect. Criticism 3: unsatisfactory proof from the perspective of potential equilibrium. We reanalyze it from an algebraic-theory perspective.
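To make the structure the abstract refers to concrete, the sketch below shows the disentangled AIRL discriminator from Fu et al. (2018), whose state-only reward term g(s) and potential-based shaping term γh(s') − h(s) are what the paper argues SAC alone cannot recover during training. This is a minimal illustrative sketch in PyTorch under our own naming and network-size assumptions, not the authors' implementation; in the proposed hybrid setup, the recovered reward would be handed to a separate policy learner (PPO during reward recovery, SAC after transfer to new dynamics).

```python
# Minimal sketch (not the authors' code) of the disentangled AIRL discriminator:
# f(s, a, s') = g(s) + gamma * h(s') - h(s), with D = exp(f) / (exp(f) + pi(a|s)).
# Network sizes and module names are illustrative assumptions.
import torch
import torch.nn as nn


def mlp(in_dim, out_dim, hidden=64):
    return nn.Sequential(
        nn.Linear(in_dim, hidden), nn.Tanh(),
        nn.Linear(hidden, out_dim),
    )


class AIRLDiscriminator(nn.Module):
    """Disentangled discriminator: state-only reward g plus shaping potential h."""

    def __init__(self, obs_dim, gamma=0.99):
        super().__init__()
        self.g = mlp(obs_dim, 1)   # state-only reward approximator g(s)
        self.h = mlp(obs_dim, 1)   # shaping potential h(s)
        self.gamma = gamma

    def f(self, obs, next_obs):
        # f(s, a, s') = g(s) + gamma * h(s') - h(s)
        return self.g(obs) + self.gamma * self.h(next_obs) - self.h(obs)

    def logits(self, obs, next_obs, log_pi):
        # Logit of D(s, a, s') = exp(f) / (exp(f) + pi(a|s)) is f - log pi(a|s).
        return self.f(obs, next_obs) - log_pi

    def reward(self, obs, next_obs, log_pi):
        # Learned reward signal log D - log(1 - D), passed to the policy learner
        # (PPO during reward recovery; SAC when transferring to new dynamics).
        return self.logits(obs, next_obs, log_pi)


if __name__ == "__main__":
    disc = AIRLDiscriminator(obs_dim=11)
    obs, next_obs = torch.randn(4, 11), torch.randn(4, 11)
    log_pi = torch.randn(4, 1)
    print(disc.reward(obs, next_obs, log_pi).shape)  # torch.Size([4, 1])
```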

Authors (4)
  1. Yangchun Zhang (6 papers)
  2. Yirui Zhou (7 papers)
  3. Qiang Liu (405 papers)
  4. Weiming Li (37 papers)