Inverse Reinforcement Learning by Estimating Expertise of Demonstrators (2402.01886v2)

Published 2 Feb 2024 in cs.LG and cs.AI

Abstract: In Imitation Learning (IL), utilizing suboptimal and heterogeneous demonstrations presents a substantial challenge due to the varied nature of real-world data. However, standard IL algorithms consider these datasets as homogeneous, thereby inheriting the deficiencies of suboptimal demonstrators. Previous approaches to this issue rely on impractical assumptions like high-quality data subsets, confidence rankings, or explicit environmental knowledge. This paper introduces IRLEED, Inverse Reinforcement Learning by Estimating Expertise of Demonstrators, a novel framework that overcomes these hurdles without prior knowledge of demonstrator expertise. IRLEED enhances existing Inverse Reinforcement Learning (IRL) algorithms by combining a general model for demonstrator suboptimality to address reward bias and action variance, with a Maximum Entropy IRL framework to efficiently derive the optimal policy from diverse, suboptimal demonstrations. Experiments in both online and offline IL settings, with simulated and human-generated data, demonstrate IRLEED's adaptability and effectiveness, making it a versatile solution for learning from suboptimal demonstrations.
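The abstract describes a demonstrator model with two sources of suboptimality: reward bias and action variance. A minimal sketch of one plausible reading of that model is given below, assuming a Boltzmann-rational demonstrator whose effective reward is the shared reward plus a per-demonstrator bias term and whose precision parameter controls action noise; the function name, parameters, and single-state setting are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def demonstrator_policy(reward, bias, beta):
    """Boltzmann policy of one demonstrator in a single-state (bandit-style) setting.

    reward : (A,) shared underlying reward over actions (what IRL tries to recover)
    bias   : (A,) demonstrator-specific reward perturbation (models reward bias)
    beta   : scalar precision; lower beta -> noisier actions (models action variance)
    """
    logits = beta * (reward + bias)
    logits -= logits.max()          # subtract max for numerical stability
    p = np.exp(logits)
    return p / p.sum()

# Two demonstrators share the same underlying reward but differ in expertise.
reward = np.array([1.0, 0.2, -0.5])
expert = demonstrator_policy(reward, bias=np.zeros(3), beta=10.0)          # near-greedy
novice = demonstrator_policy(reward, bias=np.array([0.0, 0.6, 0.0]), beta=1.0)  # biased, noisy
print(expert.round(3), novice.round(3))
```

Under this reading, a Maximum Entropy IRL procedure would jointly fit the shared reward together with each demonstrator's bias and precision, so that demonstrations from low-expertise sources are down-weighted rather than imitated verbatim.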

Authors (2)
  1. Mark Beliaev (8 papers)
  2. Ramtin Pedarsani (82 papers)
