Quantifying the Sensitivity of Inverse Reinforcement Learning to Misspecification (2403.06854v1)

Published 11 Mar 2024 in cs.LG

Abstract: Inverse reinforcement learning (IRL) aims to infer an agent's preferences (represented as a reward function $R$) from their behaviour (represented as a policy $\pi$). To do this, we need a behavioural model of how $\pi$ relates to $R$. In the current literature, the most common behavioural models are optimality, Boltzmann-rationality, and causal entropy maximisation. However, the true relationship between a human's preferences and their behaviour is much more complex than any of these behavioural models. This means that the behavioural models are misspecified, which raises the concern that they may lead to systematic errors if applied to real data. In this paper, we analyse how sensitive the IRL problem is to misspecification of the behavioural model. Specifically, we provide necessary and sufficient conditions that completely characterise how the observed data may differ from the assumed behavioural model without incurring an error above a given threshold. In addition to this, we also characterise the conditions under which a behavioural model is robust to small perturbations of the observed policy, and we analyse how robust many behavioural models are to misspecification of their parameter values (such as, e.g., the discount rate). Our analysis suggests that the IRL problem is highly sensitive to misspecification, in the sense that very mild misspecification can lead to very large errors in the inferred reward function.
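The behavioural models named in the abstract can be made concrete with a small sketch. Below is a minimal, illustrative Python example (not from the paper) of the Boltzmann-rational model, which maps a reward function $R$ to the policy $\pi(a \mid s) \propto \exp(\beta \, Q_R(s, a))$; the tabular MDP, the rationality parameter beta, and the discount gamma are arbitrary assumptions chosen only for illustration, and the Q-values are computed with ordinary value iteration.

```python
# Minimal sketch of the Boltzmann-rational behavioural model (illustrative only).
# IRL under this model amounts to inverting the map R -> pi defined below.
import numpy as np

n_states, n_actions, gamma, beta = 3, 2, 0.9, 5.0   # assumed, arbitrary values
rng = np.random.default_rng(0)
# Random transition kernel P[s, a, s'] and reward R[s, a]; purely illustrative.
P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))
R = rng.normal(size=(n_states, n_actions))

def boltzmann_policy(R, P, gamma, beta, iters=500):
    """Return pi[s, a] = softmax_a(beta * Q_R(s, a)), with Q_R from value iteration."""
    Q = np.zeros_like(R)
    for _ in range(iters):
        V = Q.max(axis=1)          # state values under the greedy policy
        Q = R + gamma * P @ V      # Bellman optimality backup
    logits = beta * Q
    logits -= logits.max(axis=1, keepdims=True)   # numerical stability
    pi = np.exp(logits)
    return pi / pi.sum(axis=1, keepdims=True)

pi = boltzmann_policy(R, P, gamma, beta)
print(pi.round(3))   # each row sums to 1
```

If the observed policy was in fact generated with a different beta or gamma than the one assumed here, or by a process that is not Boltzmann-rational at all, the behavioural model is misspecified; the paper's results quantify how much such deviations can distort the reward inferred by inverting this map.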

Authors (2)
  1. Joar Skalse (17 papers)
  2. Alessandro Abate (137 papers)
Citations (2)
