Inverse Concave-Utility Reinforcement Learning is Inverse Game Theory (2405.19024v3)
Abstract: We consider inverse reinforcement learning problems with concave utilities. Concave Utility Reinforcement Learning (CURL) is a generalisation of the standard RL objective, which employs a concave function of the state occupancy measure rather than a linear function. CURL has garnered recent attention for its ability to represent instances of many important applications, including standard RL, imitation learning, pure exploration, constrained MDPs, offline RL, human-regularized RL, and others. Inverse reinforcement learning is a powerful paradigm that focuses on recovering an unknown reward function that can rationalize the observed behaviour of an agent. There have been recent theoretical advances in inverse RL, where the problem is formulated as identifying the set of feasible reward functions. However, inverse RL for CURL problems has not been considered previously. In this paper we show that most of the standard IRL results do not apply to CURL in general, since CURL invalidates the classical Bellman equations. This calls for a new theoretical framework for the inverse CURL problem. Using a recent equivalence result between CURL and mean-field games, we propose a new definition of the feasible rewards for I-CURL by proving that this problem is equivalent to an inverse game theory problem in a subclass of mean-field games. We outline future directions and applications in human-AI collaboration enabled by our results.
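To make the generalisation concrete, the following LaTeX sketch contrasts the two objectives. The notation (occupancy measure $d^{\pi}$, reward $r$, concave utility $F$, expert occupancy $d^{E}$) is standard in the CURL literature and is an assumption here, not a verbatim excerpt from the paper.

```latex
\documentclass{article}
\usepackage{amsmath}
\begin{document}
% Standard RL: a linear functional of the state-action occupancy measure d^pi.
Standard RL solves
\[
  \max_{\pi} \; \langle r, d^{\pi} \rangle
  \;=\; \max_{\pi} \sum_{s,a} d^{\pi}(s,a)\, r(s,a),
\]
% CURL: a concave functional F of the occupancy measure.
whereas CURL solves
\[
  \max_{\pi} \; F\!\left(d^{\pi}\right), \qquad F \text{ concave}.
\]
% Illustrative choices of F, assumed from the CURL literature rather than
% taken from this paper:
For example, $F(d) = -\sum_{s,a} d(s,a) \log d(s,a)$ yields pure exploration
(entropy maximisation), and $F(d) = -D\bigl(d \,\big\|\, d^{E}\bigr)$ for a
divergence $D$ and expert occupancy $d^{E}$ yields imitation learning.
\end{document}
```

Because $F$ is evaluated on the whole occupancy measure, the objective is no longer additive across states and actions, which is why the classical Bellman equations fail to characterise optimal behaviour in CURL.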
- Variational policy gradient method for reinforcement learning with general utilities. Advances in Neural Information Processing Systems, 33:4572–4583, 2020a.
- Concave utility reinforcement learning: The mean-field game viewpoint. In Proceedings of the 21st International Conference on Autonomous Agents and Multiagent Systems, AAMAS ’22, pages 489–497, Richland, SC, 2022. International Foundation for Autonomous Agents and Multiagent Systems. ISBN 9781450392136.
- Provably efficient maximum entropy exploration. In International Conference on Machine Learning, pages 2681–2691. PMLR, 2019.
- Concave utility reinforcement learning with zero-constraint violations. Transactions on Machine Learning Research, 2022.
- Convex reinforcement learning in finite trials. Journal of Machine Learning Research, 24(250):1–42, 2023.
- Reward is enough for convex MDPs. Advances in Neural Information Processing Systems, 34:25746–25759, 2021.
- Pedro P. Santos. Generalizing objective-specification in Markov decision processes. In Proceedings of the 23rd International Conference on Autonomous Agents and Multiagent Systems, AAMAS ’24, pages 2767–2769, Richland, SC, 2024. International Foundation for Autonomous Agents and Multiagent Systems. ISBN 9798400704864.
- Eitan Altman. Constrained Markov decision processes. Routledge, 2021.
- Vivek S Borkar. An actor-critic algorithm for constrained Markov decision processes. Systems & Control Letters, 54(3):207–213, 2005.
- Reward constrained policy optimization. In International Conference on Learning Representations, 2018.
- Balancing constraints and rewards with meta-gradient D4PG. In International Conference on Learning Representations, 2020.
- Generative adversarial imitation learning. Advances in Neural Information Processing Systems, 29, 2016.
- Efficient exploration via state marginal matching. arXiv preprint arXiv:1906.05274, 2019.
- A divergence minimization perspective on imitation learning methods. In Conference on Robot Learning, pages 1259–1277. PMLR, 2020.
- Policy gradient for coherent risk measures. Advances in Neural Information Processing Systems, 28, 2015.
- Risk-sensitive and robust decision-making: a CVaR optimization approach. Advances in Neural Information Processing Systems, 28, 2015.
- Risk-constrained reinforcement learning with percentile risk criteria. Journal of Machine Learning Research, 18(167):1–51, 2018.
- Cautious reinforcement learning via distributional risk in the dual domain. IEEE Journal on Selected Areas in Information Theory, 2:611–626, 2020b. URL https://api.semanticscholar.org/CorpusID:211572706.
- Human-compatible driving partners through data-regularized self-play reinforcement learning. arXiv preprint arXiv:2403.19648, 2024.
- Modeling strong and human-like gameplay with KL-regularized search. In International Conference on Machine Learning, pages 9695–9728. PMLR, 2022.
- Algorithms for inverse reinforcement learning. In Proceedings of the Seventeenth International Conference on Machine Learning, pages 663–670, 2000.
- Herbert A Simon. Models of man: Social and rational. Wiley, 1957.
- Computational rationality: Linking mechanism and behavior through bounded utility maximization. Topics in Cognitive Science, 6(2):279–311, 2014.
- Computational rationality: A converging paradigm for intelligence in brains, minds, and machines. Science, 349(6245):273–278, 2015.
- Occam’s razor is insufficient to infer the preferences of irrational agents. Advances in Neural Information Processing Systems, 31, 2018.
- Information processing and bounded rationality: A survey. Frontiers in Decision Neuroscience, 7:1–22, 2013a.
- Thermodynamics as a theory of decision-making with information-processing costs. Proceedings of the Royal Society A: Mathematical, Physical and Engineering Sciences, 469(2153):20120683, 2013b.
- Information-theoretic bounded rationality and ε-optimality. Entropy, 16(8):4662–4676, 2014. ISSN 1099-4300. doi: 10.3390/e16084662. URL https://www.mdpi.com/1099-4300/16/8/4662.
- The efficiency of human cognition reflects planned information processing. In Proceedings of the 34th AAAI Conference on Artificial Intelligence, 2020.
- Trading value and information in MDPs. Decision Making with Imperfect Decision Makers, pages 57–74, 2012.
- Bounded rationality, abstraction, and hierarchical decision-making: An information-theoretic optimality principle. Frontiers in Robotics and AI, 2:27, 2015.
- Humans account for cognitive costs when finding shortcuts: An information-theoretic analysis of navigation. PLOS Computational Biology, 19(1):e1010829, 2023.
- Martin L Puterman. Markov decision processes: discrete stochastic dynamic programming. John Wiley & Sons, 2014.
- Provably efficient learning of transferable rewards. In International Conference on Machine Learning, pages 7665–7676. PMLR, 2021.
- Towards theoretical understanding of inverse reinforcement learning. In International Conference on Machine Learning, pages 24555–24591. PMLR, 2023.
- Active exploration for inverse reinforcement learning. Advances in Neural Information Processing Systems, 35:5843–5853, 2022.
- Inverse game theory: Learning utilities in succinct games. In Web and Internet Economics: 11th International Conference, WINE 2015, Amsterdam, The Netherlands, December 9-12, 2015, Proceedings 11, pages 413–427. Springer, 2015.
- Computational rationalization: the inverse equilibrium problem. In Proceedings of the 28th International Conference on International Conference on Machine Learning, pages 1169–1176, 2011.
- Individual-level inverse reinforcement learning for mean field games. In Proceedings of the 21st International Conference on Autonomous Agents and Multiagent Systems, pages 253–262, 2022.
- Generative adversarial inverse multiagent learning. In The Twelfth International Conference on Learning Representations, 2024. URL https://openreview.net/forum?id=JzvIWvC9MG.
- Clément L. Canonne. A short note on learning discrete distributions. https://github.com/ccanonne/probabilitydistributiontoolbox/blob/master/learning.pdf, 2023. Accessed: 2024-05-19.
- Asymptotic minimax character of the sample distribution function and of the classical multinomial estimator. The Annals of Mathematical Statistics, pages 642–669, 1956.
- Bridging offline reinforcement learning and imitation learning: A tale of pessimism. Advances in Neural Information Processing Systems, 34:11702–11716, 2021.
- Constrained reinforcement learning with smoothed log barrier function. arXiv preprint arXiv:2403.14508, 2024.
- Inverse decision modeling: Learning interpretable representations of behavior. In International Conference on Machine Learning, pages 4755–4771. PMLR, 2021.
- Learning the preferences of ignorant, inconsistent agents. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 30, 2016.
- Risk-sensitive inverse reinforcement learning via coherent risk models. In Robotics: Science and Systems, volume 16, page 117, 2017.
- Human irrationality: both bad and good for reward inference. arXiv preprint arXiv:2111.06956, 2021.
- Risk-sensitive inverse reinforcement learning via semi- and non-parametric methods. The International Journal of Robotics Research, 37(13-14):1713–1740, 2018.
- On the feasibility of learning, rather than assuming, human biases for reward inference. In International Conference on Machine Learning, pages 5670–5679. PMLR, 2019.
- Cooperative inverse reinforcement learning. Advances in Neural Information Processing Systems, 29, 2016.
- An efficient, generalized Bellman update for cooperative inverse reinforcement learning. In International Conference on Machine Learning, pages 3394–3402. PMLR, 2018.
- MORAL: Aligning AI with human norms through multi-objective reinforced active learning. In AAMAS 2022: 21st International Conference on Autonomous Agents and Multiagent Systems (Virtual), pages 1038–1046. International Foundation for Autonomous Agents and Multiagent Systems, 2022.
- Learning deep mean field games for modeling large population behavior. In International Conference on Learning Representations, 2018.
- Adversarial inverse reinforcement learning for mean field games. In Proceedings of the 2023 International Conference on Autonomous Agents and Multiagent Systems, pages 1088–1096, 2023.
- Multi-agent inverse reinforcement learning. In 2010 Ninth International Conference on Machine Learning and Applications, pages 395–400. IEEE, 2010.
- Cognitive science as a source of forward and inverse models of human decisions for robotics and control. Annual Review of Control, Robotics, and Autonomous Systems, 5:33–53, 2022.
- Modeling needs user modeling. Frontiers in Artificial Intelligence, 6:1097891, 2023.
- Best-response Bayesian reinforcement learning with Bayes-adaptive POMDPs for centaurs. In International Conference on Autonomous Agents and Multiagent Systems, pages 235–243. International Foundation for Autonomous Agents and Multiagent Systems (IFAAMAS), 2022.
Authors: Mustafa Mert Çelikok, Frans A. Oliehoek, Jan-Willem van de Meent