Inverse Reinforcement Learning with Unknown Reward Model based on Structural Risk Minimization (2312.16566v1)
Abstract: Inverse reinforcement learning (IRL) usually assumes that the model of the reward function is pre-specified and estimates only its parameters. However, choosing a proper reward model is nontrivial: an overly simple model is unlikely to contain the true reward function, while an overly complex model incurs substantial computational cost and risks overfitting. This paper addresses this trade-off in IRL model selection by introducing the structural risk minimization (SRM) method from statistical learning. SRM selects an optimal reward function class from a hypothesis set by minimizing both the estimation error and the model complexity. To formulate an SRM scheme for IRL, we estimate the policy gradient from demonstrations, which serves as the empirical risk, and establish an upper bound on the Rademacher complexity of each hypothesis class, which serves as the model penalty. We further present a learning guarantee. In particular, we give an explicit SRM procedure for the common linear weighted sum setting in IRL. Simulations demonstrate the performance and efficiency of our scheme.
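To make the selection rule concrete, below is a minimal Python sketch of the SRM idea described in the abstract, not the authors' implementation. Everything here is an illustrative assumption: the hypothesis classes are polynomial reward features of growing dimension, the empirical risk is a simple least-squares surrogate standing in for the policy-gradient-based risk, and the penalty constant `c` is a tunable placeholder shaped like a Rademacher complexity bound of order sqrt(k/n).

```python
"""Hypothetical SRM model selection for a reward function class.

Assumed setup: linear reward classes R_k(s) = w . phi_k(s), where
phi_k is a polynomial feature map of dimension k. SRM picks the k
that minimizes empirical risk plus a complexity penalty.
"""
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical demonstration data: states and noisy expert "returns".
n = 200
states = rng.uniform(-1, 1, size=n)
true_reward = 1.5 * states - 0.8 * states**2      # unknown to the learner
targets = true_reward + 0.1 * rng.standard_normal(n)

def features(s, k):
    """Polynomial feature map of dimension k (illustrative choice)."""
    return np.stack([s**j for j in range(1, k + 1)], axis=-1)

def empirical_risk(s, y, k):
    """Fit weights by least squares and return the mean squared residual.
    This is a stand-in for the policy-gradient-based empirical risk."""
    Phi = features(s, k)
    w, *_ = np.linalg.lstsq(Phi, y, rcond=None)
    return np.mean((Phi @ w - y) ** 2)

def complexity_penalty(k, n, c=0.5):
    """Penalty increasing in class dimension, shaped like a Rademacher
    complexity upper bound O(sqrt(k/n)); c is an assumed constant."""
    return c * np.sqrt(k / n)

# SRM: choose the class minimizing empirical risk + model penalty.
best_k = min(range(1, 11),
             key=lambda k: empirical_risk(states, targets, k)
                           + complexity_penalty(k, n))
print(f"SRM-selected reward class dimension: {best_k}")
```

With this synthetic quadratic reward, the penalty steers the selection toward dimension 2 rather than the larger classes that fit the noise equally well, which is the estimation-error vs. complexity trade-off the abstract describes.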
Authors: Chendi Qu, Jianping He, Xiaoming Duan, Jiming Chen