Curricular Subgoals for Inverse Reinforcement Learning (2306.08232v1)
Abstract: Inverse Reinforcement Learning (IRL) aims to reconstruct the reward function from expert demonstrations to facilitate policy learning, and has achieved remarkable success in imitation learning. To promote expert-like behavior, existing IRL methods mainly focus on learning global reward functions that minimize the trajectory difference between the imitator and the expert. However, these global designs remain limited by redundant noise and error propagation, which lead to unsuitable reward assignment and thus degrade the agent's capability in complex multi-stage tasks. In this paper, we propose a novel Curricular Subgoal-based Inverse Reinforcement Learning (CSIRL) framework that explicitly disentangles one task into several local subgoals to guide agent imitation. Specifically, CSIRL first uses the decision uncertainty of the trained agent over expert trajectories to dynamically select subgoals, which directly determine the exploration boundary of different task stages. To further acquire local reward functions for each stage, we customize a meta-imitation objective based on these curricular subgoals to train an intrinsic reward generator. Experiments on the D4RL and autonomous driving benchmarks demonstrate that the proposed method yields results superior to state-of-the-art counterparts, as well as better interpretability. Our code is available at https://github.com/Plankson/CSIRL.
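To make the idea of uncertainty-driven subgoal selection concrete, here is a minimal sketch. It is not the authors' implementation: the abstract does not specify how decision uncertainty is measured or how a subgoal is picked from it. The sketch assumes uncertainty is the variance of several scalar actions the agent proposes at each expert state, and that the subgoal is the first expert state whose uncertainty exceeds a threshold. The function names, the variance measure, and the threshold rule are all hypothetical.

```python
from statistics import pvariance
from typing import List, Sequence


def decision_uncertainty(action_samples: Sequence[Sequence[float]]) -> List[float]:
    """Per-state uncertainty along one expert trajectory (assumed measure).

    action_samples[t] holds several scalar actions the agent proposes at
    expert state t; their population variance stands in for uncertainty.
    """
    return [pvariance(samples) for samples in action_samples]


def select_subgoal(uncertainties: Sequence[float], threshold: float) -> int:
    """Index of the first expert state the agent is unsure about (assumed rule).

    Falls back to the final state when the agent is confident everywhere,
    so the whole trajectory becomes the current stage.
    """
    for t, u in enumerate(uncertainties):
        if u > threshold:
            return t
    return len(uncertainties) - 1


# Hypothetical action samples at four consecutive expert states.
samples = [
    [0.10, 0.10, 0.10],   # agent agrees with itself
    [0.20, 0.21, 0.19],   # small disagreement
    [0.00, 1.00, -1.00],  # large disagreement: candidate subgoal
    [0.50, 0.50, 0.50],
]
uncertainties = decision_uncertainty(samples)
subgoal_index = select_subgoal(uncertainties, threshold=0.1)
```

Under these assumptions, the subgoal moves further along the expert trajectory as the agent becomes confident on earlier states, which gives the curriculum-like progression the abstract describes.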
- Shunyu Liu
- Yunpeng Qing
- Shuqi Xu
- Hongyan Wu
- Jiangtao Zhang
- Jingyuan Cong
- Tianhao Chen
- Yunfu Liu
- Mingli Song