BeTAIL: Behavior Transformer Adversarial Imitation Learning from Human Racing Gameplay (2402.14194v2)
Abstract: Imitation learning learns a policy from demonstrations without requiring hand-designed reward functions. In many robotic tasks, such as autonomous racing, imitated policies must model complex environment dynamics and human decision-making. Sequence modeling is highly effective at capturing the intricate patterns of motion sequences but struggles to adapt to new environments or to the distribution shifts common in real-world robotics tasks. In contrast, Adversarial Imitation Learning (AIL) can mitigate this effect but suffers from sample inefficiency and struggles to handle complex motion patterns. Thus, we propose BeTAIL: Behavior Transformer Adversarial Imitation Learning, which combines a Behavior Transformer (BeT) policy learned from human demonstrations with online AIL. BeTAIL adds an AIL residual policy to the BeT policy to model the sequential decision-making process of human experts and to correct for out-of-distribution states or shifts in environment dynamics. We test BeTAIL on three challenges with expert-level demonstrations of real human gameplay in Gran Turismo Sport. Our proposed residual BeTAIL reduces environment interactions and improves racing performance and stability, even when the BeT is pretrained on tracks different from those used in downstream learning. Videos and code are available at: https://sites.google.com/berkeley.edu/BeTAIL/home.
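To make the residual construction in the abstract concrete, below is a minimal sketch of how a pretrained, frozen BeT base policy and an online AIL residual policy could be combined at action-selection time. The class name, the `residual_scale` bound, the context length, and the callable interfaces of `bet_policy` and `residual_policy` are all illustrative assumptions, not the authors' released implementation.

```python
import numpy as np

# Sketch of the residual-policy combination described in the abstract:
# a frozen Behavior Transformer (BeT) proposes a base action from a window
# of recent states, and an AIL-trained residual policy adds a small,
# bounded correction to handle out-of-distribution states or dynamics shifts.

class BeTAILController:
    def __init__(self, bet_policy, residual_policy, action_low, action_high,
                 residual_scale=0.1, context_len=20):
        # bet_policy: pretrained BeT, maps a list of past states -> action (assumed callable).
        # residual_policy: online AIL policy, maps (state, base_action) -> correction (assumed callable).
        # residual_scale: hypothetical bound on the residual's magnitude.
        self.bet_policy = bet_policy
        self.residual_policy = residual_policy
        self.action_low = np.asarray(action_low, dtype=np.float64)
        self.action_high = np.asarray(action_high, dtype=np.float64)
        self.residual_scale = residual_scale
        self.context_len = context_len
        self.state_history = []

    def act(self, state):
        # Maintain a fixed-length state history as input to the sequence model.
        self.state_history.append(state)
        self.state_history = self.state_history[-self.context_len:]

        base_action = np.asarray(self.bet_policy(self.state_history))   # frozen BeT proposal
        residual = np.asarray(self.residual_policy(state, base_action)) # online AIL correction
        residual = np.clip(residual, -self.residual_scale, self.residual_scale)

        # Final action: BeT proposal plus bounded residual, clipped to the
        # environment's action limits (e.g., steering and throttle ranges).
        return np.clip(base_action + residual, self.action_low, self.action_high)
```

In this sketch only the residual policy is updated during online adversarial training, while the BeT base policy stays fixed; the clipping bound keeps the learned correction small so the combined behavior stays close to the demonstrated motion patterns.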