Hierarchical Generative Adversarial Imitation Learning with Mid-level Input Generation for Autonomous Driving on Urban Environments
Abstract: Deriving robust control policies for realistic urban navigation scenarios is not a trivial task. In an end-to-end approach, these policies must map high-dimensional images from the vehicle's cameras to low-level actions such as steering and throttle. While pure Reinforcement Learning (RL) approaches rely exclusively on engineered rewards, Generative Adversarial Imitation Learning (GAIL) agents learn from expert demonstrations while interacting with the environment, which favors GAIL on tasks for which a reward signal is difficult to derive, such as autonomous driving. However, training deep networks directly from raw images on RL tasks is known to be unstable and troublesome. To address this, this work proposes a hierarchical GAIL-based architecture (hGAIL) that decouples representation learning from the driving task in order to solve the autonomous navigation of a vehicle. The proposed architecture consists of two modules: a Generative Adversarial Network (GAN), which generates an abstract mid-level input representation, namely the Bird's-Eye View (BEV) of the vehicle's surroundings; and the GAIL module, which learns to control the vehicle using the GAN's BEV predictions as input. hGAIL learns both the policy and the mid-level representation simultaneously as the agent interacts with the environment. Our experiments, conducted in the CARLA simulation environment, show that GAIL trained exclusively on camera images (without BEV) fails to learn the task at all, whereas hGAIL, after training exclusively on one city, successfully navigated 98% of the intersections of a new city not seen during training. Videos and code available at: https://sites.google.com/view/hgail
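The two-module hierarchy described above can be sketched as a simple data-flow pipeline. The following is a minimal, hypothetical illustration only: the GAN generator and the GAIL policy are stubbed with random linear maps (all class names, dimensions, and weights are assumptions, not the paper's implementation), to show how camera input is first mapped to a mid-level BEV representation and only then to low-level controls.

```python
import numpy as np

rng = np.random.default_rng(0)

class BEVGenerator:
    """Stand-in for the GAN generator: camera images -> BEV representation."""
    def __init__(self, cam_dim, bev_dim):
        # Random weights for illustration; the paper uses a conv GAN.
        self.W = rng.standard_normal((bev_dim, cam_dim)) * 0.01

    def __call__(self, camera_frames):
        return np.tanh(self.W @ camera_frames)

class GAILPolicy:
    """Stand-in for the GAIL policy: BEV -> (steering, throttle)."""
    def __init__(self, bev_dim, act_dim=2):
        self.W = rng.standard_normal((act_dim, bev_dim)) * 0.01

    def __call__(self, bev):
        return np.tanh(self.W @ bev)  # actions bounded in [-1, 1]

# One forward pass through the hierarchy.
cameras = rng.standard_normal(3 * 64 * 64)          # flattened RGB input
gen = BEVGenerator(cam_dim=cameras.size, bev_dim=128)
policy = GAILPolicy(bev_dim=128)

bev = gen(cameras)            # mid-level representation (learned online)
steer, throttle = policy(bev) # low-level vehicle controls
```

The key design point this sketch reflects is the decoupling: the policy never sees raw pixels, only the abstract BEV, which in the paper is what allows both modules to be trained simultaneously while the agent interacts with the environment.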