Hierarchical Generative Adversarial Imitation Learning with Mid-level Input Generation for Autonomous Driving on Urban Environments (2302.04823v5)

Published 9 Feb 2023 in cs.LG, cs.AI, and cs.RO

Abstract: Deriving robust control policies for realistic urban navigation scenarios is not a trivial task. In an end-to-end approach, these policies must map high-dimensional images from the vehicle's cameras to low-level actions such as steering and throttle. While pure Reinforcement Learning (RL) approaches are based exclusively on engineered rewards, Generative Adversarial Imitation Learning (GAIL) agents learn from expert demonstrations while interacting with the environment, which favors GAIL on tasks for which a reward signal is difficult to derive, such as autonomous driving. However, training deep networks directly from raw images on RL tasks is known to be unstable and troublesome. To deal with this, this work proposes a hierarchical GAIL-based architecture (hGAIL) which decouples representation learning from the driving task to solve the autonomous navigation of a vehicle. The proposed architecture consists of two modules: a GAN (Generative Adversarial Net) which generates an abstract mid-level input representation, namely the Bird's-Eye View (BEV) of the vehicle's surroundings; and the GAIL module, which learns to control the vehicle using the GAN's BEV predictions as input. hGAIL learns both the policy and the mid-level representation simultaneously as the agent interacts with the environment. Our experiments in the CARLA simulation environment show that GAIL trained exclusively from cameras (without BEV) fails to even learn the task, while hGAIL, after training exclusively in one city, was able to autonomously navigate 98% of the intersections of a new city not seen during training. Videos and code available at: https://sites.google.com/view/hgail

Hierarchical Generative Adversarial Imitation Learning for Autonomous Driving

The paper presents an approach to deriving robust control policies for autonomous vehicle navigation in urban environments. The authors propose a Hierarchical Generative Adversarial Imitation Learning (hGAIL) architecture designed to enhance the stability and efficacy of policy learning by integrating a Generative Adversarial Network (GAN) that produces an abstract mid-level input representation, the Bird's-Eye View (BEV).
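
To make the data flow concrete, here is a minimal PyTorch sketch of the two-module pipeline: a GAN generator translates camera images into a predicted BEV, and the policy maps that BEV to low-level actions. All class names, layer sizes, and tensor shapes are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class BEVGenerator(nn.Module):
    """GAN generator: translates stacked camera images into a BEV map."""
    def __init__(self, cam_channels=9, bev_channels=3):
        super().__init__()
        # Encoder-decoder image-to-image translator; a full model would add
        # U-Net-style skip connections and an adversarial critic for training.
        self.encoder = nn.Sequential(
            nn.Conv2d(cam_channels, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, 128, 4, stride=2, padding=1), nn.ReLU(),
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(64, bev_channels, 4, stride=2, padding=1), nn.Tanh(),
        )

    def forward(self, cams):
        return self.decoder(self.encoder(cams))

class DrivingPolicy(nn.Module):
    """Policy head: maps the predicted BEV to steering/throttle in [-1, 1]."""
    def __init__(self, bev_channels=3, n_actions=2):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(bev_channels, 32, 5, stride=2), nn.ReLU(),
            nn.Flatten(),
            nn.LazyLinear(256), nn.ReLU(),
            nn.Linear(256, n_actions), nn.Tanh(),
        )

    def forward(self, bev):
        return self.net(bev)

# One control step: cameras -> predicted BEV -> low-level action.
generator, policy = BEVGenerator(), DrivingPolicy()
cams = torch.randn(1, 9, 64, 64)   # e.g. three stacked 3-channel cameras
action = policy(generator(cams))   # [steering, throttle/brake]
```

Because the generator is trained with its own image-to-image adversarial objective while the policy is trained with GAIL, representation learning and control learning proceed simultaneously but under decoupled losses.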

Core Contributions

The proposed hGAIL framework tackles the inherent complexities in training deep networks directly from high-dimensional camera images for Reinforcement Learning (RL) tasks, a process known for its instability. The approach consists of two main components:

  1. Mid-level Input Generation: A GAN operates as the first module, creating an abstract BEV representation from raw camera inputs. This addresses the instability typically associated with training RL agents directly from raw image data.
  2. Policy Learning: The second module, based on Generative Adversarial Imitation Learning (GAIL), uses the generated BEV representation to learn effective driving strategies. GAIL facilitates policy learning by leveraging expert demonstrations, alleviating the difficulty of defining an explicit reward function (see the surrogate-reward sketch after this list).
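
The sketch below illustrates the GAIL surrogate reward, in the spirit of the original GAIL formulation: a discriminator is trained to separate expert (observation, action) pairs from the agent's, and its output stands in for an engineered reward. In hGAIL the observation would be the (flattened) BEV prediction; the network shapes and helper names here are assumptions for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Discriminator(nn.Module):
    """Scores (observation, action) pairs: expert-like vs. agent-like."""
    def __init__(self, obs_dim: int, act_dim: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim + act_dim, 256), nn.ReLU(),
            nn.Linear(256, 1),  # logit: > 0 leans "expert"
        )

    def forward(self, obs, act):
        return self.net(torch.cat([obs, act], dim=-1))

def discriminator_loss(D, expert_obs, expert_act, agent_obs, agent_act):
    """Binary cross-entropy: expert pairs labeled 1, agent pairs labeled 0."""
    exp = D(expert_obs, expert_act)
    agt = D(agent_obs, agent_act)
    return (F.binary_cross_entropy_with_logits(exp, torch.ones_like(exp))
            + F.binary_cross_entropy_with_logits(agt, torch.zeros_like(agt)))

def surrogate_reward(D, obs, act):
    """Reward for fooling the discriminator, replacing an engineered reward:
    r(s, a) = -log(1 - D(s, a)), where D(s, a) = sigmoid(logit)."""
    with torch.no_grad():
        return -F.logsigmoid(-D(obs, act))  # = -log(1 - sigmoid(logit))
```

The policy is then updated to maximize this surrogate reward with an on-policy RL algorithm (PPO is a common choice), while the discriminator is periodically retrained to keep distinguishing expert from agent behavior.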

Empirical Evaluation

The authors conducted experiments in the CARLA simulation environment, assessing hGAIL's performance against baseline approaches. Learning policies solely from high-dimensional camera data, without mid-level abstractions, failed to produce a working policy. In contrast, the hGAIL agent, trained exclusively in one city, successfully navigated 98% of the intersections of a previously unseen city. This highlights the utility of GAN-produced mid-level representations for stable and efficient policy learning.

Implications and Future Directions

This work contributes to the field by advancing the design of autonomous navigation systems with more stable and reliable training frameworks. Separating representation learning from policy training allows the system to generalize better and adapt to novel environments. Using a GAN to generate mid-level input representations also points to real-world applications where a ground-truth map or BEV of the environment is not directly available.

Theoretically, the paper enriches the discourse on imitation learning frameworks for autonomous driving, opening an avenue for further work on cleanly separating input processing from policy execution. Practically, the architecture could be extended to more dynamic scenarios involving other traffic participants and varied environmental conditions, enhancing the real-world robustness of autonomous vehicle systems.

Looking ahead, learning mid-level representations such as the BEV could facilitate sim-to-real transfer, carrying policies learned in simulation over to real environments and demonstrating the value of hierarchical learning structures for efficient, scalable deployment of autonomous driving technologies. Extending the approach to richer scenarios, such as dynamic obstacles, traffic signals, and diverse weather conditions, remains a promising direction for further research.
