
Goal-Guided Transformer-Enabled Reinforcement Learning for Efficient Autonomous Navigation (2301.00362v2)

Published 1 Jan 2023 in cs.RO, cs.AI, and cs.LG

Abstract: Despite some successful applications of goal-driven navigation, existing deep reinforcement learning (DRL)-based approaches notoriously suffer from poor data efficiency. One of the reasons is that the goal information is decoupled from the perception module and directly introduced as a condition of decision-making, resulting in the goal-irrelevant features of the scene representation playing an adversarial role during the learning process. In light of this, we present a novel Goal-guided Transformer-enabled reinforcement learning (GTRL) approach by considering the physical goal states as an input of the scene encoder for guiding the scene representation to couple with the goal information and realizing efficient autonomous navigation. More specifically, we propose a novel variant of the Vision Transformer as the backbone of the perception system, namely Goal-guided Transformer (GoT), and pre-train it with expert priors to boost the data efficiency. Subsequently, a reinforcement learning algorithm is instantiated for the decision-making system, taking the goal-oriented scene representation from the GoT as the input and generating decision commands. As a result, our approach motivates the scene representation to concentrate mainly on goal-relevant features, which substantially enhances the data efficiency of the DRL learning process, leading to superior navigation performance. Both simulation and real-world experimental results manifest the superiority of our approach in terms of data efficiency, performance, robustness, and sim-to-real generalization, compared with other state-of-the-art (SOTA) baselines. The demonstration video (https://www.youtube.com/watch?v=aqJCHcsj4w0) and the source code (https://github.com/OscarHuangWind/DRL-Transformer-SimtoReal-Navigation) are also provided.


Summary

  • The paper introduces a novel Goal-guided Transformer that integrates goal information into scene encoding to improve deep reinforcement learning efficiency.
  • The method uses expert demonstrations for pre-training, leading to faster convergence and superior performance in both simulated and real-world navigation tasks.
  • Experimental ablation and interpretability analyses, including attention flow maps and unsupervised metrics, validate the robust and focused representation of goal-relevant features.

Overview of the "Goal-guided Transformer-enabled Reinforcement Learning for Efficient Autonomous Navigation" Paper

This paper addresses data efficiency in autonomous navigation with deep reinforcement learning (DRL). The authors target a common weakness of DRL-based navigation approaches: goal information is typically decoupled from scene perception and introduced only as a condition of decision-making. This decoupling leads to inefficient learning because goal-irrelevant features in the scene representation can adversely impact the training process. To resolve this issue, the paper integrates goal information directly into the scene representation process through a newly proposed framework called Goal-guided Transformer-enabled reinforcement learning (GTRL).

The key innovation in this work is the Goal-guided Transformer (GoT), a variant of the Vision Transformer (ViT) that incorporates the physical goal states into the scene encoding process, aligning the scene representation with the goal objective. The GoT is pre-trained on expert demonstrations, which serve as 'expert priors', to increase data efficiency before the reinforcement learning phase begins. As a result, the scene representation focuses predominantly on goal-relevant features, which the authors report leads to improved autonomous navigation performance.
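To make the goal-coupling idea concrete, below is a minimal PyTorch-style sketch of a goal-conditioned ViT encoder in the spirit of the GoT: the goal state is embedded as an extra token so that self-attention can couple scene patches with the goal. The module name `GoalGuidedEncoder`, all dimensions, and the omission of positional embeddings are illustrative assumptions rather than the authors' implementation (their repository contains the actual code).

```python
# Minimal sketch of a goal-conditioned ViT-style encoder (assumed names/shapes).
import torch
import torch.nn as nn


class GoalGuidedEncoder(nn.Module):
    def __init__(self, in_channels=1, patch_size=8, embed_dim=128,
                 depth=2, num_heads=4, goal_dim=2):
        super().__init__()
        # Patch embedding for the sensor image (e.g. a depth or laser map).
        self.patch_embed = nn.Conv2d(in_channels, embed_dim,
                                     kernel_size=patch_size, stride=patch_size)
        # The goal state (e.g. relative distance and heading) becomes one token.
        self.goal_embed = nn.Linear(goal_dim, embed_dim)
        encoder_layer = nn.TransformerEncoderLayer(
            d_model=embed_dim, nhead=num_heads, batch_first=True)
        self.blocks = nn.TransformerEncoder(encoder_layer, num_layers=depth)
        # Positional embeddings are omitted here for brevity.

    def forward(self, obs, goal):
        # obs: (B, C, H, W) sensor image; goal: (B, goal_dim) goal state.
        patches = self.patch_embed(obs).flatten(2).transpose(1, 2)  # (B, N, D)
        goal_tok = self.goal_embed(goal).unsqueeze(1)               # (B, 1, D)
        tokens = torch.cat([goal_tok, patches], dim=1)
        encoded = self.blocks(tokens)
        # Use the goal token's output as the goal-oriented scene representation.
        return encoded[:, 0]


# Example: a 64x64 single-channel observation and a 2-D goal vector.
enc = GoalGuidedEncoder()
feat = enc(torch.randn(4, 1, 64, 64), torch.randn(4, 2))
print(feat.shape)  # torch.Size([4, 128])
```

Because the goal participates in every self-attention layer, scene features that do not help reach the goal receive less attention, which is the mechanism the paper credits for the improved data efficiency.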

Significant results reported include enhanced data efficiency and superior performance, both in simulation and in real-world settings, compared with state-of-the-art (SOTA) DRL baselines. The proposed GTRL approach, particularly in the configuration using the GoT backbone, shows promise in sim-to-real transfer, demonstrating robustness in previously unseen conditions.
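The reported efficiency gains hinge on the expert-prior pre-training stage described above. The following is a hypothetical sketch of such supervised pre-training, reusing the `GoalGuidedEncoder` from the previous block; the regression head, loss, and hyperparameters are placeholders, not the authors' exact pipeline.

```python
# Hypothetical expert-prior pre-training: regress expert actions from the
# goal-guided representation before handing the encoder to the RL stage.
import torch
import torch.nn as nn

encoder = GoalGuidedEncoder()                      # from the sketch above
action_head = nn.Linear(128, 2)                    # e.g. linear and angular velocity
optim = torch.optim.Adam(list(encoder.parameters()) +
                         list(action_head.parameters()), lr=1e-4)
loss_fn = nn.MSELoss()

def pretrain_step(obs, goal, expert_action):
    # obs: (B, 1, 64, 64), goal: (B, 2), expert_action: (B, 2)
    pred = action_head(encoder(obs, goal))
    loss = loss_fn(pred, expert_action)
    optim.zero_grad()
    loss.backward()
    optim.step()
    return loss.item()
```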

The paper details a methodologically rich evaluation conducted in simulation environments followed by real-world experiments on unmanned ground vehicles (UGVs). The quantitative assessments compare against baseline models such as ConvNet-SAC and ViT-SAC, showing that GoT-SAC achieves faster convergence and higher success rates in autonomous navigation tasks. An ablation study on the GoT architecture highlights the trade-offs in transformer design, particularly the number of self-attention heads and encoder blocks. Furthermore, using visual attention flow maps and unsupervised metrics such as the Gini coefficient and the Shannon-Wiener index, the interpretability of the goal-oriented scene representation is analyzed, revealing a more concentrated and effective attention mechanism than in the baseline models.
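For reference, both concentration metrics can be computed directly from a normalized attention map, as in the sketch below; the input handling and normalization are assumptions and may differ from the paper's evaluation code.

```python
# Illustrative attention-concentration metrics: the Gini coefficient
# (higher = attention mass concentrated on fewer regions) and the
# Shannon-Wiener index (lower = less diffuse attention).
import numpy as np

def gini(weights):
    w = np.sort(np.asarray(weights, dtype=float).ravel())
    n = w.size
    cum = np.cumsum(w)
    # Standard formula for the Gini coefficient of non-negative values.
    return (n + 1 - 2 * np.sum(cum) / cum[-1]) / n

def shannon_wiener(weights, eps=1e-12):
    p = np.asarray(weights, dtype=float).ravel()
    p = p / p.sum()
    return float(-np.sum(p * np.log(p + eps)))

# Example: a near-uniform attention map vs. one focused on a few patches.
diffuse = np.full(64, 1 / 64)
focused = np.array([0.6, 0.2, 0.1] + [0.1 / 61] * 61)
print(gini(diffuse), shannon_wiener(diffuse))    # low Gini, high entropy
print(gini(focused), shannon_wiener(focused))    # high Gini, lower entropy
```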

The implications of these findings are twofold. Practically, the results promise advancements in goal-driven navigation strategies, possibly extending to more dynamic environments with little prior mapping. Theoretically, this integration of transformer models with reinforcement learning systems could signal a shift in how scene representations are aligned with objective functions in RL tasks. Future research in this domain could explore the scalability of the GTRL to larger-scale and more complex task environments or delve into integrating more complex multimodal data inputs to further improve navigation decision-making capabilities. The proposed framework may also serve as a foundation for developing more robust models that are pre-trained across diverse tasks before fine-tuning on specific navigation objectives.