Towards Knowledge-driven Autonomous Driving (2312.04316v3)
Abstract: This paper explores emerging knowledge-driven autonomous driving technologies. Our investigation highlights the limitations of current autonomous driving systems, in particular their sensitivity to data bias, difficulty in handling long-tail scenarios, and lack of interpretability. In contrast, knowledge-driven methods, with their capacity for cognition, generalization, and lifelong learning, emerge as a promising way to overcome these challenges. This paper delves into the essence of knowledge-driven autonomous driving and examines its core components: dataset & benchmark, environment, and driver agent. By leveraging large language models (LLMs), world models, neural rendering, and other advanced artificial intelligence techniques, these components collectively contribute to a more holistic, adaptive, and intelligent autonomous driving system. The paper systematically organizes and reviews previous research efforts in this area and provides insights and guidance for future research and practical applications of autonomous driving. We will continually share the latest developments in knowledge-driven autonomous driving, along with relevant open-source resources, at \url{https://github.com/PJLab-ADG/awesome-knowledge-driven-AD}.
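To make the notion of a knowledge-driven driver agent more concrete, the sketch below illustrates one plausible recall–reason–reflect loop built around an LLM and an experience memory, in the spirit of the agent designs the paper surveys. This is an illustrative assumption, not the paper's implementation: the `query_llm` stub, the class names, and the keyword-overlap retrieval are hypothetical placeholders.

```python
# Minimal sketch (not from the paper) of a knowledge-driven driver agent loop:
# describe the scene in language, recall similar past experiences, ask a
# language model for a decision, then store the outcome for lifelong learning.
from dataclasses import dataclass, field


@dataclass
class Experience:
    scenario: str   # textual description of the driving scene
    decision: str   # action taken, e.g. "decelerate and yield"
    outcome: str    # reflection on whether the decision was safe/effective


@dataclass
class MemoryBank:
    experiences: list = field(default_factory=list)

    def retrieve(self, scenario: str, k: int = 2) -> list:
        # Naive keyword-overlap retrieval; a real system would use embeddings.
        def overlap(e: Experience) -> int:
            return len(set(scenario.split()) & set(e.scenario.split()))
        return sorted(self.experiences, key=overlap, reverse=True)[:k]

    def store(self, experience: Experience) -> None:
        self.experiences.append(experience)


def query_llm(prompt: str) -> str:
    # Placeholder for a call to an actual LLM API; returns a canned answer
    # so that the sketch runs offline.
    return "decelerate and keep a safe following distance"


def drive_step(scenario: str, memory: MemoryBank) -> str:
    # 1. Recall: ground the decision in similar past experiences (knowledge).
    recalled = memory.retrieve(scenario)
    context = "\n".join(f"- {e.scenario} -> {e.decision} ({e.outcome})" for e in recalled)
    # 2. Reason: ask the language model for a high-level driving decision.
    prompt = f"Past experiences:\n{context}\nCurrent scene: {scenario}\nDecision:"
    decision = query_llm(prompt)
    # 3. Reflect: record the new experience for future retrieval.
    memory.store(Experience(scenario, decision, outcome="pending evaluation"))
    return decision


if __name__ == "__main__":
    memory = MemoryBank([Experience("pedestrian crossing ahead in rain",
                                    "decelerate and yield", "safe")])
    print(drive_step("pedestrian crossing ahead at dusk", memory))
```

The point of the sketch is the closed loop itself: decisions are conditioned on accumulated experience rather than on a fixed training distribution, which is how the surveyed works frame cognition, generalization, and lifelong learning.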
Authors: Xin Li, Yeqi Bai, Pinlong Cai, Licheng Wen, Daocheng Fu, Bo Zhang, Xuemeng Yang, Xinyu Cai, Tao Ma, Jianfei Guo, Xing Gao, Min Dou, Botian Shi, Yong Liu, Liang He, Yu Qiao, Yikang Li