Reality Bites: Assessing the Realism of Driving Scenarios with Large Language Models (2403.09906v1)
Abstract: Large language models (LLMs) show strong potential for tasks such as text generation, summarization, and classification. Because such models are trained on a vast amount of online knowledge, we hypothesize that LLMs can assess whether driving scenarios generated by autonomous driving testing techniques are realistic, i.e., aligned with real-world driving conditions. To test this hypothesis, we conducted an empirical evaluation of whether LLMs are effective and robust at this task. This reality check is an important step towards devising LLM-based autonomous driving testing techniques. For the evaluation, we selected 64 realistic scenarios from DeepScenario, an open driving scenario dataset. Next, by introducing minor changes to them, we created 512 additional realistic scenarios, forming an overall dataset of 576 scenarios. With this dataset, we evaluated three LLMs (GPT-3.5, Llama 2, and Mistral 7B) to measure how robustly they assess the realism of driving scenarios. Our results show that: (1) overall, GPT-3.5 achieved the highest robustness compared to Llama 2 and Mistral 7B, consistently across almost all scenarios, roads, and weather conditions; (2) Mistral 7B consistently performed the worst; (3) Llama 2 achieved good results under certain conditions; and (4) roads and weather conditions do influence the robustness of the LLMs.
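The dataset arithmetic in the abstract can be checked with a short sketch. The counts (64, 512, 576) come from the abstract. Everything else is an assumption made for illustration: that the 512 variants are spread evenly over the 64 originals, and the helper `realism_rate_by_group`, a hypothetical per-scenario agreement rate that is not claimed to be the robustness measure used in the paper.

```python
from collections import defaultdict

N_ORIGINAL = 64    # realistic scenarios selected from DeepScenario (from the abstract)
N_VARIANTS = 512   # scenarios created by minor changes (from the abstract)


def dataset_size(n_original=N_ORIGINAL, n_variants=N_VARIANTS):
    """Total number of scenarios evaluated."""
    return n_original + n_variants


def variants_per_original(n_original=N_ORIGINAL, n_variants=N_VARIANTS):
    """Variants per original, ASSUMING an even split (not stated in the abstract)."""
    if n_variants % n_original != 0:
        raise ValueError("variants do not divide evenly over the originals")
    return n_variants // n_original


def realism_rate_by_group(verdicts):
    """Hypothetical metric: share of 'realistic' verdicts per original scenario.

    verdicts: iterable of (original_id, is_realistic) pairs, one pair per
    scenario (the original or one of its variants) judged by an LLM.
    """
    counts = defaultdict(lambda: [0, 0])  # original_id -> [realistic, total]
    for original_id, is_realistic in verdicts:
        counts[original_id][0] += int(is_realistic)
        counts[original_id][1] += 1
    return {gid: yes / total for gid, (yes, total) in counts.items()}
```

Since every scenario in the dataset is described as realistic, a model that answered consistently would score 1.0 for every group under this illustrative measure; lower values would indicate verdicts that flip under minor changes.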
- Repository for the paper “Reality Bites: Assessing the Realism of Driving Scenarios with Large Language Models”. https://github.com/Simula-COMPLEX/RealityBites, 2024.