GPT-4V as Traffic Assistant: An In-depth Look at Vision Language Model on Complex Traffic Events (2402.02205v3)
Abstract: The recognition and understanding of traffic incidents, particularly traffic accidents, is a topic of paramount importance in intelligent transportation systems and intelligent vehicles, and it has long attracted extensive attention from both academia and industry. Identifying and comprehending complex traffic events is highly challenging, primarily because of the intricate nature of traffic environments, diverse observational perspectives, and the multifaceted causes of accidents. These factors have persistently impeded the development of effective solutions. The advent of large vision-language models (VLMs) such as GPT-4V has introduced innovative approaches to this problem. In this paper, we evaluate GPT-4V on a set of representative traffic incident videos and examine the model's capacity to understand these complex traffic situations. We observe that GPT-4V demonstrates remarkable cognitive, reasoning, and decision-making abilities in certain classic traffic events. At the same time, we identify limitations of GPT-4V that constrain its understanding of more intricate scenarios. These limitations merit further exploration and resolution.
- Xingcheng Zhou
- Alois C. Knoll