
GPT-4V as Traffic Assistant: An In-depth Look at Vision Language Model on Complex Traffic Events (2402.02205v3)

Published 3 Feb 2024 in cs.CV

Abstract: The recognition and understanding of traffic incidents, particularly traffic accidents, is a topic of paramount importance in intelligent transportation systems and intelligent vehicles, and it has long attracted extensive attention from both academia and industry. Identifying and comprehending complex traffic events is highly challenging, primarily because of the intricate nature of traffic environments, diverse observational perspectives, and the multifaceted causes of accidents. These factors have persistently impeded the development of effective solutions. The advent of large vision-language models (VLMs) such as GPT-4V has introduced innovative approaches to this problem. In this paper, we evaluate GPT-4V on a set of representative traffic incident videos and examine the model's capacity to understand these complex traffic situations. We observe that GPT-4V demonstrates remarkable cognitive, reasoning, and decision-making ability in certain classic traffic events. At the same time, we identify limitations of GPT-4V that constrain its understanding of more intricate scenarios. These limitations merit further exploration and resolution.

Authors (2)
  1. Xingcheng Zhou (16 papers)
  2. Alois C. Knoll (22 papers)
Citations (1)
