
AccidentGPT: Large Multi-Modal Foundation Model for Traffic Accident Analysis (2401.03040v1)

Published 5 Jan 2024 in cs.LG, cs.AI, cs.CV, and cs.DC

Abstract: Traffic accident analysis is pivotal for enhancing public safety and developing road regulations. Traditional approaches, although widely used, are often constrained by manual analysis processes, subjective decisions, uni-modal outputs, and privacy issues related to sensitive data. This paper introduces AccidentGPT, a foundation model for traffic accident analysis that incorporates multi-modal input data to automatically reconstruct the accident process as video with dynamic details, and furthermore provides multi-task analysis with multi-modal outputs. The design of AccidentGPT is empowered by a multi-modality prompt with feedback for task-oriented adaptability, a hybrid training schema that leverages both labelled and unlabelled data, and an edge-cloud split configuration for data privacy. To fully realize the functionalities of this model, we propose several research opportunities. This paper serves as a stepping stone to fill the gaps in traditional approaches to traffic accident analysis and to draw the research community's attention to automatic, objective, and privacy-preserving traffic accident analysis.
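The edge-cloud split configuration mentioned in the abstract can be illustrated with a minimal sketch: raw sensor records are encoded into feature vectors on the edge device, and only those features (never the raw data) are sent to the cloud for analysis. All names here (`EdgeEncoder`, `CloudAnalyzer`) and the toy featurization are illustrative assumptions, not the paper's actual implementation.

```python
# Hypothetical sketch of an edge-cloud split for privacy-preserving analysis:
# raw multi-modal records stay on the edge; only derived features cross the
# network boundary. Class names and featurization are illustrative only.
import hashlib


class EdgeEncoder:
    """Runs on the vehicle or roadside unit: converts raw records into
    bounded feature vectors so raw data never leaves the device."""

    def encode(self, record: dict) -> list[float]:
        # Toy featurization: hash each field into a float in [0, 1].
        feats = []
        for key in sorted(record):
            digest = hashlib.sha256(f"{key}={record[key]}".encode()).digest()
            feats.append(digest[0] / 255.0)
        return feats


class CloudAnalyzer:
    """Runs in the cloud: consumes only feature vectors, never raw records."""

    def analyze(self, features: list[float]) -> str:
        severity = sum(features) / len(features)
        return "high" if severity > 0.5 else "low"


record = {"camera": "frame_0421.jpg", "speed_kmh": 87, "gps": "redacted"}
features = EdgeEncoder().encode(record)      # raw data stays on the edge
verdict = CloudAnalyzer().analyze(features)  # cloud sees features only
```

In a real split-learning setup the edge half would be the first layers of a shared neural network rather than a hash, but the privacy boundary is the same: the cloud receives intermediate representations instead of raw sensor data.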

