DeepLocalization: Using change point detection for Temporal Action Localization (2404.12258v1)
Abstract: In this study, we introduce DeepLocalization, a framework for real-time action localization designed specifically for monitoring driver behavior. Using deep learning methods, we aim to address distracted driving, a significant contributor to road accidents. Our strategy has two parts: Graph-Based Change-Point Detection to locate actions in time, and a Video Large Language Model (Video-LLM) to categorize the activities. Through careful prompt engineering, we adapt the Video-LLM to the nuances of driving activities so that it classifies effectively even with sparse data. The framework is lightweight and optimized for consumer-grade GPUs, which makes it widely applicable in practice. We tested our method on the SynDD2 dataset, a complex benchmark for distracted driving behaviors, where it achieved 57.5% accuracy in event classification and 51% in event detection. These results indicate that DeepLocalization can identify diverse driver behaviors and their temporal occurrences within limited computational resources.
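To make the temporal half of the approach concrete, here is a minimal sketch of graph-based change-point detection for a single change point. It is not the authors' implementation. It assumes each frame has already been reduced to a feature vector (the synthetic `frames` below are hypothetical), builds a minimum spanning tree over the frames, and picks the split where the fewest tree edges cross from one side to the other relative to the number expected under a random ordering. The full method also standardizes this count by its permutation variance, which is omitted here; the names `minimum_spanning_tree`, `estimate_change_point`, and `margin` are illustrative.

```python
import math
import random


def minimum_spanning_tree(points):
    """Return MST edges (i, j) over Euclidean distances (Prim's algorithm)."""
    n = len(points)
    in_tree = [False] * n
    best_dist = [math.inf] * n
    best_from = [-1] * n
    best_dist[0] = 0.0
    edges = []
    for _ in range(n):
        # Pick the vertex outside the tree that is closest to it.
        u = min((i for i in range(n) if not in_tree[i]),
                key=best_dist.__getitem__)
        in_tree[u] = True
        if best_from[u] >= 0:
            edges.append((best_from[u], u))
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best_dist[v]:
                    best_dist[v] = d
                    best_from[v] = u
    return edges


def estimate_change_point(points, margin=5):
    """Return the split t (size of the first segment) with the lowest
    ratio of observed to expected cross-split MST edges."""
    n = len(points)
    edges = minimum_spanning_tree(points)
    best_t, best_ratio = None, math.inf
    for t in range(margin, n - margin + 1):
        # R(t): edges joining a frame before index t to one at or after t.
        observed = sum(1 for i, j in edges if (i < t) != (j < t))
        # Under a random ordering, each edge straddles the split with
        # probability 2 t (n - t) / (n (n - 1)).
        expected = len(edges) * 2 * t * (n - t) / (n * (n - 1))
        ratio = observed / expected
        if ratio < best_ratio:
            best_t, best_ratio = t, ratio
    return best_t


rng = random.Random(0)
# Hypothetical per-frame features: 30 frames of one activity, then 30 of another.
frames = [[rng.gauss(0.0, 1.0) for _ in range(4)] for _ in range(30)]
frames += [[rng.gauss(10.0, 1.0) for _ in range(4)] for _ in range(30)]
change_point = estimate_change_point(frames)
print(change_point)
```

In a pipeline like the one the abstract describes, the estimated boundaries would delimit the video segments that are then passed to the classifier; a real video would need several change points rather than the single one handled in this sketch.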