
CUE-Net: Violence Detection Video Analytics with Spatial Cropping, Enhanced UniformerV2 and Modified Efficient Additive Attention (2404.18952v1)

Published 27 Apr 2024 in cs.CV, cs.AI, and cs.LG

Abstract: In this paper we introduce CUE-Net, a novel architecture designed for automated violence detection in video surveillance. As surveillance systems become more prevalent due to technological advances and decreasing costs, the challenge of efficiently monitoring vast amounts of video data has intensified. CUE-Net addresses this challenge by combining spatial Cropping with an enhanced version of the UniformerV2 architecture, integrating convolutional and self-attention mechanisms alongside a novel Modified Efficient Additive Attention mechanism (which reduces the quadratic time complexity of self-attention) to effectively and efficiently identify violent activities. This approach aims to overcome traditional challenges such as capturing distant or partially obscured subjects within video frames. By focusing on both local and global spatiotemporal features, CUE-Net achieves state-of-the-art performance on the RWF-2000 and RLVS datasets, surpassing existing methods.
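To illustrate how an additive attention mechanism can avoid the quadratic cost of self-attention, the following is a minimal sketch in the spirit of the efficient additive attention introduced in SwiftFormer. It is not the paper's Modified Efficient Additive Attention, whose details the abstract does not give. The function name, the weight names (`w_q`, `w_k`, `w_a`, `w_out`), the softmax normalisation, and the residual connection are all assumptions chosen for illustration. The key point is that the tokens are pooled into a single global query vector, so the cost grows linearly with the number of tokens n rather than with n squared.

```python
import math
import random

def matmul(a, b):
    """Plain-Python matrix product of a (n x m) and b (m x p)."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def additive_attention(x, w_q, w_k, w_a, w_out):
    """Illustrative additive attention over n tokens of width d.

    x: n x d token matrix; w_q, w_k, w_out: d x d matrices; w_a: length-d vector.
    All parameter names are hypothetical. No n x n attention map is ever built.
    """
    q = matmul(x, w_q)  # queries, n x d
    k = matmul(x, w_k)  # keys, n x d
    d = len(w_a)

    # One scalar score per token (n values instead of n * n pairwise scores).
    scores = [sum(qi * wi for qi, wi in zip(row, w_a)) / math.sqrt(d) for row in q]

    # Softmax over tokens (an assumed normalisation), shifted for stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    alpha = [e / total for e in exps]

    # Pool all queries into a single global query vector of length d.
    global_q = [sum(a * row[j] for a, row in zip(alpha, q)) for j in range(d)]

    # Each key interacts with the global query element-wise: O(n * d) work.
    mixed = [[kj * gj for kj, gj in zip(row, global_q)] for row in k]

    # Output projection plus a residual connection to the queries.
    proj = matmul(mixed, w_out)
    return [[p + qv for p, qv in zip(prow, qrow)] for prow, qrow in zip(proj, q)]

# Usage with random weights: 16 tokens of width 8.
random.seed(0)
n, d = 16, 8
rand_matrix = lambda r, c: [[random.gauss(0, 1) / math.sqrt(c) for _ in range(c)] for _ in range(r)]
x = rand_matrix(n, d)
out = additive_attention(x, rand_matrix(d, d), rand_matrix(d, d),
                         [random.gauss(0, 1) for _ in range(d)], rand_matrix(d, d))
print(len(out), len(out[0]))
```

The sketch is written without third-party libraries so that it runs as-is; in practice the same steps would be a handful of tensor operations in a deep learning framework.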

