
Text Prompt with Normality Guidance for Weakly Supervised Video Anomaly Detection (2404.08531v1)

Published 12 Apr 2024 in cs.CV

Abstract: Weakly supervised video anomaly detection (WSVAD) is a challenging task. Generating fine-grained pseudo-labels from weak labels and then self-training a classifier is currently a promising solution. However, existing methods use only the RGB visual modality and neglect category text information, which limits the accuracy of the generated pseudo-labels and degrades self-training performance. Inspired by the manual labeling process based on event descriptions, in this paper we propose a novel pseudo-label generation and self-training framework based on Text Prompt with Normality Guidance (TPWNG) for WSVAD. Our idea is to transfer the rich language-visual knowledge of the contrastive language-image pre-training (CLIP) model to align video event description text with the corresponding video frames and thereby generate pseudo-labels. Specifically, we first fine-tune CLIP for domain adaptation by designing two ranking losses and a distributional inconsistency loss. Further, we propose a learnable text prompt mechanism, assisted by a normality visual prompt, to improve the matching accuracy between video event description text and video frames. Then, we design a pseudo-label generation module based on normality guidance to infer reliable frame-level pseudo-labels. Finally, we introduce a temporal context self-adaptive learning module to learn the temporal dependencies of different video events more flexibly and accurately. Extensive experiments show that our method achieves state-of-the-art performance on two benchmark datasets, UCF-Crime and XD-Viole
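
The text-frame alignment idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' TPWNG implementation: the function name `frame_pseudo_labels`, the two-way softmax over a "normal" and an "anomaly" text embedding, and the `temperature` and `threshold` values are all assumptions made for illustration. The sketch also assumes that frame and text embeddings have already been extracted by some CLIP-style encoder, which is not shown here.

```python
import numpy as np


def l2_normalize(x, axis=-1, eps=1e-12):
    """Scale vectors to unit length so dot products become cosine similarities."""
    norm = np.linalg.norm(x, axis=axis, keepdims=True)
    return x / np.maximum(norm, eps)


def frame_pseudo_labels(frame_emb, anomaly_text_emb, normal_text_emb,
                        temperature=0.07, threshold=0.5):
    """Hypothetical helper: score each frame against two text embeddings.

    frame_emb:        (T, D) array, one embedding per video frame.
    anomaly_text_emb: (D,) embedding of the anomaly event description.
    normal_text_emb:  (D,) embedding of a "normal" description (assumed here
                      to play the role of a normality reference).
    Returns per-frame anomaly scores in [0, 1] and binary pseudo-labels.
    """
    frames = l2_normalize(np.asarray(frame_emb, dtype=np.float64))
    texts = l2_normalize(np.stack([normal_text_emb, anomaly_text_emb]).astype(np.float64))

    # Cosine similarity of every frame to [normal, anomaly], sharpened by temperature.
    logits = frames @ texts.T / temperature
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability

    # Softmax over the two descriptions; column 1 is the anomaly probability.
    probs = np.exp(logits)
    probs = probs / probs.sum(axis=1, keepdims=True)
    scores = probs[:, 1]

    labels = (scores > threshold).astype(np.int64)
    return scores, labels


if __name__ == "__main__":
    # Toy 2-D embeddings: the first frame points toward "normal", the second toward "anomaly".
    toy_frames = np.array([[1.0, 0.0], [0.0, 1.0]])
    toy_scores, toy_labels = frame_pseudo_labels(
        toy_frames,
        anomaly_text_emb=np.array([0.0, 1.0]),
        normal_text_emb=np.array([1.0, 0.0]),
    )
    print(toy_scores, toy_labels)
```

Comparing each frame against a normal description as well as the anomaly description is one simple way to keep frames that are merely scene-consistent from being scored as anomalous; the paper's actual normality guidance, learnable prompts, and fine-tuning losses are more involved than this sketch.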

Authors (3)
  1. Zhiwei Yang (43 papers)
  2. Jing Liu (525 papers)
  3. Peng Wu (119 papers)
Citations (9)