Text Prompt with Normality Guidance for Weakly Supervised Video Anomaly Detection (2404.08531v1)
Abstract: Weakly supervised video anomaly detection (WSVAD) is a challenging task. Generating fine-grained pseudo-labels from weak labels and then self-training a classifier is currently a promising solution. However, existing methods use only the RGB visual modality and neglect category text information, which limits the accuracy of the generated pseudo-labels and degrades self-training performance. Inspired by the manual labeling process based on event descriptions, in this paper we propose a novel pseudo-label generation and self-training framework based on Text Prompt with Normality Guidance (TPWNG) for WSVAD. Our idea is to transfer the rich language-visual knowledge of the contrastive language-image pre-training (CLIP) model to align video event description text with the corresponding video frames and thereby generate pseudo-labels. Specifically, we first fine-tune CLIP for domain adaptation by designing two ranking losses and a distributional inconsistency loss. Further, we propose a learnable text prompt mechanism, assisted by a normality visual prompt, to further improve the matching accuracy between video event description text and video frames. Then, we design a pseudo-label generation module based on normality guidance to infer reliable frame-level pseudo-labels. Finally, we introduce a temporal context self-adaptive learning module to learn the temporal dependencies of different video events more flexibly and accurately. Extensive experiments show that our method achieves state-of-the-art performance on two benchmark datasets, UCF-Crime and XD-Viole
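To make the text-to-frame matching idea concrete, the following is a minimal sketch of how frame-level pseudo-labels might be derived from similarities between frame features and text embeddings. It is an illustration under stated assumptions, not the method of the paper: the function names, the fusion rule (blending the abnormal-text match with the complement of the normal-text match), and the parameters `alpha` and `threshold` are all hypothetical, and small hand-made vectors stand in for CLIP embeddings.

```python
import numpy as np


def cosine_similarity(a, b):
    """Cosine similarity between rows of a (T, D) and rows of b (K, D) -> (T, K)."""
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    return a @ b.T


def pseudo_labels(frame_feats, normal_text, abnormal_text, alpha=0.5, threshold=0.5):
    """Hypothetical normality-guided pseudo-labels for one abnormal video.

    frame_feats:   (T, D) per-frame visual features (stand-ins for CLIP image features)
    normal_text:   (D,) embedding of a "normal" event description
    abnormal_text: (D,) embedding of the video's anomaly class description
    """
    sims = cosine_similarity(frame_feats, np.stack([normal_text, abnormal_text]))
    s_normal, s_abnormal = sims[:, 0], sims[:, 1]

    # Assumed fusion rule (not taken from the paper): a frame scores high when it
    # matches the abnormal text AND does not match the normal text.
    score = alpha * s_abnormal + (1.0 - alpha) * (1.0 - s_normal)

    # Min-max normalise within the video so the threshold is comparable across videos.
    lo, hi = score.min(), score.max()
    score = (score - lo) / (hi - lo + 1e-8)

    labels = (score > threshold).astype(np.int64)
    return labels, score


# Toy usage: two frames aligned with the "normal" direction, two with the "abnormal" one.
normal_text = np.array([1.0, 0.0])
abnormal_text = np.array([0.0, 1.0])
frames = np.array([[1.0, 0.0], [0.9, 0.1], [0.1, 0.9], [0.0, 1.0]])
labels, score = pseudo_labels(frames, normal_text, abnormal_text)
print(labels)
```

In this toy setting the first two frames receive label 0 and the last two receive label 1. A real pipeline would replace the hand-made vectors with embeddings from a fine-tuned vision-language model and would likely use a more careful fusion rule.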
- Zhiwei Yang
- Jing Liu
- Peng Wu