Anomaly Detection in Video via Self-Supervised and Multi-Task Learning (2011.07491v3)

Published 15 Nov 2020 in cs.CV, cs.LG, and eess.IV

Abstract: Anomaly detection in video is a challenging computer vision problem. Due to the lack of anomalous events at training time, anomaly detection requires the design of learning methods without full supervision. In this paper, we approach anomalous event detection in video through self-supervised and multi-task learning at the object level. We first utilize a pre-trained detector to detect objects. Then, we train a 3D convolutional neural network to produce discriminative anomaly-specific information by jointly learning multiple proxy tasks: three self-supervised and one based on knowledge distillation. The self-supervised tasks are: (i) discrimination of forward/backward moving objects (arrow of time), (ii) discrimination of objects in consecutive/intermittent frames (motion irregularity) and (iii) reconstruction of object-specific appearance information. The knowledge distillation task takes into account both classification and detection information, generating large prediction discrepancies between teacher and student models when anomalies occur. To the best of our knowledge, we are the first to approach anomalous event detection in video as a multi-task learning problem, integrating multiple self-supervised and knowledge distillation proxy tasks in a single architecture. Our lightweight architecture outperforms the state-of-the-art methods on three benchmarks: Avenue, ShanghaiTech and UCSD Ped2. Additionally, we perform an ablation study demonstrating the importance of integrating self-supervised learning and normality-specific distillation in a multi-task learning setting.

Authors (6)

Mariana-Iuliana Georgescu (27 papers)
Antonio Barbalau (12 papers)
Radu Tudor Ionescu (103 papers)
Fahad Shahbaz Khan (225 papers)
Marius Popescu (21 papers)
Mubarak Shah (208 papers)

Summary

Anomaly Detection in Video via Self-Supervised and Multi-Task Learning: A Comprehensive Analysis

The paper "Anomaly Detection in Video via Self-Supervised and Multi-Task Learning" addresses the detection of anomalous events in video, a problem central to applications such as surveillance and safety monitoring. It proposes an object-level approach that combines self-supervised and multi-task learning, training multiple proxy tasks within a single architecture. The authors report results that surpass the previous state of the art on three benchmark datasets.

Methodological Contributions

The method first applies a pre-trained object detector, YOLOv3, to locate objects in the video frames. The primary innovation lies in training a 3D convolutional neural network (CNN) jointly on four proxy tasks, three self-supervised and one based on knowledge distillation:

Arrow of Time: This task discriminates between objects moving forward and objects moving backward in time, so the network learns the normal temporal direction of object motion.
Motion Irregularity: This task discriminates between objects taken from consecutive frames and objects taken from intermittent frames (frames with gaps between them), so that irregular motion, which may indicate an anomaly, stands out.
Reconstruction of Object-Specific Appearance: This self-supervised task reconstructs object-specific appearance information, so that objects with unexpected appearance are reconstructed poorly.
Knowledge Distillation: The student network learns to reproduce the classification and detection outputs of pre-trained teacher models such as ResNet-50 and YOLOv3. Anomalous objects are expected to produce large prediction discrepancies between teacher and student.
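
The two temporal proxy tasks can be illustrated by how their training samples might be built from one object's track. The following is a minimal sketch, not the authors' implementation: the function names, the clip length (`half`), and the gap range (`max_gap`) are assumptions chosen for illustration.

```python
import numpy as np

def arrow_of_time_pair(clip):
    """Return (forward clip, label 0) and (backward clip, label 1).

    clip: array of shape (T, H, W, C) with one object's crops in temporal order.
    """
    return (clip, 0), (clip[::-1], 1)

def consecutive_indices(center, half=2):
    """Indices of 2*half+1 consecutive frames around `center`."""
    return np.arange(center - half, center + half + 1)

def intermittent_indices(center, half=2, max_gap=4, rng=None):
    """Indices of 2*half+1 frames around `center`, separated by gaps of 2..max_gap.

    The gap range is a hypothetical choice. The caller must ensure that
    center - half*max_gap >= 0 and center + half*max_gap < track length.
    """
    rng = np.random.default_rng() if rng is None else rng
    right = center + np.cumsum(rng.integers(2, max_gap + 1, size=half))
    left = center - np.cumsum(rng.integers(2, max_gap + 1, size=half))[::-1]
    return np.concatenate([left, [center], right])

# Toy track: 21 crops of size 8x8x3, where every pixel of crop t equals t.
track = (np.arange(21, dtype=np.float32)[:, None, None, None]
         * np.ones((1, 8, 8, 3), dtype=np.float32))

# Arrow of time: the same clip, played forward (label 0) and backward (label 1).
(fwd, fwd_label), (bwd, bwd_label) = arrow_of_time_pair(track[consecutive_indices(10)])

# Motion irregularity: consecutive frames (label 0) vs. intermittent frames (label 1).
regular = track[consecutive_indices(10)]
irregular_idx = intermittent_indices(10, rng=np.random.default_rng(0))
irregular = track[irregular_idx]
```

Both tasks are binary classification problems whose labels come for free from the way the clip is sampled, which is what makes them self-supervised.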

This comprehensive integration of multiple tasks into a single learning framework is a key distinction of this paper, aiming to align more closely with the demands of anomaly detection through enriched feature learning.
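
How the outputs of the four task heads become a single anomaly score is not spelled out in this summary. One plausible scheme, sketched below purely as an assumption, averages the per-task scores for each object and takes the maximum over the objects in a frame. The function names and the use of mean absolute error for the reconstruction and distillation discrepancies are hypothetical.

```python
import numpy as np

def object_score(p_backward, p_intermittent, recon, target, student_out, teacher_out):
    """Average four per-task anomaly cues for one detected object (assumed fusion).

    p_backward / p_intermittent: probabilities assigned to the "abnormal" class
    by the two temporal heads. The other two cues are mean absolute errors.
    """
    recon_err = float(np.mean(np.abs(recon - target)))
    distill_err = float(np.mean(np.abs(student_out - teacher_out)))
    return (p_backward + p_intermittent + recon_err + distill_err) / 4.0

def frame_score(object_scores):
    """Frame-level score: the most anomalous object, or 0.0 if none was detected."""
    return max(object_scores) if object_scores else 0.0
```

Taking the maximum reflects the idea that a single anomalous object is enough to make a frame anomalous; other aggregation rules would be equally easy to substitute.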

Numerical Results and Comparative Analysis

The experimental validation of this approach is conducted on three benchmark datasets: Avenue, ShanghaiTech, and UCSD Ped2. These datasets span a variety of environments and types of anomalies, providing a robust testing ground for video anomaly detection methods. The approach achieves a frame-level Area Under the Curve (AUC) score of 92.8% on Avenue, 90.2% on ShanghaiTech, and 99.8% on UCSD Ped2. These results not only demonstrate an improvement over existing methods but also highlight the versatility and robustness of the proposed approach across different settings.
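
For readers unfamiliar with the metric, the frame-level AUC can be computed from per-frame anomaly scores and binary ground-truth labels. The sketch below uses the pairwise (Mann-Whitney) formulation in plain NumPy; the function name is illustrative, and the paper's exact evaluation protocol (for example, how videos are aggregated) is not reproduced here.

```python
import numpy as np

def frame_level_auc(labels, scores):
    """Area under the ROC curve for per-frame anomaly scores.

    labels: 1 for anomalous frames, 0 for normal frames.
    scores: higher means more anomalous.
    Equals the probability that a random anomalous frame scores higher than
    a random normal frame, with ties counted as one half.
    """
    labels = np.asarray(labels, dtype=bool)
    scores = np.asarray(scores, dtype=float)
    pos, neg = scores[labels], scores[~labels]
    if pos.size == 0 or neg.size == 0:
        raise ValueError("need at least one normal and one anomalous frame")
    diff = pos[:, None] - neg[None, :]
    return float(np.mean((diff > 0) + 0.5 * (diff == 0)))

print(frame_level_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

The pairwise form is quadratic in the number of frames, which is fine for illustration; a rank-based implementation would be preferable for long videos.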

Moreover, a critical aspect of the paper is the ablation study, which demonstrates the importance of each self-supervised task and the synergy achieved through the multi-task learning framework. The paper argues that no single proxy task is sufficiently representative of the anomaly detection task, underscoring the merit of a unified, multi-task approach.

Theoretical and Practical Implications

Theoretically, this research shows how multiple self-supervised tasks can be trained jointly in a single model, offering insight into how the tasks interact and into the feature representations that anomaly detection requires. Practically, efficient and accurate anomaly detection of this kind could strengthen automated video analysis systems in real-world deployments.

Future Prospects in AI

Speculating on future developments, this research opens avenues for further exploration into hybrid models combining traditional and deep learning methods for anomaly detection. Additionally, advancements could involve exploring further self-supervised tasks, scaling to higher resolution data, or integrating with real-time detection systems to enhance the practical applicability of the approach.

In conclusion, by framing video anomaly detection as a multi-task problem that combines self-supervised learning with knowledge distillation, the research offers an effective method that outperforms existing techniques on the evaluated benchmarks. The paper demonstrates the potential of combining multiple proxy tasks within a unified architecture, suggesting a promising direction for future research in anomaly detection and related domains.