Real-world Anomaly Detection in Surveillance Videos (1801.04264v3)

Published 12 Jan 2018 in cs.CV

Abstract: Surveillance videos are able to capture a variety of realistic anomalies. In this paper, we propose to learn anomalies by exploiting both normal and anomalous videos. To avoid annotating the anomalous segments or clips in training videos, which is very time consuming, we propose to learn anomaly through the deep multiple instance ranking framework by leveraging weakly labeled training videos, i.e. the training labels (anomalous or normal) are at video-level instead of clip-level. In our approach, we consider normal and anomalous videos as bags and video segments as instances in multiple instance learning (MIL), and automatically learn a deep anomaly ranking model that predicts high anomaly scores for anomalous video segments. Furthermore, we introduce sparsity and temporal smoothness constraints in the ranking loss function to better localize anomaly during training. We also introduce a new large-scale first of its kind dataset of 128 hours of videos. It consists of 1900 long and untrimmed real-world surveillance videos, with 13 realistic anomalies such as fighting, road accident, burglary, robbery, etc. as well as normal activities. This dataset can be used for two tasks. First, general anomaly detection considering all anomalies in one group and all normal activities in another group. Second, for recognizing each of 13 anomalous activities. Our experimental results show that our MIL method for anomaly detection achieves significant improvement on anomaly detection performance as compared to the state-of-the-art approaches. We provide the results of several recent deep learning baselines on anomalous activity recognition. The low recognition performance of these baselines reveals that our dataset is very challenging and opens more opportunities for future work. The dataset is available at: https://webpages.uncc.edu/cchen62/dataset.html

Citations (1,336)

Summary

  • The paper presents a Multiple Instance Learning framework that leverages weakly labeled videos to effectively detect real-world anomalies.
  • It employs a novel ranking loss with sparsity and smoothness constraints, achieving an AUC of 75.41 and a false alarm rate of 1.9%.
  • The study introduces a comprehensive 128-hour dataset spanning 13 anomaly types, setting a robust benchmark for future research in surveillance.

Real-world Anomaly Detection in Surveillance Videos

Overview

In "Real-world Anomaly Detection in Surveillance Videos," Waqas Sultani, Chen Chen, and Mubarak Shah present a method for detecting anomalies in surveillance videos. The approach learns from both normal and anomalous videos using a deep Multiple Instance Learning (MIL) framework. It needs only weakly labeled training data, i.e. labels (anomalous or normal) given per video rather than per clip, which avoids the labor-intensive task of annotating the anomalous segments inside each training video.

Contributions

The paper makes several significant contributions:

  1. MIL Framework for Anomaly Detection: The authors introduce a MIL-based solution for anomaly detection, incorporating a ranking loss with sparsity and smoothness constraints in their deep learning network. This innovation allows for efficient learning of anomaly scores for video segments without requiring detailed annotations.
  2. Large-Scale Dataset: This research introduces a new dataset, unprecedented in scale, comprising 1900 long and untrimmed surveillance videos totaling 128 hours. The dataset includes a wide variety of 13 different types of real-world anomalies, thereby providing a robust benchmark for both anomaly detection and activity recognition tasks.
  3. Experimental Validation: The proposed MIL method for anomaly detection demonstrates significantly improved performance over state-of-the-art approaches. Various deep learning baselines are evaluated on this new dataset, highlighting the challenges and opportunities for future research.

Technical Approach

Multiple Instance Learning (MIL)

The authors treat each surveillance video as a "bag" and its segments as "instances." Normal and anomalous videos are incorporated as negative and positive bags, respectively. A distinctive aspect of their approach is the deep anomaly ranking model that predicts high anomaly scores for anomalous segments through a ranking loss mechanism. The sparsity and temporal smoothness constraints further enhance the model's ability to accurately localize anomalies.
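
The bag-and-instance view above can be sketched in a few lines of Python. This is an illustrative sketch only, not the authors' implementation: the feature dimension, the number of segments, the logistic scorer, and the helper names (split_into_segments, score_segments, bag_score) are all assumptions made for the example. It shows one video being cut into segments (instances) and the bag being summarized by its highest-scoring segment.

```python
import numpy as np

def split_into_segments(frame_features, num_segments):
    """Average per-frame features into a fixed number of temporal segments.

    Each returned row is one instance of the bag. Assumes the video has at
    least `num_segments` frames, so that no segment is empty.
    """
    chunks = np.array_split(frame_features, num_segments, axis=0)
    return np.stack([chunk.mean(axis=0) for chunk in chunks])

def score_segments(segments, weights, bias):
    """Toy stand-in for the scoring network: one logistic unit per segment."""
    return 1.0 / (1.0 + np.exp(-(segments @ weights + bias)))

def bag_score(segment_scores):
    """Summarize a bag (video) by its highest-scoring instance (segment)."""
    return float(np.max(segment_scores))

# Hypothetical video: 320 frames with 16-dimensional features (made-up sizes).
rng = np.random.default_rng(0)
video = rng.normal(size=(320, 16))
segments = split_into_segments(video, num_segments=32)
scores = score_segments(segments, weights=rng.normal(size=16), bias=0.0)
video_level_score = bag_score(scores)
```

Taking the maximum reflects the weak label: an anomalous video only guarantees that at least one segment is anomalous, while a normal video guarantees that none is.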

Loss Function

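The paper proposes a ranking loss designed for the MIL setting that incorporates both sparsity and smoothness constraints. The sparsity constraint reflects the fact that anomalies usually occupy only a small part of a video, so few segments should receive high scores. The temporal smoothness constraint reflects the fact that a video is a continuous sequence, so scores of adjacent segments should not change abruptly.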
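
One way to write such a loss is a hinge on the top-scoring segment of each bag plus two penalty terms. The sketch below assumes that form; the function name, the margin of 1, and the default weights are placeholders chosen for illustration and should be checked against the paper before use.

```python
import numpy as np

def mil_ranking_loss(anom_scores, norm_scores,
                     lambda_smooth=1e-4, lambda_sparse=1e-4, margin=1.0):
    """Hinge ranking loss between one anomalous bag and one normal bag.

    anom_scores, norm_scores: per-segment anomaly scores in temporal order.
    lambda_smooth, lambda_sparse: penalty weights (placeholder values).
    """
    anom_scores = np.asarray(anom_scores, dtype=float)
    norm_scores = np.asarray(norm_scores, dtype=float)

    # Ranking term: the top anomalous segment should outscore the top
    # normal segment by at least `margin`.
    ranking = max(0.0, margin - anom_scores.max() + norm_scores.max())

    # Temporal smoothness: penalize jumps between adjacent segment scores.
    smoothness = np.sum(np.diff(anom_scores) ** 2)

    # Sparsity: penalize the total score mass in the anomalous video.
    sparsity = np.sum(anom_scores)

    return float(ranking + lambda_smooth * smoothness + lambda_sparse * sparsity)

# A single confident spike satisfies the ranking term but still pays the
# smoothness and sparsity penalties.
example = mil_ranking_loss([0.0, 0.0, 1.0, 0.0], [0.0, 0.0, 0.0, 0.0],
                           lambda_smooth=0.5, lambda_sparse=0.25)
```

Only the maximum score of each bag enters the ranking term, so no segment-level labels are needed. The two penalties act on the anomalous bag alone.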

Dataset Description

The newly introduced large-scale dataset contains 128 hours of video in 1900 long, untrimmed clips captured by CCTV cameras. It covers 13 distinct anomaly types, such as fighting, road accidents, burglary, and robbery, as well as normal activities. This dataset can be used for two primary tasks:

  1. General anomaly detection, treating all anomalies as one group and all normal activities as another.
  2. Activity recognition for each of the 13 anomalous activities.

The dataset's complexity and scale present significant challenges to current anomaly detection methods and provide fertile ground for developing more advanced techniques.

Experimental Results

The MIL-based anomaly detection method proposed in the paper shows substantial improvements over existing models:

  • AUC Performance: The proposed method achieves an AUC of 75.41, outperforming other state-of-the-art methods like the dictionary-based approach (65.51 AUC) and autoencoder methods (50.6 AUC).
  • False Alarm Rate: The false alarm rate is a critical metric for real-world deployment, and the proposed method records a significantly lower rate (1.9%) than the other methods.
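
For readers who want to reproduce metrics of this kind, the sketch below computes a ROC AUC and a false alarm rate from per-frame scores. It is not the paper's evaluation code: the function names are hypothetical, the pairwise AUC is meant for small arrays, and the 0.5 threshold for counting a false alarm is an assumption.

```python
import numpy as np

def roc_auc(scores, labels):
    """ROC AUC as the probability that a random anomalous frame outscores
    a random normal frame, with ties counted as one half."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    pos, neg = scores[labels], scores[~labels]
    if pos.size == 0 or neg.size == 0:
        raise ValueError("need at least one anomalous and one normal frame")
    diff = pos[:, None] - neg[None, :]  # every (anomalous, normal) pair
    return float(np.mean((diff > 0) + 0.5 * (diff == 0)))

def false_alarm_rate(scores_on_normal, threshold=0.5):
    """Fraction of frames from normal videos scored above the threshold."""
    scores_on_normal = np.asarray(scores_on_normal, dtype=float)
    return float(np.mean(scores_on_normal > threshold))

# Tiny made-up example: four frames, the last two are anomalous.
auc = roc_auc([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1])
far = false_alarm_rate([0.1, 0.2, 0.6, 0.3])
```

On this toy input, three of the four anomalous/normal pairs are ranked correctly, so auc is 0.75, and one of the four normal frames exceeds the threshold, so far is 0.25.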

Implications and Future Directions

The implications of this research are manifold:

  1. Enhanced Surveillance Systems: By reducing false alarms and effectively detecting a wide variety of anomalies, the proposed approach can significantly improve the efficiency and reliability of surveillance systems.
  2. Benchmark for Future Research: The introduction of a comprehensive and challenging dataset sets a new benchmark for future research in anomaly detection and activity recognition within the context of untrimmed surveillance videos.

The innovative use of weakly labeled data and MIL frameworks suggests several avenues for future exploration. Researchers could investigate more sophisticated temporal modeling techniques and examine the application of transfer learning to leverage pre-trained models. Additionally, expanding the anomaly detection framework to incorporate multi-modal data (e.g., audio and textual information from surveillance reports) could further enhance performance.

Conclusion

The paper "Real-world Anomaly Detection in Surveillance Videos" presents a robust and scalable method for anomaly detection in surveillance videos. By leveraging weakly labeled data within a MIL framework and introducing a new large-scale dataset, the authors pave the way for significant advancements in the field of video surveillance. This work not only achieves superior detection performance but also provides a comprehensive benchmark dataset for future research.
