
Weakly-supervised Video Anomaly Detection with Robust Temporal Feature Magnitude Learning (2101.10030v3)

Published 25 Jan 2021 in cs.CV

Abstract: Anomaly detection with weakly supervised video-level labels is typically formulated as a multiple instance learning (MIL) problem, in which we aim to identify snippets containing abnormal events, with each video represented as a bag of video snippets. Although current methods show effective detection performance, their recognition of the positive instances, i.e., rare abnormal snippets in the abnormal videos, is largely biased by the dominant negative instances, especially when the abnormal events are subtle anomalies that exhibit only small differences compared with normal events. This issue is exacerbated in many methods that ignore important video temporal dependencies. To address this issue, we introduce a novel and theoretically sound method, named Robust Temporal Feature Magnitude learning (RTFM), which trains a feature magnitude learning function to effectively recognise the positive instances, substantially improving the robustness of the MIL approach to the negative instances from abnormal videos. RTFM also adapts dilated convolutions and self-attention mechanisms to capture long- and short-range temporal dependencies to learn the feature magnitude more faithfully. Extensive experiments show that the RTFM-enabled MIL model (i) outperforms several state-of-the-art methods by a large margin on four benchmark data sets (ShanghaiTech, UCF-Crime, XD-Violence and UCSD-Peds) and (ii) achieves significantly improved subtle anomaly discriminability and sample efficiency. Code is available at https://github.com/tianyu0207/RTFM.

Authors (6)
  1. Yu Tian (249 papers)
  2. Guansong Pang (82 papers)
  3. Yuanhong Chen (30 papers)
  4. Rajvinder Singh (11 papers)
  5. Johan W. Verjans (16 papers)
  6. Gustavo Carneiro (129 papers)
Citations (249)

Summary

The paper introduces a novel method for weakly-supervised video anomaly detection, termed Robust Temporal Feature Magnitude (RTFM) learning. The primary focus of this research is to address key challenges in Multiple Instance Learning (MIL) frameworks employed for anomaly detection in videos with weak labels—typically, labels at the video level rather than at the snippet level. RTFM shows considerable improvement over existing state-of-the-art methods in terms of anomaly detection accuracy and sample efficiency across several benchmark datasets: ShanghaiTech, UCF-Crime, XD-Violence, and UCSD-Peds.

Contributions and Methodology

The principal contribution of the paper is the development of a theoretically grounded method that targets enhanced discrimination between anomalous and normal video snippets. This is accomplished by focusing on the temporal feature magnitude associated with video snippets. One core insight of the work is that the mean feature magnitude of anomalous snippets is typically larger than that of normal snippets—a foundation that RTFM leverages to better separate these instances in the feature space.
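This separability assumption can be illustrated with a toy sketch. The features below are synthetic stand-ins, not the paper's learned temporal features; the abnormal bag is simply drawn with a larger spread so its snippet-level l2 norms are larger, mimicking the property RTFM exploits:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy snippet features (synthetic, not learned I3D features): normal
# snippets are drawn with a smaller spread, abnormal ones with a larger
# spread, so abnormal snippets have larger l2 norms on average.
normal_feats = rng.normal(0.0, 1.0, size=(32, 128))    # 32 snippets, 128-dim
abnormal_feats = rng.normal(0.0, 2.0, size=(32, 128))  # larger spread -> larger norms

def mean_feature_magnitude(feats: np.ndarray) -> float:
    """Mean l2 norm over a bag of snippet features."""
    return float(np.linalg.norm(feats, axis=1).mean())

# Under this toy construction, the abnormal bag's mean magnitude exceeds
# the normal bag's, which is the separation RTFM learns to enforce.
print(mean_feature_magnitude(normal_feats) < mean_feature_magnitude(abnormal_feats))
```

In the actual method this separation is not assumed of the raw features; it is induced by training the feature magnitude learning function end to end.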

  1. Feature Magnitude-based MIL: The authors reformulate MIL to incorporate a feature magnitude learning function. They use a top-k instance strategy in which the k snippets with the highest feature magnitudes are selected to evaluate the difference between normal and anomalous videos. The approach rests on the assumption, supported by theoretical analysis, that abnormal snippets have larger feature magnitudes than normal ones, providing a more reliable signal for anomaly detection.
  2. Temporal Dependencies: To capture both short- and long-range temporal dependencies efficiently, the method integrates a pyramid of dilated convolutions and self-attention mechanisms. This multi-scale temporal network (MTN) is pivotal in learning a robust representation that effectively highlights subtle anomalies.
  3. Robustness and Theoretical Guarantees: By scoring only the snippets in the top-k group by feature magnitude, the method is robust to the dominance of normal snippets during training. The accompanying theoretical analysis shows that this selection improves the separability between normal and abnormal videos, enabling more accurate detection and classification of anomalous events.
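The top-k selection and separability objective from the list above can be sketched in simplified form. This is a hedged illustration, not the paper's implementation: the function names are hypothetical, the features would in practice come from the multi-scale temporal network, and the real objective also trains a snippet-level classifier.

```python
import numpy as np

def topk_mean_magnitude(feats: np.ndarray, k: int = 3) -> float:
    """Mean l2 norm of the k snippets with the largest feature magnitudes."""
    mags = np.linalg.norm(feats, axis=1)   # one magnitude per snippet
    topk = np.sort(mags)[-k:]              # keep the k largest magnitudes
    return float(topk.mean())

def magnitude_margin_loss(abn_feats: np.ndarray,
                          nrm_feats: np.ndarray,
                          k: int = 3,
                          margin: float = 1.0) -> float:
    """Hinge-style stand-in for RTFM's separability objective: push the
    top-k mean magnitude of an abnormal bag above that of a normal bag
    by at least `margin`."""
    score_abn = topk_mean_magnitude(abn_feats, k)
    score_nrm = topk_mean_magnitude(nrm_feats, k)
    return max(0.0, margin - score_abn + score_nrm)

# Well-separated bags incur zero loss; equal bags incur the full margin.
well_separated = magnitude_margin_loss(np.ones((8, 4)) * 3.0, np.ones((8, 4)))
overlapping = magnitude_margin_loss(np.ones((8, 4)), np.ones((8, 4)))
print(well_separated, overlapping)  # -> 0.0 1.0
```

Restricting the score to the top-k largest-magnitude snippets is what makes the objective robust: the many normal snippets inside an abnormal video cannot drag the bag's score down, since only the most anomalous-looking snippets contribute.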

Experimental Results

The empirical evaluation involves rigorous experimentation across multiple benchmark datasets, demonstrating strong numerical results in anomaly detection performance:

  • ShanghaiTech: The RTFM method achieved a 97.21% AUC with I3D features, surpassing previous methods by significant margins.
  • UCF-Crime: RTFM outperformed existing MIL-based approaches by at least 5.37% in terms of AUC with I3D features.
  • XD-Violence and UCSD-Peds: Significant improvements in average precision (AP) and AUC, respectively, were observed, emphasizing the model's efficacy and adaptability across diverse datasets.

Implications and Future Directions

Practically, this work suggests a paradigm shift in how video anomaly detection can be approached under weak supervision, thereby reducing the dependency on extensive manual annotation efforts. Theoretically, it opens avenues for further exploration of feature magnitude as a discriminative tool in machine learning models beyond anomaly detection.

For future work, applying the framework to other real-world domains with subtle anomalies, such as financial fraud detection or cybersecurity threat detection, would be a promising direction. Integrating other attention mechanisms or temporal feature-selection schemes might further improve performance and help the method adapt to new data modalities.

Overall, the research provides significant insights and robust methodologies beneficial for researchers and practitioners focused on advancing video anomaly detection and weakly-supervised ML systems.