
Video Anomaly Detection using GAN (2311.14095v1)

Published 23 Nov 2023 in cs.CV

Abstract: Given growing concern for public safety, automatic detection and recognition of abnormal events in surveillance scenes is crucial, and it remains an open research problem because of its intricacy and utility. Automatically identifying aberrant events is difficult because notions of abnormality differ: an occurrence that is typical in one setting may be aberrant in another. Anomaly identification becomes particularly challenging in surveillance footage of large crowds due to congestion and heavy occlusion. Using machine learning techniques, this thesis aims to solve this problem so that human operators are not required to monitor surveillance recordings for unusual activity. We develop a novel generative adversarial network (GAN) based anomaly detection model, trained to jointly learn to construct a high-dimensional picture space and to determine the latent space from the video's context. The generator uses a residual autoencoder architecture made up of a multi-stage channel attention-based decoder and a two-stream deep convolutional encoder that can realise both spatial and temporal data. We also offer a technique for refining the GAN model that reduces training time while generalising the model by utilising transfer learning between datasets. Using a variety of assessment measures, we compare our model with current state-of-the-art techniques on four benchmark datasets. The empirical findings indicate that our network performs favourably on all datasets in comparison with existing techniques.


Summary

  • The paper introduces a novel STem-GAN framework that integrates a dual-stream encoder with an autoencoder-based Generator and a PatchGAN Discriminator for anomaly detection.
  • It employs spatio-temporal feature extraction and adversarial training to differentiate normal events from anomalies in video data.
  • Experimental evaluations on multiple benchmark datasets demonstrate competitive AUROC and EER scores, with transfer learning further enhancing model generalization.

Video Anomaly Detection using GANs

Introduction

The study titled "Video Anomaly Detection using GAN" articulates a novel approach to automatic detection of anomalies in video surveillance footage using Generative Adversarial Networks (GANs). Traditional surveillance systems demand substantial manual effort and are prone to human error, motivating the automation of irregular-activity detection. GANs, known for their generative capabilities, are employed here to discern normal from abnormal events by leveraging both the spatial and temporal features of video inputs.

Methodology

The paper presents a Spatio-Temporal Generative Adversarial Network (STem-GAN) composed of a Generator modeled as an Autoencoder and a Discriminator functioning as a binary classifier. The Generator encodes video frames into a low-dimensional latent space capturing essential spatio-temporal features, while the Discriminator evaluates the authenticity of generated frames against real data.

Figure 1: Flow of Feature Extraction

Generator Architecture

The Generator's encoder uses a dual-stream design to separately capture spatial and temporal information from the frames. A two-stream deep convolutional encoder maps the frames into a latent representation, and after a series of transformations through convolutional layers, a multi-stage channel-attention decoder reconstructs the anticipated frame.
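The two-stream encoder with a channel-attention decoder can be sketched as below. This is a minimal illustration, not the paper's implementation: layer counts, channel widths, and the use of a squeeze-and-excitation-style attention block are assumptions; the paper only specifies a two-stream convolutional encoder and a multi-stage channel-attention decoder.

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    """Squeeze-and-excitation-style channel attention (an assumed variant;
    the paper's exact attention block is not detailed here)."""
    def __init__(self, channels, reduction=8):
        super().__init__()
        self.gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),                      # squeeze: global context
            nn.Conv2d(channels, channels // reduction, 1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1),
            nn.Sigmoid(),                                 # per-channel weights
        )

    def forward(self, x):
        return x * self.gate(x)

class STemGenerator(nn.Module):
    """Two-stream encoder (appearance from raw frames, motion from optical
    flow) feeding a channel-attention decoder. Sizes are illustrative."""
    def __init__(self, in_ch=3, flow_ch=2, feat=64):
        super().__init__()
        def stream(c):
            return nn.Sequential(
                nn.Conv2d(c, feat, 4, 2, 1), nn.ReLU(inplace=True),
                nn.Conv2d(feat, feat * 2, 4, 2, 1), nn.ReLU(inplace=True),
            )
        self.spatial = stream(in_ch)     # appearance stream
        self.temporal = stream(flow_ch)  # motion stream
        self.fuse = nn.Conv2d(feat * 4, feat * 2, 1)  # merge the two streams
        self.decoder = nn.Sequential(
            ChannelAttention(feat * 2),
            nn.ConvTranspose2d(feat * 2, feat, 4, 2, 1), nn.ReLU(inplace=True),
            ChannelAttention(feat),
            nn.ConvTranspose2d(feat, in_ch, 4, 2, 1), nn.Tanh(),
        )

    def forward(self, frame, flow):
        z = torch.cat([self.spatial(frame), self.temporal(flow)], dim=1)
        return self.decoder(self.fuse(z))
```

With 64x64 inputs, the encoder downsamples twice and the decoder restores the original resolution, yielding a reconstructed frame of the same shape as the input.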

Discriminator Architecture

The Discriminator employs a PatchGAN architecture designed to distinguish between real and generated frames at the scale of local patches rather than whole images. This emphasizes the high-frequency components crucial for anomaly detection.

Figure 2: AlexNet architecture
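A PatchGAN discriminator in the pix2pix style can be sketched as follows; the depth and channel widths here are assumptions, since the summary does not give the exact configuration. The key property is that the output is a grid of per-patch real/fake scores rather than a single scalar.

```python
import torch
import torch.nn as nn

class PatchDiscriminator(nn.Module):
    """PatchGAN-style discriminator: each output element scores one local
    receptive field, so textures and high-frequency detail are judged
    locally instead of averaging over the whole frame."""
    def __init__(self, in_ch=3, feat=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_ch, feat, 4, 2, 1), nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(feat, feat * 2, 4, 2, 1), nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(feat * 2, feat * 4, 4, 2, 1), nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(feat * 4, 1, 4, 1, 1),  # 1-channel map of patch logits
        )

    def forward(self, x):
        return self.net(x)
```

For a 64x64 input this produces a 7x7 grid of logits, each covering one overlapping patch of the frame.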

GAN Training

Training of the GAN follows an adversarial setup; the Generator attempts to fool the Discriminator by producing realistic frames, while the Discriminator strives to correctly classify real from generated frames. A combination of adversarial and reconstruction losses guides the optimization of model parameters.
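One adversarial update combining the two loss terms can be sketched as below. The binary cross-entropy adversarial loss, the L2 reconstruction term, and the weighting `lam` are assumptions (following common pix2pix-style practice), not values confirmed by the summary.

```python
import torch
import torch.nn.functional as F

def train_step(G, D, opt_g, opt_d, frame, flow, target, lam=100.0):
    """One adversarial step: D learns to separate real frames from
    reconstructions, G minimizes adversarial + reconstruction loss.
    `lam` weights the reconstruction term (an assumed value)."""
    fake = G(frame, flow)

    # --- Discriminator update: real -> 1, generated -> 0 ---
    opt_d.zero_grad()
    d_real = D(target)
    d_fake = D(fake.detach())  # detach so G gets no gradient here
    loss_d = (F.binary_cross_entropy_with_logits(d_real, torch.ones_like(d_real))
              + F.binary_cross_entropy_with_logits(d_fake, torch.zeros_like(d_fake)))
    loss_d.backward()
    opt_d.step()

    # --- Generator update: fool D, stay close to the target frame ---
    opt_g.zero_grad()
    d_fake = D(fake)
    loss_g = (F.binary_cross_entropy_with_logits(d_fake, torch.ones_like(d_fake))
              + lam * F.mse_loss(fake, target))
    loss_g.backward()
    opt_g.step()
    return loss_g.item(), loss_d.item()
```

At test time, the reconstruction error of a frame under the trained generator can then serve as its anomaly score: frames the model reconstructs poorly are flagged as abnormal.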

Experimental Setup

The system's performance was evaluated on multiple benchmark datasets: UMN, UCSD Peds, Avenue, and Subway. Each dataset presents distinct challenges, from varying camera angles to diverse anomaly types, such as the pedestrian walkways of UCSD Peds and the crowd-panic scenarios of UMN.

Results

Quantitative analysis demonstrates the model's competitive performance against existing methods, with notable AUROC and EER scores across datasets. Performance varied with dataset complexity, with the strongest results on scenes containing simpler, less ambiguous events.
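The two evaluation measures reported, AUROC and EER, can be computed from frame-level anomaly scores and ground-truth labels as below. This is a standard self-contained sketch of those metrics, not code from the paper.

```python
import numpy as np

def auroc_eer(scores, labels):
    """AUROC and Equal Error Rate from per-frame anomaly scores
    (higher = more anomalous) and binary labels (1 = anomalous frame)."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=float)

    # Sweep thresholds by sorting frames from most to least anomalous.
    order = np.argsort(-scores)
    labels = labels[order]
    tpr = np.cumsum(labels) / max(labels.sum(), 1)            # true positive rate
    fpr = np.cumsum(1 - labels) / max((1 - labels).sum(), 1)  # false positive rate

    # Prepend the (0, 0) operating point and integrate the ROC curve.
    tpr = np.concatenate(([0.0], tpr))
    fpr = np.concatenate(([0.0], fpr))
    auroc = np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2)

    # EER: the point where the false positive and false negative rates meet.
    idx = np.argmin(np.abs(fpr - (1 - tpr)))
    eer = (fpr[idx] + (1 - tpr[idx])) / 2
    return auroc, eer
```

A perfect detector, whose every anomalous frame outscores every normal one, yields AUROC 1.0 and EER 0.0; random scoring yields AUROC near 0.5.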

Use of Transfer Learning

The paper also explores transfer learning to enhance model generalization and reduce training time, showing promising results when the source and target datasets share characteristics.

Conclusion

The proposed STem-GAN framework advances the field of video anomaly detection with its capacity to dynamically learn and predict anomalous events from regular footage. Its application can extend to monitoring systems in public safety, traffic, and restricted access environments.

Future research could involve experimenting with larger datasets and more varied anomalies, as well as integrating emotional trait analysis for more nuanced anomaly detection systems.
