Training Adversarial Discriminators for Cross-channel Abnormal Event Detection in Crowds (1706.07680v2)

Published 23 Jun 2017 in cs.CV

Abstract: Abnormal crowd behaviour detection attracts a large interest due to its importance in video surveillance scenarios. However, the ambiguity and the lack of sufficient abnormal ground truth data make end-to-end training of large deep networks hard in this domain. In this paper we propose to use Generative Adversarial Nets (GANs), which are trained to generate only the normal distribution of the data. During the adversarial GAN training, a discriminator (D) is used as a supervisor for the generator network (G) and vice versa. At testing time we use D to solve our discriminative task (abnormality detection), where D has been trained without the need of manually-annotated abnormal data. Moreover, in order to prevent G from learning a trivial identity function, we use a cross-channel approach, forcing G to transform raw-pixel data into motion information and vice versa. The quantitative results on standard benchmarks show that our method outperforms previous state-of-the-art methods in both the frame-level and the pixel-level evaluation.

Summary

The paper introduces a novel discriminator-focused GAN that differentiates normal from abnormal crowd behavior without relying on labeled anomalies.
The methodology leverages cross-channel tasks integrating appearance and motion cues to strengthen decision boundaries and improve detection.
Quantitative results show gains over prior methods, with a frame-level EER as low as 7% and an AUC of 96.8% on the UCSD dataset.

Training Adversarial Discriminators for Abnormal Event Detection in Crowds

The paper "Training Adversarial Discriminators for Cross-channel Abnormal Event Detection in Crowds" uses Generative Adversarial Networks (GANs) to detect abnormal crowd behavior. It motivates the task by its importance in video surveillance and notes two intrinsic difficulties: what counts as abnormal is ambiguous, and labeled abnormal data is scarce.

Summary of Methodology

The authors propose a framework in which GANs learn the distribution of normal crowd behavior without requiring annotated abnormal data. The approach diverges from typical GAN applications by keeping the discriminator ( $D$ ), rather than the generator ( $G$ ), as the final anomaly detector. During training, which uses only normal data, $G$ learns to generate normal samples while $D$ learns to distinguish generated samples from real ones. At test time, $D$ is applied directly to identify anomalies, so no separate classifier is trained on top of the learned features.
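As a rough illustration of using discriminator outputs directly as an anomaly signal, the sketch below combines two patch-level score grids into one abnormality map and thresholds it. It rests on assumptions not stated in this summary: that each discriminator returns a grid of values in [0, 1] where values near 1 mean "looks like real normal data", that the two grids are averaged, and that a frame is flagged when its most suspicious patch exceeds a threshold. The function names and the 0.5 threshold are hypothetical.

```python
def anomaly_map(d_frame_to_flow, d_flow_to_frame):
    """Combine two patch-level discriminator outputs into one anomaly map.

    Each argument is a 2D list of discriminator outputs in [0, 1], where
    values near 1 are assumed to mean "consistent with normal training data".
    The result is in [0, 1]; higher means more anomalous.
    """
    return [
        [1.0 - 0.5 * (a + b) for a, b in zip(row_a, row_b)]
        for row_a, row_b in zip(d_frame_to_flow, d_flow_to_frame)
    ]


def frame_score(amap):
    """Frame-level score: the most anomalous patch in the map."""
    return max(max(row) for row in amap)


def is_abnormal(amap, threshold=0.5):
    """Flag a frame when its frame-level score reaches the (assumed) threshold."""
    return frame_score(amap) >= threshold


# Toy 2x2 grids: the bottom-right patch gets low "realness" from both discriminators.
suspicious = anomaly_map([[0.9, 0.8], [0.9, 0.2]], [[0.9, 0.8], [0.7, 0.1]])
ordinary = anomaly_map([[0.9, 0.9], [0.9, 0.9]], [[0.9, 0.9], [0.9, 0.9]])
print(is_abnormal(suspicious), is_abnormal(ordinary))
```

Taking the maximum over patches is only one way to go from a map to a frame-level decision; the per-patch map itself is what a pixel-level evaluation would use.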

Furthermore, the paper introduces a cross-channel generative task that uses two views of the data: appearance (raw pixels) and motion (optical flow). $G$ is trained to translate frames into optical flow and optical flow into frames, so it cannot fall back on a trivial identity mapping. This harder generation task in turn pushes $D$ to learn more informative decision boundaries, which is what makes it useful on abnormal test data it has never seen.
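The summary does not state the training loss, so the following is only a sketch under the assumption that each cross-channel network is trained with the standard conditional GAN objective plus an L1 reconstruction term. Here $x$ is the conditioning input, $y$ the target channel, $z$ a noise source, and $\lambda$ a weighting hyperparameter; all of these symbols are assumed notation rather than the paper's own.

$$
\mathcal{L}_{cGAN}(G, D) = \mathbb{E}_{(x, y)}\big[\log D(x, y)\big] + \mathbb{E}_{x, z}\big[\log\big(1 - D(x, G(x, z))\big)\big]
$$

$$
G^{*} = \arg\min_{G}\max_{D}\; \mathcal{L}_{cGAN}(G, D) + \lambda\,\mathbb{E}_{(x, y), z}\big[\lVert y - G(x, z)\rVert_{1}\big]
$$

Under this reading, the frame-to-flow network takes $x$ to be a frame and $y$ its optical flow, and the flow-to-frame network swaps the two roles. Because $x$ and $y$ live in different channels, copying the input cannot minimize either term.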

Numerical Results and Claims

The paper reports better performance than previous methods. On the UCSD dataset, the approach achieves equal error rates (EERs) as low as 7% in frame-level evaluation, improving on methods such as Social Force Models (SFM) and sparse reconstruction techniques. The reported area under the ROC curve (AUC) is 96.8%, higher than that of baselines such as autoencoders and denoising autoencoders.
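For readers unfamiliar with the two metrics, the sketch below shows one common way to compute AUC and EER from per-frame anomaly scores and binary ground-truth labels. It is a generic illustration on made-up toy data, not the paper's evaluation code; the function names are hypothetical, and it assumes that higher scores mean "more abnormal" and that both classes are present.

```python
def roc_points(scores, labels):
    """Return (fpr, tpr) lists obtained by sweeping a threshold over the scores.

    labels: 1 = abnormal, 0 = normal. Higher score = more abnormal.
    Tied scores are consumed together so they yield a single ROC point.
    """
    pos = sum(labels)
    neg = len(labels) - pos
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    fpr, tpr = [0.0], [0.0]
    tp = fp = 0
    i = 0
    while i < len(order):
        current = scores[order[i]]
        while i < len(order) and scores[order[i]] == current:
            if labels[order[i]] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        fpr.append(fp / neg)
        tpr.append(tp / pos)
    return fpr, tpr


def auc(fpr, tpr):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum(
        (fpr[k] - fpr[k - 1]) * (tpr[k] + tpr[k - 1]) / 2
        for k in range(1, len(fpr))
    )


def eer(fpr, tpr):
    """Equal error rate: the error level where FPR equals FNR (= 1 - TPR).

    Uses linear interpolation between the two ROC points that bracket
    the crossing.
    """
    for k in range(1, len(fpr)):
        d0 = fpr[k - 1] - (1 - tpr[k - 1])
        d1 = fpr[k] - (1 - tpr[k])
        if d0 <= 0 <= d1:
            if d1 == d0:
                return fpr[k]
            t = -d0 / (d1 - d0)
            return fpr[k - 1] + t * (fpr[k] - fpr[k - 1])
    return 1.0


# Toy data: six frames, three of them abnormal.
toy_scores = [0.9, 0.8, 0.7, 0.3, 0.2, 0.1]
toy_labels = [1, 1, 0, 1, 0, 0]
toy_fpr, toy_tpr = roc_points(toy_scores, toy_labels)
print("AUC:", auc(toy_fpr, toy_tpr), "EER:", eer(toy_fpr, toy_tpr))
```

A lower EER and a higher AUC both indicate better separation between normal and abnormal frames, which is why the two numbers are usually reported together.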

Implications and Future Work

The main practical benefit is reduced dependence on annotated data. Because training needs only normal footage and the deployed detector is just the discriminator, the approach suits real-world surveillance systems, where abnormal events are rare and annotated datasets are scarce. The same framework could be explored in other domains that require anomaly detection, such as cybersecurity and industrial monitoring.

On the theoretical side, the paper shows that GANs can be useful for tasks other than generation. Its results suggest that discriminator-centered applications of adversarial training deserve further study.

Future work could refine the encoding and decision-making mechanisms within GANs, for example by experimenting with alternative transformation tasks that explicitly incorporate semantic information, with the aim of improving detection accuracy in complex, crowded scenes.

Conclusion

Overall, the paper offers a compelling approach to crowd anomaly detection that uses adversarial training to build an effective discriminator rather than a generator. This unconventional use of GANs has both practical and theoretical relevance for video surveillance and for anomaly detection more broadly.
