
Video Anomaly Detection using GAN (2311.14095v1)

Published 23 Nov 2023 in cs.CV

Abstract: Given growing concern for public safety, automatic detection and recognition of abnormal events in surveillance scenes is crucial, and it remains an open research problem because of its intricacy and utility. Automatically identifying aberrant events is difficult because notions of abnormality differ: an occurrence that is typical in one setting may be aberrant in another. Anomaly identification becomes particularly challenging in surveillance footage of large crowds due to congestion and heavy occlusion. Using machine learning techniques, this thesis aims to solve this problem so that human operators are not required to monitor surveillance recordings for unusual activity. We develop a novel generative adversarial network (GAN) based anomaly detection model, trained to jointly learn to construct a high-dimensional picture space and to determine the latent space from the video's context. The generator uses a residual autoencoder architecture made up of a multi-stage channel attention-based decoder and a two-stream deep convolutional encoder that can realise both spatial and temporal data. We also offer a technique for refining the GAN model that reduces training time while generalising the model by utilising transfer learning between datasets. Using a variety of assessment measures, we compare our model with current state-of-the-art techniques on four benchmark datasets. The empirical findings indicate that our network performs favourably on all datasets in comparison with existing techniques.


Summary

  • The paper introduces a novel STem-GAN framework that integrates a dual-stream encoder with an autoencoder-based Generator and a PatchGAN Discriminator for anomaly detection.
  • It employs spatio-temporal feature extraction and adversarial training to differentiate normal events from anomalies in video data.
  • Experimental evaluations on multiple benchmark datasets demonstrate competitive AUROC and EER scores, with transfer learning further enhancing model generalization.

Video Anomaly Detection using GANs

Introduction

The study titled "Video Anomaly Detection using GAN" articulates a novel approach to automatic detection of anomalies in video surveillance footage using Generative Adversarial Networks (GANs). Traditional surveillance systems demand substantial manual effort and are prone to human error, motivating the automation of irregular-activity detection. GANs, known for their generative capabilities, are employed here to discern normal from abnormal events by leveraging both the spatial and temporal features of video inputs.

Methodology

The paper presents a Spatio-Temporal Generative Adversarial Network (STem-GAN) composed of a Generator modeled as an Autoencoder and a Discriminator functioning as a binary classifier. The Generator encodes video frames into a low-dimensional latent space capturing essential spatio-temporal features, while the Discriminator evaluates the authenticity of generated frames against real data.

Figure 1: Flow of Feature Extraction

Generator Architecture

The Generator's encoder uses a dual-stream design to separately capture spatial and temporal information from the frames. A two-stream deep convolutional encoder maps the frames into a latent representation, and after a series of transformations through convolutional layers, a multi-stage channel-attention decoder reconstructs the anticipated frame.
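The two-stream encoder with a channel-attention decoder can be sketched as below. This is a minimal illustration, not the paper's implementation: layer counts, channel widths, and the use of a squeeze-and-excitation-style attention block are assumptions; the paper only specifies a two-stream convolutional encoder and a multi-stage channel-attention decoder.

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    """Squeeze-and-excitation-style channel attention (an assumed variant;
    the paper's exact attention block is not detailed here)."""
    def __init__(self, channels, reduction=8):
        super().__init__()
        self.gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),                      # squeeze: global context
            nn.Conv2d(channels, channels // reduction, 1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1),
            nn.Sigmoid(),                                 # per-channel weights
        )

    def forward(self, x):
        return x * self.gate(x)

class STemGenerator(nn.Module):
    """Two-stream encoder (appearance from raw frames, motion from optical
    flow) feeding a channel-attention decoder. Sizes are illustrative."""
    def __init__(self, in_ch=3, flow_ch=2, feat=64):
        super().__init__()
        def stream(c):
            return nn.Sequential(
                nn.Conv2d(c, feat, 4, 2, 1), nn.ReLU(inplace=True),
                nn.Conv2d(feat, feat * 2, 4, 2, 1), nn.ReLU(inplace=True),
            )
        self.spatial = stream(in_ch)     # appearance stream
        self.temporal = stream(flow_ch)  # motion stream
        self.fuse = nn.Conv2d(feat * 4, feat * 2, 1)  # merge the two streams
        self.decoder = nn.Sequential(
            ChannelAttention(feat * 2),
            nn.ConvTranspose2d(feat * 2, feat, 4, 2, 1), nn.ReLU(inplace=True),
            ChannelAttention(feat),
            nn.ConvTranspose2d(feat, in_ch, 4, 2, 1), nn.Tanh(),
        )

    def forward(self, frame, flow):
        z = torch.cat([self.spatial(frame), self.temporal(flow)], dim=1)
        return self.decoder(self.fuse(z))
```

With 64x64 inputs, the encoder downsamples twice and the decoder restores the original resolution, yielding a reconstructed frame of the same shape as the input.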

Discriminator Architecture

The Discriminator employs a PatchGAN architecture designed to distinguish between real and generated frames at the scale of local patches rather than whole images. This emphasizes the high-frequency components crucial for anomaly detection.

Figure 2: AlexNet architecture
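A PatchGAN discriminator in the pix2pix style can be sketched as follows; the depth and channel widths here are assumptions, since the summary does not give the exact configuration. The key property is that the output is a grid of per-patch real/fake scores rather than a single scalar.

```python
import torch
import torch.nn as nn

class PatchDiscriminator(nn.Module):
    """PatchGAN-style discriminator: each output element scores one local
    receptive field, so textures and high-frequency detail are judged
    locally instead of averaging over the whole frame."""
    def __init__(self, in_ch=3, feat=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_ch, feat, 4, 2, 1), nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(feat, feat * 2, 4, 2, 1), nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(feat * 2, feat * 4, 4, 2, 1), nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(feat * 4, 1, 4, 1, 1),  # 1-channel map of patch logits
        )

    def forward(self, x):
        return self.net(x)
```

For a 64x64 input this produces a 7x7 grid of logits, each covering one overlapping patch of the frame.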

GAN Training

Training of the GAN follows an adversarial setup; the Generator attempts to fool the Discriminator by producing realistic frames, while the Discriminator strives to correctly classify real from generated frames. A combination of adversarial and reconstruction losses guides the optimization of model parameters.
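One adversarial update combining the two loss terms can be sketched as below. The binary cross-entropy adversarial loss, the L2 reconstruction term, and the weighting `lam` are assumptions (following common pix2pix-style practice), not values confirmed by the summary.

```python
import torch
import torch.nn.functional as F

def train_step(G, D, opt_g, opt_d, frame, flow, target, lam=100.0):
    """One adversarial step: D learns to separate real frames from
    reconstructions, G minimizes adversarial + reconstruction loss.
    `lam` weights the reconstruction term (an assumed value)."""
    fake = G(frame, flow)

    # --- Discriminator update: real -> 1, generated -> 0 ---
    opt_d.zero_grad()
    d_real = D(target)
    d_fake = D(fake.detach())  # detach so G gets no gradient here
    loss_d = (F.binary_cross_entropy_with_logits(d_real, torch.ones_like(d_real))
              + F.binary_cross_entropy_with_logits(d_fake, torch.zeros_like(d_fake)))
    loss_d.backward()
    opt_d.step()

    # --- Generator update: fool D, stay close to the target frame ---
    opt_g.zero_grad()
    d_fake = D(fake)
    loss_g = (F.binary_cross_entropy_with_logits(d_fake, torch.ones_like(d_fake))
              + lam * F.mse_loss(fake, target))
    loss_g.backward()
    opt_g.step()
    return loss_g.item(), loss_d.item()
```

At test time, the reconstruction error of a frame under the trained generator can then serve as its anomaly score: frames the model reconstructs poorly are flagged as abnormal.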

Experimental Setup

The system's performance was evaluated on multiple benchmark datasets: UMN, UCSD Peds, Avenue, and Subway. Each dataset presents distinct challenges, from varying camera angles to diverse anomaly types, such as the pedestrian walkways of UCSD Peds and the crowd-panic scenarios of UMN.

Results

Quantitative analysis demonstrates the model's competitive performance against existing methods, with notable AUROC and EER scores across datasets. Performance varied with dataset complexity, with the strongest results on scenes containing simpler, less ambiguous events.
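The two evaluation measures reported, AUROC and EER, can be computed from frame-level anomaly scores and ground-truth labels as below. This is a standard self-contained sketch of those metrics, not code from the paper.

```python
import numpy as np

def auroc_eer(scores, labels):
    """AUROC and Equal Error Rate from per-frame anomaly scores
    (higher = more anomalous) and binary labels (1 = anomalous frame)."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=float)

    # Sweep thresholds by sorting frames from most to least anomalous.
    order = np.argsort(-scores)
    labels = labels[order]
    tpr = np.cumsum(labels) / max(labels.sum(), 1)            # true positive rate
    fpr = np.cumsum(1 - labels) / max((1 - labels).sum(), 1)  # false positive rate

    # Prepend the (0, 0) operating point and integrate the ROC curve.
    tpr = np.concatenate(([0.0], tpr))
    fpr = np.concatenate(([0.0], fpr))
    auroc = np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2)

    # EER: the point where the false positive and false negative rates meet.
    idx = np.argmin(np.abs(fpr - (1 - tpr)))
    eer = (fpr[idx] + (1 - tpr[idx])) / 2
    return auroc, eer
```

A perfect detector, whose every anomalous frame outscores every normal one, yields AUROC 1.0 and EER 0.0; random scoring yields AUROC near 0.5.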

Use of Transfer Learning

The paper also explores transfer learning to enhance model generalization and reduce training time, showing promising results when the source and target datasets share characteristics.

Conclusion

The proposed STem-GAN framework advances the field of video anomaly detection with its capacity to dynamically learn and predict anomalous events from regular footage. Its application can extend to monitoring systems in public safety, traffic, and restricted access environments.

Future research could involve experimenting with larger datasets and more varied anomalies, as well as integrating emotional trait analysis for more nuanced anomaly detection systems.
