Learning Regularity in Skeleton Trajectories for Anomaly Detection in Videos (1903.03295v2)

Published 8 Mar 2019 in cs.CV

Abstract: Appearance features have been widely used in video anomaly detection even though they contain complex entangled factors. We propose a new method to model the normal patterns of human movements in surveillance video for anomaly detection using dynamic skeleton features. We decompose the skeletal movements into two sub-components: global body movement and local body posture. We model the dynamics and interaction of the coupled features in our novel Message-Passing Encoder-Decoder Recurrent Network. We observed that the decoupled features collaboratively interact in our spatio-temporal model to accurately identify human-related irregular events from surveillance video sequences. Compared to traditional appearance-based models, our method achieves superior outlier detection performance. Our model also offers "open-box" examination and decision explanation made possible by the semantically understandable features and a network architecture supporting interpretability.

Summary

The paper introduces a two-branch MPED-RNN that decomposes skeletal movements into global movement and local posture to refine anomaly detection.
It demonstrates superior frame-level ROC AUC performance on challenging datasets like ShanghaiTech and CUHK Avenue compared to current state-of-the-art methods.
The model provides open-box interpretability by visualizing prediction deviations, clarifying how learned human movement patterns signal anomalies.

Unveiling Human Movement Patterns: A Novel Approach for Video Anomaly Detection Using Skeleton Trajectories

Introduction

Video anomaly detection has relied mostly on appearance features. These features entangle many factors at once, which makes it hard to isolate the human movements and interactions in a scene. The paper instead works with dynamic skeleton features and decomposes each skeleton's motion into global body movement and local body posture. This decomposition is the basis of the proposed Message-Passing Encoder-Decoder Recurrent Network (MPED-RNN), which learns the normal patterns of human activity and flags deviations from them as anomalies.

Modeling Human Dynamics Using Skeleton Trajectories

At the core of this research is the decomposition of skeletal movements into two components: global movement and local posture. The decomposition simplifies the representation of human motion and keeps each feature semantically meaningful. By tracking both components across the frames of a video sequence, the model learns what typical human behavior looks like in surveillance footage.
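To make the decomposition concrete, here is a minimal sketch in Python. It assumes, for illustration only, that the global component is the center and size of the skeleton's bounding box and that the local component is each joint's offset from that center, scaled by the box size. The function name `decompose_skeleton` and this exact normalization are assumptions of the sketch, not details stated in this summary.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def decompose_skeleton(joints: List[Point]) -> Tuple[Tuple[float, float, float, float], List[Point]]:
    """Split one skeleton into a global part and a local part.

    Assumption of this sketch: global = bounding-box center and size,
    local = joint offsets from the center, scaled by the box size.
    """
    if not joints:
        raise ValueError("a skeleton needs at least one joint")

    xs = [x for x, _ in joints]
    ys = [y for _, y in joints]
    x_min, x_max = min(xs), max(xs)
    y_min, y_max = min(ys), max(ys)

    # Global component: where the body is in the frame and how large it appears.
    cx = (x_min + x_max) / 2
    cy = (y_min + y_max) / 2
    w = x_max - x_min
    h = y_max - y_min

    # Avoid division by zero for degenerate (flat) bounding boxes.
    w_safe = w if w > 0 else 1.0
    h_safe = h if h > 0 else 1.0

    # Local component: posture, independent of position and scale.
    local = [((x - cx) / w_safe, (y - cy) / h_safe) for x, y in joints]
    return (cx, cy, w, h), local


global_part, local_part = decompose_skeleton([(10, 20), (30, 60), (20, 40)])
```

Under this assumed normalization, moving or rescaling the person changes only `global_part`, while `local_part` changes only when the posture does.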

The proposed MPED-RNN architecture models the coupled dynamics of these two components. It has one dedicated branch for global features and one for local features, and the branches exchange messages with each other. Each branch therefore stays specialized in its own component while still taking the other component into account.
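The idea of cross-branch message passing can be illustrated with a toy recurrent update. The sketch below is not the MPED-RNN architecture: it uses scalar states and plain tanh units, and the names `step`, `encode`, `w_in`, `w_rec`, and `w_msg` are hypothetical. It only shows the pattern in which each branch's new state depends on its own input, its own previous state, and the other branch's previous state.

```python
import math
from typing import Sequence, Tuple


def step(x_g: float, x_l: float, h_g: float, h_l: float,
         w_in: float = 0.5, w_rec: float = 0.5, w_msg: float = 0.5) -> Tuple[float, float]:
    """One time step of a toy two-branch recurrent cell (scalar tanh units)."""
    # Global branch: own input, own state, plus a message from the local branch.
    new_h_g = math.tanh(w_in * x_g + w_rec * h_g + w_msg * h_l)
    # Local branch: own input, own state, plus a message from the global branch.
    new_h_l = math.tanh(w_in * x_l + w_rec * h_l + w_msg * h_g)
    return new_h_g, new_h_l


def encode(global_seq: Sequence[float], local_seq: Sequence[float],
           w_msg: float = 0.5) -> Tuple[float, float]:
    """Run both branches over a sequence and return their final states."""
    h_g, h_l = 0.0, 0.0
    for x_g, x_l in zip(global_seq, local_seq):
        h_g, h_l = step(x_g, x_l, h_g, h_l, w_msg=w_msg)
    return h_g, h_l


# With messages enabled, the global state reacts to the local inputs.
coupled = encode([1.0, 1.0], [5.0, 5.0], w_msg=0.5)
# With w_msg = 0 the two branches evolve independently.
independent = encode([1.0, 1.0], [5.0, 5.0], w_msg=0.0)
```

Setting `w_msg` to zero turns the sketch into two unrelated recurrent models, which is a simple way to see what the message term contributes.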

Empirical Validation and Performance

MPED-RNN is evaluated on two challenging datasets: ShanghaiTech Campus and CUHK Avenue. The evaluation concentrates on human-related anomalies, which matches the scope of a model that reasons only about human skeletons.

On the Human-related (HR) ShanghaiTech subset, MPED-RNN achieves a higher frame-level ROC AUC than the state-of-the-art models it is compared against. This indicates that it identifies irregular human activities in videos more accurately than those models.
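For readers unfamiliar with the metric, frame-level ROC AUC can be computed from per-frame anomaly scores and binary ground-truth labels. The sketch below uses the pairwise (Mann-Whitney) formulation in plain Python; the function name `frame_level_auc` and the sample numbers are illustrative and are not results from the paper.

```python
from typing import Sequence


def frame_level_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC as the probability that an anomalous frame outscores a normal one.

    labels: 1 for an anomalous frame, 0 for a normal frame.
    scores: anomaly score per frame (higher means more anomalous).
    """
    if len(labels) != len(scores):
        raise ValueError("labels and scores must have the same length")

    pos = [s for lab, s in zip(labels, scores) if lab == 1]
    neg = [s for lab, s in zip(labels, scores) if lab == 0]
    if not pos or not neg:
        raise ValueError("need at least one anomalous and one normal frame")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0      # anomalous frame ranked above normal frame
            elif p == n:
                wins += 0.5      # ties count as half
    return wins / (len(pos) * len(neg))


# Four frames: two normal, two anomalous (made-up scores).
auc = frame_level_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])  # 0.75
```

The quadratic loop is fine for a short illustration; for full test sets a sort-based implementation or an existing library routine would be the usual choice.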

Interpretability and Analytical Insights

A distinguishing feature of MPED-RNN is its "open-box" interpretability. Because the features are semantically meaningful, the model's predicted features can be visualized against the actual inputs, showing where and how the predictions diverge from the observations in anomalous instances. This makes the model’s decisions easier to trust and points to the cases where it needs improvement.
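One simple way to turn such a comparison into an explanation is to measure the deviation per joint. The sketch below assumes squared Euclidean error between observed and predicted joint positions and averages it into a single score; the function names, joint names, and the choice of error measure are assumptions made for illustration, not the paper's scoring procedure.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def joint_errors(observed: Sequence[Point], predicted: Sequence[Point]) -> List[float]:
    """Squared distance between each observed joint and its prediction."""
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted must have the same number of joints")
    return [(ox - px) ** 2 + (oy - py) ** 2
            for (ox, oy), (px, py) in zip(observed, predicted)]


def explain(observed: Sequence[Point], predicted: Sequence[Point],
            names: Sequence[str]) -> Tuple[float, str]:
    """Return a mean-error score and the name of the joint that deviates most."""
    errors = joint_errors(observed, predicted)
    score = sum(errors) / len(errors)
    worst = max(range(len(errors)), key=errors.__getitem__)
    return score, names[worst]


score, culprit = explain(
    observed=[(0, 0), (1, 1), (2, 5)],
    predicted=[(0, 0), (1, 1), (2, 2)],
    names=["head", "hip", "foot"],
)
# score is 3.0 and culprit is "foot": the foot is where the prediction fails.
```

Reporting the worst joint alongside the score is what makes the output inspectable: a reviewer can check whether the flagged body part matches what is visibly unusual in the frame.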

Future Directions

The model depends on the quality of skeleton detection and tracking, so errors in those stages limit its performance. Future work may combine skeleton features with appearance features for a more complete approach to anomaly detection. Extending the model to multi-person interactions and object dynamics would also broaden its applicability.

Conclusion

The research presents MPED-RNN as a novel, interpretable, and effective model for video anomaly detection focused on human movements. By exploiting the semantic information in skeleton trajectories, it shows the benefit of structured, dynamic features over traditional appearance-based methods. As anomaly detection moves towards finer-grained and more interpretable models, MPED-RNN offers a direction that combines performance with understandability.
