
Occlusion-Aware 3D Motion Interpretation for Abnormal Behavior Detection (2407.16788v1)

Published 23 Jul 2024 in cs.CV

Abstract: Estimating abnormal posture from 3D pose is vital in human pose analysis, yet it remains challenging, especially when reconstructing 3D human poses from monocular datasets with occlusions. Accurate reconstructions enable the restoration of 3D movements, which assists in extracting the semantic details needed to analyze abnormal behaviors. However, most existing methods depend on predefined key points as a basis for estimating the coordinates of occluded joints, and variations in data quality adversely affect the performance of these models. In this paper, we present OAD2D, which discriminates motion abnormalities by reconstructing the 3D coordinates of mesh vertices and human joints from monocular videos. OAD2D employs optical flow to capture motion prior information in video streams, enriching the information on occluded human movements and ensuring temporal-spatial alignment of poses. Moreover, we reformulate abnormal posture estimation by coupling it with a Motion to Text (M2T) model, in which a VQVAE is employed to quantize motion features. This approach maps motion tokens to text tokens, allowing a semantically interpretable analysis of motion and enhancing the generalization of abnormal posture detection with the help of an LLM. Our approach demonstrates robust abnormal behavior detection under severe occlusions and self-occlusions, as it reconstructs human motion trajectories in global coordinates to effectively mitigate occlusion issues. Our method, validated on the Human3.6M, 3DPW, and NTU RGB+D datasets, achieves a high $F_1$-score of 0.94 on the NTU RGB+D dataset for medical condition detection. We will release all of our code and data.
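The abstract states that a VQVAE quantizes motion features into motion tokens. As a rough illustration of that one step only, the sketch below shows the nearest-codebook lookup used in generic vector quantization. It is not the authors' implementation: the function name `quantize`, the two-dimensional features, and the three-entry codebook are all hypothetical values chosen for the example, and a real VQ-VAE would learn the codebook jointly with an encoder and decoder.

```python
from typing import List, Sequence


def quantize(
    features: Sequence[Sequence[float]],
    codebook: Sequence[Sequence[float]],
) -> List[int]:
    """Return, for each feature vector, the index of its nearest codebook entry.

    Distance is squared Euclidean; ties resolve to the lowest index.
    """
    tokens = []
    for f in features:
        # Squared L2 distance from this feature to every codebook vector.
        dists = [sum((a - b) ** 2 for a, b in zip(f, c)) for c in codebook]
        tokens.append(dists.index(min(dists)))
    return tokens


# Hypothetical toy data (assumption: not taken from the paper).
codebook = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
features = [[0.1, -0.1], [0.9, 0.2], [0.2, 0.8], [0.6, 0.1]]

tokens = quantize(features, codebook)
print(tokens)  # [0, 1, 2, 1]
```

The resulting integer sequence plays the role of the discrete motion tokens that the abstract describes as being mapped to text tokens; how that mapping is trained is not specified here.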

