
STRIDE: Single-video based Temporally Continuous Occlusion-Robust 3D Pose Estimation (2312.16221v4)

Published 24 Dec 2023 in cs.CV

Abstract: The capability to accurately estimate 3D human poses is crucial for diverse fields such as action recognition, gait recognition, and virtual/augmented reality. However, a persistent and significant challenge within this field is the accurate prediction of human poses under conditions of severe occlusion. Traditional image-based estimators struggle with heavy occlusions due to a lack of temporal context, resulting in inconsistent predictions. While video-based models benefit from processing temporal data, they encounter limitations when faced with prolonged occlusions that extend over multiple frames. This challenge arises because these models struggle to generalize beyond their training datasets, and the variety of occlusions is hard to capture in the training data. Addressing these challenges, we propose STRIDE (Single-video based TempoRally contInuous Occlusion-Robust 3D Pose Estimation), a novel Test-Time Training (TTT) approach to fit a human motion prior for each video. This approach specifically handles occlusions that were not encountered during the model's training. By employing STRIDE, we can refine a sequence of noisy initial pose estimates into accurate, temporally coherent poses during test time, effectively overcoming the limitations of prior methods. Our framework demonstrates flexibility by being model-agnostic, allowing us to use any off-the-shelf 3D pose estimation method for improving robustness and temporal consistency. We validate STRIDE's efficacy through comprehensive experiments on challenging datasets like Occluded Human3.6M, Human3.6M, and OCMotion, where it not only outperforms existing single-image and video-based pose estimation models but also showcases superior handling of substantial occlusions, achieving fast, robust, accurate, and temporally consistent 3D pose estimates. Code is made publicly available at https://github.com/take2rohit/stride
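To make the idea of per-video, test-time refinement concrete, here is a minimal sketch in Python. It is not the STRIDE implementation: STRIDE fits a learned human motion prior, whereas this toy swaps in a hand-written temporal smoothness penalty optimized by plain gradient descent on a single sequence. The function name `refine_pose_sequence`, the per-frame `confidence` weights, and all hyperparameter values are assumptions introduced for illustration only.

```python
import numpy as np


def refine_pose_sequence(noisy, confidence, smooth_weight=5.0, steps=500, lr=0.02):
    """Refine one video's noisy 3D poses by per-sequence optimization.

    Illustrative stand-in only (NOT the STRIDE method): minimizes
        sum_t c_t * ||p_t - n_t||^2  +  smooth_weight * sum_t ||p_{t+1} - p_t||^2
    where n_t are the initial estimates and c_t is an assumed per-frame
    confidence in [0, 1] (0 = fully occluded frame, ignore its estimate).

    noisy:      array of shape (T, J, 3), initial poses from any estimator
    confidence: array of shape (T,)
    """
    noisy = np.array(noisy, dtype=float)        # copy, so the input is not modified
    poses = noisy.copy()                        # start from the initial estimates
    w = np.asarray(confidence, dtype=float)[:, None, None]

    for _ in range(steps):
        # Gradient of the data term: keep trusted frames near their estimates.
        grad = 2.0 * w * (poses - noisy)
        # Gradient of the smoothness term over consecutive-frame differences.
        diff = poses[1:] - poses[:-1]
        grad[:-1] -= 2.0 * smooth_weight * diff
        grad[1:] += 2.0 * smooth_weight * diff
        poses -= lr * grad
    return poses
```

With the default values the step size stays below the stability limit of this quadratic objective (roughly `2 / (2 + 8 * smooth_weight)`), so raising `smooth_weight` would require lowering `lr`. Frames with zero confidence are filled in purely from their temporal neighbours, which is the simplest possible analogue of using temporal context to recover occluded poses.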

