3D Human Pose Analysis via Diffusion Synthesis (2401.08930v1)
Abstract: Diffusion models have demonstrated remarkable success in generative modeling. In this paper, we propose PADS (Pose Analysis by Diffusion Synthesis), a novel framework that addresses various challenges in 3D human pose analysis through a unified pipeline. Central to PADS are two distinctive strategies: i) learning a task-agnostic pose prior through a diffusion synthesis process to effectively capture the kinematic constraints in human pose data, and ii) unifying multiple pose analysis tasks, such as estimation, completion, and denoising, as instances of inverse problems. The learned pose prior acts as a regularizer alongside the task-specific constraints, guiding the optimization process through a series of conditional denoising steps. PADS is the first diffusion-based framework for tackling general 3D human pose analysis within the inverse problem framework. Its performance has been validated on different benchmarks, signaling the adaptability and robustness of this pipeline.
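To make the inverse-problem view concrete, the following is a minimal sketch, not the PADS implementation. It assumes a pose is a flattened vector of 3J joint coordinates and that each task observes it through a linear operator A, so that y = A x (plus noise). The 17-joint skeleton, the orthographic camera, and all function names are illustrative assumptions. Only the task-specific data-consistency step is shown; the learned diffusion prior and its conditional denoising steps are deliberately left out.

```python
import numpy as np

J = 17  # number of joints (assumed for illustration)


def selection_matrix(keep_idx, dim):
    """Rows of the identity matrix that pick out the observed coordinates."""
    return np.eye(dim)[list(keep_idx)]


def estimation_operator(num_joints):
    # 2D-to-3D estimation: keep x and y of every joint, drop depth z.
    # An orthographic camera is assumed here purely for simplicity.
    keep = [3 * j + c for j in range(num_joints) for c in (0, 1)]
    return selection_matrix(keep, 3 * num_joints)


def completion_operator(num_joints, visible_joints):
    # Completion: only the visible joints are observed (all three coordinates).
    keep = [3 * j + c for j in visible_joints for c in range(3)]
    return selection_matrix(keep, 3 * num_joints)


def denoising_operator(num_joints):
    # Denoising: every coordinate is observed, but corrupted by noise.
    return np.eye(3 * num_joints)


def data_consistency_step(x, y, A, step=1.0):
    """One gradient step on the task-specific term 0.5 * ||A x - y||^2."""
    return x - step * A.T @ (A @ x - y)


rng = np.random.default_rng(0)
x_true = rng.normal(size=3 * J)   # stand-in for a ground-truth pose
x_init = rng.normal(size=3 * J)   # stand-in for the current pose estimate

A = completion_operator(J, visible_joints=range(10))
y = A @ x_true                    # noiseless observation of 10 visible joints
x_new = data_consistency_step(x_init, y, A)
```

Because A here selects rows of the identity, a single step with `step=1.0` makes the observed coordinates match `y` exactly and leaves the unobserved ones untouched. Filling in those unobserved coordinates plausibly is the role the abstract assigns to the learned pose prior.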