Score-Guided Diffusion for 3D Human Recovery (2403.09623v1)
Abstract: We present Score-Guided Human Mesh Recovery (ScoreHMR), an approach for solving inverse problems for 3D human pose and shape reconstruction. These inverse problems involve fitting a human body model to image observations and are traditionally solved through optimization techniques. ScoreHMR mimics model fitting approaches, but alignment with the image observation is achieved through score guidance in the latent space of a diffusion model. The diffusion model is trained to capture the conditional distribution of the human model parameters given an input image. By guiding its denoising process with a task-specific score, ScoreHMR effectively solves inverse problems for various applications without the need for retraining the task-agnostic diffusion model. We evaluate our approach in three settings: (i) single-frame model fitting, (ii) reconstruction from multiple uncalibrated views, and (iii) reconstruction of humans in video sequences. ScoreHMR consistently outperforms all optimization baselines on popular benchmarks across all settings. We make our code and models available at https://statho.github.io/ScoreHMR.
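To make the idea of guiding a denoising process with a task-specific score more concrete, here is a minimal toy sketch in Python. It is not the ScoreHMR implementation. It assumes an analytic Gaussian prior in place of the trained image-conditioned diffusion model, a linear map `A` in place of an image-alignment term such as keypoint reprojection, and the standard classifier-guidance form of the DDIM noise update. All function names, the noise schedule, and the guidance scale are hypothetical and chosen only for illustration.

```python
import numpy as np


def alpha_bar_schedule(num_steps=200, beta_min=1e-4, beta_max=0.05):
    """Cumulative products of (1 - beta_t) for a linear beta schedule (assumed values)."""
    betas = np.linspace(beta_min, beta_max, num_steps)
    return np.cumprod(1.0 - betas)


def toy_eps(x_t, alpha_bar, prior_mean):
    """Exact noise prediction when the clean data follow N(prior_mean, I).

    This stands in for a trained denoiser; a real model would be a network.
    """
    return np.sqrt(1.0 - alpha_bar) * (x_t - np.sqrt(alpha_bar) * prior_mean)


def data_loss(x0, A, y):
    """Task-specific loss: squared error between A @ x0 and the observation y."""
    r = A @ x0 - y
    return 0.5 * float(r @ r)


def guided_ddim(x_T, A, y, prior_mean, alpha_bars, scale=1.0):
    """Deterministic DDIM sampling with loss-gradient guidance (scale=0 disables it)."""
    x_t = np.asarray(x_T, dtype=float).copy()
    for t in range(len(alpha_bars) - 1, -1, -1):
        ab = alpha_bars[t]
        ab_prev = alpha_bars[t - 1] if t > 0 else 1.0
        eps = toy_eps(x_t, ab, prior_mean)
        x0_hat = (x_t - np.sqrt(1.0 - ab) * eps) / np.sqrt(ab)
        # Gradient of the data loss w.r.t. x_t. For this toy denoiser the
        # Jacobian d x0_hat / d x_t equals sqrt(ab) * I, so it is exact here.
        grad = np.sqrt(ab) * (A.T @ (A @ x0_hat - y))
        # Classifier-guidance style update of the predicted noise.
        eps = eps + scale * np.sqrt(1.0 - ab) * grad
        x0_hat = (x_t - np.sqrt(1.0 - ab) * eps) / np.sqrt(ab)
        # DDIM step (eta = 0) to the next, less noisy level.
        x_t = np.sqrt(ab_prev) * x0_hat + np.sqrt(1.0 - ab_prev) * eps
    return x_t


# Toy problem: observe the first two of three coordinates.
rng = np.random.default_rng(0)
A = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
y = np.array([1.0, -0.5])
prior_mean = np.zeros(3)
alpha_bars = alpha_bar_schedule()
x_T = rng.standard_normal(3)

unguided = guided_ddim(x_T, A, y, prior_mean, alpha_bars, scale=0.0)
guided = guided_ddim(x_T, A, y, prior_mean, alpha_bars, scale=1.0)
print("unguided data loss:", data_loss(unguided, A, y))
print("guided data loss:  ", data_loss(guided, A, y))
```

In this sketch the denoiser is never retrained. Only the loss gradient changes between tasks, which mirrors the abstract's point that one task-agnostic model can serve several applications. The guidance only acts on the coordinates that `A` observes, so the third coordinate is determined by the prior alone.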