
Score-Guided Diffusion for 3D Human Recovery (2403.09623v1)

Published 14 Mar 2024 in cs.CV

Abstract: We present Score-Guided Human Mesh Recovery (ScoreHMR), an approach for solving inverse problems for 3D human pose and shape reconstruction. These inverse problems involve fitting a human body model to image observations, traditionally solved through optimization techniques. ScoreHMR mimics model fitting approaches, but alignment with the image observation is achieved through score guidance in the latent space of a diffusion model. The diffusion model is trained to capture the conditional distribution of the human model parameters given an input image. By guiding its denoising process with a task-specific score, ScoreHMR effectively solves inverse problems for various applications without the need for retraining the task-agnostic diffusion model. We evaluate our approach on three settings/applications. These are: (i) single-frame model fitting; (ii) reconstruction from multiple uncalibrated views; (iii) reconstructing humans in video sequences. ScoreHMR consistently outperforms all optimization baselines on popular benchmarks across all settings. We make our code and models available at https://statho.github.io/ScoreHMR.


Summary

  • The paper introduces ScoreHMR, the first approach that uses score-guided diffusion models to iteratively refine 3D human mesh estimates from monocular images.
  • It leverages a diffusion model trained on conditional distributions of plausible human parameters, consistently outperforming optimization baselines on standard benchmarks.
  • The method integrates additional cues like 2D keypoints for versatile refinement without task-specific retraining, marking a significant step forward in 3D human recovery.

Score-Guided Diffusion Models for 3D Human Recovery

Introduction to Score-Guided Diffusion Models

Human Mesh Recovery (HMR), the reconstruction of 3D human pose and shape from monocular images, is a central task in computer vision with applications ranging from animated movie production to surveillance. Advances in diffusion models (DMs) have opened new ways to address inverse problems that were traditionally tackled with optimization or regression techniques. The paper introduces Score-Guided Human Mesh Recovery (ScoreHMR), which uses DMs for the first time for 3D human mesh recovery, specifically for the inverse problem of fitting a parametric human body model to observed image data.

The Core Approach: ScoreHMR

ScoreHMR combines the generative capabilities of diffusion models with score-based guidance to refine initial estimates of human mesh models. It relies on a task-agnostic diffusion model trained to capture the conditional distribution of plausible human model parameters given an input image. The method then aligns these initial estimates with additional observations (e.g., 2D keypoints or multiple uncalibrated views) through an iterative refinement process, in which the denoising steps are guided by a task-specific score in the latent space of the diffusion model.
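
The guidance idea described above can be illustrated with a toy sketch. This is not the authors' implementation: the learned, image-conditioned diffusion model is replaced by an analytic noise predictor for a 2D Gaussian prior, the body model parameters are a plain 2D vector, and the observation is a linear measurement `y = A x`. The names `predict_noise`, `observation_loss`, and `guided_ddim`, the linear noise schedule, and the guidance weight are all assumptions made for illustration. The sketch shows how adding the gradient of an observation loss to the predicted noise steers deterministic (DDIM-style) denoising toward the observation while the prior stays fixed.

```python
import numpy as np

# Hypothetical stand-in for the learned prior over body model parameters:
# an isotropic Gaussian N(PRIOR_MEAN, PRIOR_STD^2 I) in 2D.
PRIOR_MEAN = np.array([0.0, 1.0])
PRIOR_STD = 0.5


def predict_noise(x_t, a_t):
    """Exact noise prediction for the Gaussian prior at signal level a_t.

    A real system would call a trained, image-conditioned network here.
    """
    var_t = a_t * PRIOR_STD**2 + (1.0 - a_t)  # variance of x_t under the prior
    return np.sqrt(1.0 - a_t) * (x_t - np.sqrt(a_t) * PRIOR_MEAN) / var_t


def observation_loss(x0, A, y):
    """Task-specific loss: squared error of a linear measurement A x0 against y."""
    r = A @ x0 - y
    return 0.5 * float(r @ r)


def guided_ddim(x_T, A, y, guidance_weight, num_steps=50):
    """Deterministic DDIM-style denoising with loss-gradient guidance."""
    # Assumed schedule: cumulative signal level from 1.0 (clean) to 0.01 (noisy).
    a = np.linspace(1.0, 0.01, num_steps + 1)
    x = x_T.astype(float).copy()
    for t in range(num_steps, 0, -1):
        eps = predict_noise(x, a[t])
        # Current estimate of the clean sample.
        x0_hat = (x - np.sqrt(1.0 - a[t]) * eps) / np.sqrt(a[t])
        # Gradient of the loss w.r.t. x_t via the chain rule. For this Gaussian
        # toy prior, d(x0_hat)/d(x_t) is a scalar times the identity.
        var_t = a[t] * PRIOR_STD**2 + (1.0 - a[t])
        jac = np.sqrt(a[t]) * PRIOR_STD**2 / var_t
        grad = jac * (A.T @ (A @ x0_hat - y))
        # Guidance: shift the predicted noise along the loss gradient.
        eps = eps + guidance_weight * np.sqrt(1.0 - a[t]) * grad
        # DDIM update using the guided noise estimate.
        x0_hat = (x - np.sqrt(1.0 - a[t]) * eps) / np.sqrt(a[t])
        x = np.sqrt(a[t - 1]) * x0_hat + np.sqrt(1.0 - a[t - 1]) * eps
    return x


if __name__ == "__main__":
    x_T = np.array([0.3, -0.2])   # fixed starting noise, for reproducibility
    A = np.array([[1.0, 0.0]])    # only the first coordinate is observed
    y = np.array([2.0])
    for w in (0.0, 4.0):
        x0 = guided_ddim(x_T, A, y, guidance_weight=w)
        print(f"weight={w}: x0={x0}, loss={observation_loss(x0, A, y):.4f}")
```

In this toy setup only the first coordinate is observed, so guidance moves that coordinate toward the measurement while the second coordinate is determined by the prior alone. This loosely mirrors how partial observations such as 2D keypoints constrain some aspects of a pose and leave the rest to the learned distribution.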

Achievements and Advancements

ScoreHMR consistently outperforms existing optimization baselines across popular benchmarks when refining initial regression estimates. In the single-frame model fitting setting, it delivers improvements on all tested datasets. It is also the only approach that improves the performance of the leading monocular feed-forward system on challenging poses.

Practical Implications and Theoretical Contributions

From a practical standpoint, ScoreHMR refines initial estimates without retraining the diffusion model for each task, which makes it adaptable across applications. Theoretically, the work demonstrates that the generative capacity of diffusion models can be used to solve inverse problems in 3D human recovery. Its score guidance mechanism exploits both the learned distribution of plausible SMPL body model parameters and the observed data, offering an alternative to conventional optimization-based methods.
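
How a learned prior and observed data can be combined through scores is captured by the standard guidance decomposition below. The notation is assumed for illustration and is not taken from the paper: x_t is the noisy latent at step t, c the input image, y the additional observation, \epsilon_\theta the noise predicted by the diffusion model, \bar{\alpha}_t the cumulative noise schedule, \mathcal{L} a task-specific loss, w a guidance weight, and \hat{x}_0(x_t) = (x_t - \sqrt{1 - \bar{\alpha}_t}\,\epsilon_\theta(x_t, t, c)) / \sqrt{\bar{\alpha}_t} the current estimate of the clean sample. The first line is Bayes' rule; the remaining lines are common approximations in guided diffusion.

```latex
\begin{align}
\nabla_{x_t} \log p(x_t \mid c, y)
  &= \nabla_{x_t} \log p(x_t \mid c) + \nabla_{x_t} \log p(y \mid x_t, c), \\
\nabla_{x_t} \log p(x_t \mid c)
  &\approx -\frac{\epsilon_\theta(x_t, t, c)}{\sqrt{1 - \bar{\alpha}_t}}, \\
\nabla_{x_t} \log p(y \mid x_t, c)
  &\approx -w \, \nabla_{x_t} \mathcal{L}\big(\hat{x}_0(x_t), y\big), \\
\hat{\epsilon}
  &= \epsilon_\theta(x_t, t, c)
     + w \sqrt{1 - \bar{\alpha}_t} \, \nabla_{x_t} \mathcal{L}\big(\hat{x}_0(x_t), y\big).
\end{align}
```

Under these assumptions, only the loss \mathcal{L} changes between tasks, while \epsilon_\theta stays fixed, which is consistent with the claim that the diffusion model need not be retrained.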

Speculating on Future Directions

ScoreHMR suggests several directions for future research. Extending it to more complex and dynamic scenes, such as crowded environments or intricate human interactions, would test its robustness. Integrating further observation types (e.g., depth information) into the score-guided diffusion framework, beyond the keypoint, multi-view, and video settings already evaluated, could enable additional forms of refinement. Lastly, adapting the approach to inverse problems beyond human mesh recovery would broaden the use of diffusion models in computer vision.

Concluding Remarks

ScoreHMR uses score-guided diffusion models to refine 3D human mesh recovery from monocular images. In the reported evaluation, it improves existing regression estimates and adapts to different observation modalities without task-specific retraining. It therefore offers a flexible alternative to optimization-based fitting for 3D human recovery.
