Score-Guided Diffusion for 3D Human Recovery (2403.09623v1)

Published 14 Mar 2024 in cs.CV

Abstract: We present Score-Guided Human Mesh Recovery (ScoreHMR), an approach for solving inverse problems for 3D human pose and shape reconstruction. These inverse problems involve fitting a human body model to image observations, traditionally solved through optimization techniques. ScoreHMR mimics model fitting approaches, but alignment with the image observation is achieved through score guidance in the latent space of a diffusion model. The diffusion model is trained to capture the conditional distribution of the human model parameters given an input image. By guiding its denoising process with a task-specific score, ScoreHMR effectively solves inverse problems for various applications without the need for retraining the task-agnostic diffusion model. We evaluate our approach on three settings/applications. These are: (i) single-frame model fitting; (ii) reconstruction from multiple uncalibrated views; (iii) reconstructing humans in video sequences. ScoreHMR consistently outperforms all optimization baselines on popular benchmarks across all settings. We make our code and models available at https://statho.github.io/ScoreHMR.

Summary

The paper introduces ScoreHMR, an approach that uses score-guided diffusion models to iteratively refine 3D human mesh estimates from images, covering single-frame, multi-view, and video settings.
It relies on a diffusion model trained to capture the conditional distribution of plausible human body model parameters given an input image, and it consistently outperforms optimization baselines on standard benchmarks.
The method incorporates additional cues such as 2D keypoints at inference time through a task-specific guidance score, so refinement requires no task-specific retraining of the diffusion model.

Score-Guided Diffusion Models for 3D Human Recovery

Introduction to Score-Guided Diffusion Models

3D Human Mesh Recovery (HMR) from monocular images is a central task in computer vision, with applications ranging from animated movie production to surveillance. Advances in Diffusion Models (DMs) have opened new avenues for solving inverse problems, which have traditionally been tackled with optimization or regression techniques. This paper introduces Score-Guided Human Mesh Recovery (ScoreHMR), which applies DMs to 3D human mesh recovery by treating it as the inverse problem of fitting a parametric human body model to observed image data.
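
The traditional optimization route, which is also the family of baselines ScoreHMR is compared against, can be sketched as gradient descent on a keypoint reprojection error plus a prior. This minimal sketch assumes a toy linear body model with an orthographic camera and a simple L2 prior on the parameters (hypothetical stand-ins for SMPL, a perspective camera, and the learned priors that real fitting methods use), so the gradient can be written by hand.

```python
import numpy as np

def fit_by_optimization(J, kp2d, prior_weight=0.1, lr=0.05, steps=2000):
    """Fit toy body parameters x to 2D keypoints by gradient descent.

    J: (3K, D) linear map from parameters to flattened 3D joints (toy body model).
    kp2d: (K, 2) observed keypoints; the orthographic camera simply drops z.
    Minimizes ||A x - y||^2 + prior_weight * ||x||^2.
    """
    K = kp2d.shape[0]
    # Rows of J that produce the x and y coordinates of each joint.
    xy_rows = np.array([3 * k + c for k in range(K) for c in (0, 1)])
    A = J[xy_rows]            # (2K, D) parameters -> projected keypoints
    y = kp2d.reshape(-1)      # (2K,) flattened observations, same ordering as A's rows
    x = np.zeros(J.shape[1])  # start from the prior mean
    for _ in range(steps):
        # Gradient of the data term plus the L2 prior.
        grad = 2 * A.T @ (A @ x - y) + 2 * prior_weight * x
        x -= lr * grad
    return x
```

Because this toy objective is quadratic, the result can be checked against the closed-form ridge solution `(AᵀA + prior_weight·I)⁻¹ Aᵀy`; real fitting objectives are non-convex, which is where a strong learned prior matters.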

The Core Approach: ScoreHMR

ScoreHMR combines the generative capabilities of diffusion models with score-based guidance to refine initial estimates of human body model parameters. It relies on a task-agnostic diffusion model trained to capture the conditional distribution of plausible human model parameters given an input image. At inference time, the method improves these initial estimates using additional observations (e.g., 2D keypoints or multiple uncalibrated views) through an iterative refinement process guided by a task-specific score in the latent space of the diffusion model.
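
The refinement loop can be sketched as deterministic DDIM sampling in which, at every step, the predicted clean sample is scored against the observations and the gradient of that loss is folded into the noise estimate (the classifier-guidance form of score guidance). This NumPy sketch rests on loud assumptions: the "diffusion model" is the exact denoiser of a toy Gaussian prior N(mu, I) standing in for the image-conditioned model over body parameters, the observation is a linear map `M` with targets `y` standing in for 2D keypoints, and the gradient is written in closed form rather than computed with autograd. Names such as `guided_ddim` and `guidance_weight` are hypothetical, not taken from the ScoreHMR code.

```python
import numpy as np

def toy_eps_model(x_t, a, s, mu):
    """Exact noise prediction for a toy prior x0 ~ N(mu, I) (stand-in for the trained denoiser).

    With x_t = a * x0 + s * eps and a**2 + s**2 = 1, E[eps | x_t] = s * (x_t - a * mu).
    """
    return s * (x_t - a * mu)

def guided_ddim(x_T, mu, M, y, alpha_bar, guidance_weight=1.0):
    """Deterministic DDIM sampling with a keypoint-style guidance term (hypothetical sketch).

    alpha_bar: schedule with alpha_bar[0] = 1 (clean) decreasing to alpha_bar[-1] > 0.
    Guidance loss: 0.5 * ||M @ x0_hat - y||^2 on the predicted clean sample.
    """
    x = x_T.copy()
    for t in range(len(alpha_bar) - 1, 0, -1):
        a, s = np.sqrt(alpha_bar[t]), np.sqrt(1.0 - alpha_bar[t])
        a_prev, s_prev = np.sqrt(alpha_bar[t - 1]), np.sqrt(1.0 - alpha_bar[t - 1])
        eps = toy_eps_model(x, a, s, mu)
        x0_hat = (x - s * eps) / a
        # Gradient of the guidance loss w.r.t. x_t; for this toy prior d x0_hat / d x_t = a * I.
        # A real model would backpropagate through the denoiser instead.
        grad = a * (M.T @ (M @ x0_hat - y))
        # Shift the noise estimate along the loss gradient (classifier-guidance style).
        eps = eps + s * guidance_weight * grad
        x0_hat = (x - s * eps) / a
        # DDIM update with eta = 0: re-noise the guided clean estimate to the previous level.
        x = a_prev * x0_hat + s_prev * eps
    return x
```

For example, with `M = np.eye(4)[:2]` only the first two parameters are observed: guidance pulls them toward `y`, while the unobserved ones follow the unguided trajectory exactly, because the toy prior is isotropic. In a refinement setting, `x_T` would plausibly come from noising an initial regression estimate rather than from pure noise; that initialization is an assumption and is not shown.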

Achievements and Advancements

ScoreHMR consistently outperforms existing optimization baselines across a range of popular benchmarks when refining initial regression estimates. In the single-frame model fitting setting, it delivers improvements on every tested dataset. Notably, it is the only evaluated approach that improves the performance of the leading monocular feed-forward system on challenging poses.

Practical Implications and Theoretical Contributions

From a practical standpoint, because ScoreHMR refines initial estimates without retraining the model for each task, it offers an adaptable and scalable route to 3D human recovery. Theoretically, this work shows that the generative capacity of diffusion models can be used effectively to solve inverse problems in 3D human recovery. Its score guidance mechanism, which combines the learned distribution of plausible SMPL body model parameters with the observed data, offers a compelling alternative to conventional optimization-based methods.
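
The way the guidance combines the learned distribution with observed data can be written with Bayes' rule. Below, x_t denotes the noisy body model parameters at diffusion step t, c the input image, y the extra observations, ε_θ the trained denoiser, ᾱ_t the noise schedule, and x̂_0(x_t) the predicted clean parameters. The first line is an exact identity; the likelihood approximation with a weighted task loss λ·𝓛 (λ and 𝓛 are hypothetical symbols) and the resulting noise-prediction form are assumptions in the style of classifier guidance, not necessarily ScoreHMR's exact formulation.

```latex
\begin{aligned}
\nabla_{\mathbf{x}_t} \log p(\mathbf{x}_t \mid \mathbf{c}, \mathbf{y})
  &= \nabla_{\mathbf{x}_t} \log p(\mathbf{x}_t \mid \mathbf{c})
   + \nabla_{\mathbf{x}_t} \log p(\mathbf{y} \mid \mathbf{x}_t, \mathbf{c}), \\
\log p(\mathbf{y} \mid \mathbf{x}_t, \mathbf{c})
  &\approx -\lambda\, \mathcal{L}\big(\mathbf{y}, \hat{\mathbf{x}}_0(\mathbf{x}_t)\big) + \text{const}, \\
\tilde{\boldsymbol{\epsilon}}
  &= \boldsymbol{\epsilon}_\theta(\mathbf{x}_t, t, \mathbf{c})
   + \lambda \sqrt{1 - \bar{\alpha}_t}\, \nabla_{\mathbf{x}_t} \mathcal{L}\big(\mathbf{y}, \hat{\mathbf{x}}_0(\mathbf{x}_t)\big).
\end{aligned}
```

The third line follows from the standard relation \nabla_{\mathbf{x}_t} \log p(\mathbf{x}_t \mid \mathbf{c}) \approx -\boldsymbol{\epsilon}_\theta(\mathbf{x}_t, t, \mathbf{c}) / \sqrt{1 - \bar{\alpha}_t}: multiplying the guided score by -\sqrt{1 - \bar{\alpha}_t} gives a corrected noise estimate \tilde{\boldsymbol{\epsilon}} that a standard sampler can use unchanged.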

Speculating on Future Directions

ScoreHMR suggests several promising directions for future research. Extending it to more complex and dynamic scenes, such as crowded environments or close human interactions, could broaden its utility. Integrating further observation types (e.g., depth information) into the score-guided diffusion framework, beyond the keypoint, multi-view, and video settings already evaluated, is another natural extension. Finally, applying the same strategy to inverse problems beyond human mesh recovery is an intriguing way to extend the reach of diffusion models in computer vision.

Concluding Remarks

ScoreHMR is a pioneering approach that uses score-guided diffusion models to refine 3D human mesh estimates. Its evaluation shows that it significantly improves existing regression estimates and provides a versatile framework that adapts to different observation modalities without task-specific retraining. It is therefore a substantial step toward more accurate and adaptable 3D human recovery and a strong reference point for future work in this area.