PyMAF: 3D Human Pose and Shape Regression with Pyramidal Mesh Alignment Feedback Loop (2103.16507v4)

Published 30 Mar 2021 in cs.CV

Abstract: Regression-based methods have recently shown promising results in reconstructing human meshes from monocular images. By directly mapping raw pixels to model parameters, these methods can produce parametric models in a feed-forward manner via neural networks. However, minor deviation in parameters may lead to noticeable misalignment between the estimated meshes and image evidences. To address this issue, we propose a Pyramidal Mesh Alignment Feedback (PyMAF) loop to leverage a feature pyramid and rectify the predicted parameters explicitly based on the mesh-image alignment status in our deep regressor. In PyMAF, given the currently predicted parameters, mesh-aligned evidences will be extracted from finer-resolution features accordingly and fed back for parameter rectification. To reduce noise and enhance the reliability of these evidences, an auxiliary pixel-wise supervision is imposed on the feature encoder, which provides mesh-image correspondence guidance for our network to preserve the most related information in spatial features. The efficacy of our approach is validated on several benchmarks, including Human3.6M, 3DPW, LSP, and COCO, where experimental results show that our approach consistently improves the mesh-image alignment of the reconstruction. The project page with code and video results can be found at https://hongwenzhang.github.io/pymaf.

Summary

The paper introduces PyMAF, a feedback-based approach that corrects mesh-image misalignments in 3D human pose and shape regression.
It employs a multi-scale feature pyramid and auxiliary pixel-wise supervision to iteratively refine mesh predictions.
Results on Human3.6M and 3DPW demonstrate significant improvements in MPJPE and PA-MPJPE over baseline methods.

Introduction

The paper presents Pyramidal Mesh Alignment Feedback (PyMAF), a method for regressing 3D human pose and shape from monocular images. Regression-based methods map raw pixels directly to the parameters of a parametric body model in a feed-forward manner. Although promising, they are sensitive to small parameter errors: a minor deviation in the predicted parameters can cause noticeable misalignment between the estimated mesh and the image evidence.

Methodology

PyMAF introduces a feedback loop inside the deep regressor that corrects this misalignment using multi-scale spatial features. Given the currently predicted parameters, mesh-aligned evidence is extracted from the finer-resolution levels of a feature pyramid and fed back to rectify those parameters explicitly. The method has three main components:

Feature Pyramid: The network generates spatial features at different resolutions, allowing access to both coarse and fine-grained information necessary for accurately predicting model parameters.
Mesh Alignment Feedback: The feedback loop extracts mesh-aligned features from the spatial features at the locations given by the current mesh estimate, and uses them to iteratively correct the predicted parameters.
Auxiliary Pixel-wise Supervision: An auxiliary pixel-wise supervision on the feature encoder provides mesh-image correspondence guidance, reducing noise so that the spatial features preserve the most relevant information.
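
The three components above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the toy mesh function, the random linear regressors, the weak-perspective projection, the feature sizes, and all function names are assumptions made for illustration. It only shows the control flow described here: project the current mesh, sample features at the projected points from a progressively finer feature map, and predict a parameter update from them.

```python
import numpy as np

def bilinear_sample(feat, pts):
    """Sample a (C, H, W) feature map at N points with (x, y) in [-1, 1]."""
    _, H, W = feat.shape
    # Map normalized coordinates to pixel coordinates, clamped to the map.
    x = np.clip((pts[:, 0] + 1.0) * 0.5 * (W - 1), 0, W - 1)
    y = np.clip((pts[:, 1] + 1.0) * 0.5 * (H - 1), 0, H - 1)
    x0, y0 = np.floor(x).astype(int), np.floor(y).astype(int)
    x1, y1 = np.minimum(x0 + 1, W - 1), np.minimum(y0 + 1, H - 1)
    wx, wy = x - x0, y - y0
    top = feat[:, y0, x0] * (1 - wx) + feat[:, y0, x1] * wx
    bot = feat[:, y1, x0] * (1 - wx) + feat[:, y1, x1] * wx
    return (top * (1 - wy) + bot * wy).T  # (N, C)

def project(vertices, cam):
    """Weak-perspective projection (an assumption): cam = (scale, tx, ty)."""
    s, tx, ty = cam
    return s * vertices[:, :2] + np.array([tx, ty])

def feedback_loop(pyramid, params, mesh_fn, regressors):
    """Refine params once per pyramid level, coarse to fine."""
    for feat, regress in zip(pyramid, regressors):
        verts, cam = mesh_fn(params)            # mesh from current estimate
        pts2d = project(verts, cam)             # where the mesh lands in the image
        aligned = bilinear_sample(feat, pts2d)  # mesh-aligned features
        # Predict a residual update from the features and current parameters.
        params = params + regress(np.concatenate([aligned.reshape(-1), params]))
    return params

# --- Toy setup; everything below is hypothetical stand-in data. ---
rng = np.random.default_rng(0)
template = rng.uniform(-0.5, 0.5, size=(16, 3))  # toy "mesh" with 16 vertices

def toy_mesh(params):
    # Stand-in for a body model: params = (shape_scale, cam_s, cam_tx, cam_ty).
    return template * params[0], params[1:4]

def make_regressor(in_dim, out_dim):
    # Stand-in for a learned regressor: a small random linear map.
    W = rng.normal(scale=0.01, size=(out_dim, in_dim))
    return lambda x: W @ x

pyramid = [rng.normal(size=(8, r, r)) for r in (14, 28, 56)]  # coarse to fine
regressors = [make_regressor(16 * 8 + 4, 4) for _ in pyramid]
refined = feedback_loop(pyramid, np.array([1.0, 1.0, 0.0, 0.0]), toy_mesh, regressors)
```

In a trained system the regressors would be learned and the mesh would come from a parametric body model; the sketch keeps only the loop structure.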

Results

Extensive experiments across multiple datasets, including Human3.6M and 3DPW, highlight the effectiveness of PyMAF. Notably:

On 3DPW, PyMAF achieves an MPJPE of 92.8 mm and a PA-MPJPE of 58.9 mm, which the paper reports as an improvement over baseline methods.
On Human3.6M, the method records an MPJPE of 57.7 mm.
In the 2D evaluation on LSP, PyMAF improves segmentation accuracy and F1 scores, indicating better mesh-image alignment than prior regression-based approaches.
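
For readers unfamiliar with the 3D metrics quoted above, the following sketch shows how MPJPE and PA-MPJPE are commonly computed. It follows the standard definitions (mean per-joint Euclidean error, and the same error after a similarity Procrustes alignment) rather than the paper's evaluation code; the function names and the (J, 3) joint layout are assumptions.

```python
import numpy as np

def mpjpe(pred, gt):
    """Mean per-joint position error for (J, 3) arrays, in the input units."""
    return float(np.linalg.norm(pred - gt, axis=-1).mean())

def procrustes_align(pred, gt):
    """Similarity-align pred to gt (scale, rotation, translation)."""
    mu_p, mu_g = pred.mean(axis=0), gt.mean(axis=0)
    P, G = pred - mu_p, gt - mu_g
    # The SVD of the cross-covariance gives the optimal rotation.
    U, S, Vt = np.linalg.svd(P.T @ G)
    # Flip the last axis if needed so the result is a rotation, not a reflection.
    d = np.sign(np.linalg.det(U @ Vt))
    D = np.array([1.0, 1.0, d])
    R = (U * D) @ Vt
    scale = (S * D).sum() / (P ** 2).sum()
    return scale * P @ R + mu_g

def pa_mpjpe(pred, gt):
    """MPJPE after Procrustes alignment of pred to gt."""
    return mpjpe(procrustes_align(pred, gt), gt)

# Toy example: a prediction that differs from the ground truth only by a
# similarity transform has a large MPJPE but a PA-MPJPE of (numerically) zero.
gt = np.array([[0, 0, 0], [1, 0, 0], [0, 2, 0], [0, 0, 3], [1, 1, 1]], dtype=float)
c, s = np.cos(0.7), np.sin(0.7)
Rz = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
pred = 2.0 * gt @ Rz + np.array([0.3, -1.0, 2.0])
print(mpjpe(pred, gt), pa_mpjpe(pred, gt))
```

Because PA-MPJPE removes global scale, rotation, and translation, it isolates errors in the articulated pose, which is why it is typically lower than MPJPE on the same predictions.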

Implications and Future Directions

PyMAF addresses mesh-image misalignment in regression-based human pose and shape estimation by making alignment an explicit signal inside the regressor. This is relevant to applications that require accurate human mesh reconstruction from images.

Future developments could explore enhancements to further mitigate depth ambiguity. Additionally, integrating PyMAF with other recent advancements could lead to more precise pseudo-ground-truth generation, broadening its application scope and improving generalization capabilities.

In summary, PyMAF improves mesh-image alignment in regression-based human mesh recovery by feeding mesh-aligned evidence from a feature pyramid back into the parameter regressor.
