Convolutional Mesh Regression for Single-Image Human Shape Reconstruction (1905.03244v1)

Published 8 May 2019 in cs.CV

Abstract: This paper addresses the problem of 3D human pose and shape estimation from a single image. Previous approaches consider a parametric model of the human body, SMPL, and attempt to regress the model parameters that give rise to a mesh consistent with image evidence. This parameter regression has been a very challenging task, with model-based approaches underperforming compared to nonparametric solutions in terms of pose estimation. In our work, we propose to relax this heavy reliance on the model's parameter space. We still retain the topology of the SMPL template mesh, but instead of predicting model parameters, we directly regress the 3D location of the mesh vertices. This is a heavy task for a typical network, but our key insight is that the regression becomes significantly easier using a Graph-CNN. This architecture allows us to explicitly encode the template mesh structure within the network and leverage the spatial locality the mesh has to offer. Image-based features are attached to the mesh vertices and the Graph-CNN is responsible to process them on the mesh structure, while the regression target for each vertex is its 3D location. Having recovered the complete 3D geometry of the mesh, if we still require a specific model parametrization, this can be reliably regressed from the vertices locations. We demonstrate the flexibility and the effectiveness of our proposed graph-based mesh regression by attaching different types of features on the mesh vertices. In all cases, we outperform the comparable baselines relying on model parameter regression, while we also achieve state-of-the-art results among model-based pose estimation approaches.

Summary

The paper's main contribution is a framework that directly regresses the 3D vertices of the human mesh with a Graph-CNN, avoiding the difficulty of regressing SMPL model parameters.
The study reports that the method outperforms comparable parameter-regression baselines on the Human3.6M and LSP datasets, and achieves state-of-the-art results among model-based approaches in MPJPE and Reconstruction Error.
The approach is versatile, supporting various input modalities like RGB images, part segmentation, and dense correspondences to enhance detailed human modeling.

Convolutional Mesh Regression for Single-Image Human Shape Reconstruction

The paper "Convolutional Mesh Regression for Single-Image Human Shape Reconstruction" tackles 3D human pose and shape estimation from a single image. Instead of regressing the parameters of a parametric body model such as SMPL, it keeps the topology of the SMPL template mesh and regresses the 3D location of each mesh vertex. The motivation is that earlier model-based methods, which regress parameters, have underperformed nonparametric approaches on pose estimation.

Key Contributions

1. Reformulating the Regression Target

The paper proposes regressing the 3D vertex locations of the human mesh instead of predicting model parameters. This relaxes the reliance on the model's parameter space and sidesteps the difficulty of regressing 3D rotations in models like SMPL. If a specific model parametrization is still required, the paper states that it can be reliably regressed from the recovered vertex locations.

2. Graph-Convolutional Neural Networks (Graph-CNNs)

The core idea is to perform the mesh regression with a Graph-CNN. The architecture encodes the template mesh structure explicitly within the network and exploits the spatial locality of the mesh: each vertex is processed together with its neighbors. According to the paper, this makes the high-dimensional task of direct 3D vertex prediction significantly easier than it is for a typical network built from plain fully connected layers.
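To make the notion of spatial locality concrete, here is a minimal NumPy sketch of a single graph convolution on a toy mesh. The propagation rule (a per-vertex linear map followed by averaging over each vertex and its neighbors, using a row-normalized adjacency matrix with self-loops) is one common formulation and is an assumption of this sketch; the summary above does not specify the exact layer used in the paper. The helper names and the four-vertex mesh are hypothetical.

```python
import numpy as np

def row_normalized_adjacency(num_vertices, edges):
    """Adjacency matrix with self-loops, normalized so each row sums to 1."""
    adj = np.eye(num_vertices)
    for i, j in edges:
        adj[i, j] = 1.0
        adj[j, i] = 1.0  # mesh edges are undirected
    return adj / adj.sum(axis=1, keepdims=True)

def graph_conv(features, adj_norm, weight):
    """One graph convolution: per-vertex linear map, then neighbor averaging."""
    return adj_norm @ (features @ weight)

# Toy "mesh": one quad split into two triangles (4 vertices, 5 edges).
edges = [(0, 1), (1, 2), (2, 0), (2, 3), (3, 0)]
adj_norm = row_normalized_adjacency(4, edges)

rng = np.random.default_rng(0)
features = rng.normal(size=(4, 8))  # 8 input features per vertex (assumed size)
weight = rng.normal(size=(8, 3))    # maps to 3 outputs per vertex: x, y, z
vertices_pred = graph_conv(features, adj_norm, weight)
print(vertices_pred.shape)  # (4, 3)
```

Because the weight matrix is shared across vertices and the mixing is restricted to mesh neighbors, the parameter count does not grow with the number of vertices, unlike a fully connected layer over all vertex coordinates.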

3. Versatility with Input Representations

The framework accepts different types of input features attached to the mesh vertices, such as features computed from RGB pixels, part segmentation, or dense correspondences. For every input type tested, the paper reports better results than the comparable baselines that regress model parameters.
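One simple way to attach image-based features to mesh vertices can be sketched as follows: the same global feature vector from an image encoder is concatenated to the 3D coordinates of every template vertex, and the result is the per-vertex input to the graph network. This is an illustrative assumption, not a confirmed description of the paper's pipeline; the function name, the 6-vertex template, and the 4-D feature are hypothetical.

```python
import numpy as np

def attach_global_feature(template_vertices, image_feature):
    """Concatenate the same image feature vector to every template vertex."""
    num_vertices = template_vertices.shape[0]
    tiled = np.tile(image_feature, (num_vertices, 1))  # (V, F)
    return np.concatenate([template_vertices, tiled], axis=1)  # (V, 3 + F)

template_vertices = np.zeros((6, 3))        # hypothetical 6-vertex template
image_feature = np.arange(4, dtype=float)   # hypothetical 4-D encoder output
graph_input = attach_global_feature(template_vertices, image_feature)
print(graph_input.shape)  # (6, 7)
```

Swapping the input modality then only changes how `image_feature` is computed; the graph network that consumes `graph_input` stays the same.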

Strong Numerical Results

The experiments on the Human3.6M and LSP datasets support the proposed Graph-CNN method. On Human3.6M Protocol 2, mesh regression outperforms the comparable parameter-regression baselines and achieves state-of-the-art results among model-based approaches in both MPJPE and Reconstruction Error.
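For readers unfamiliar with the two metrics, the sketch below shows how they are commonly computed: MPJPE is the mean Euclidean distance between predicted and ground-truth joints, and Reconstruction Error is the same quantity after aligning the prediction to the ground truth with a similarity transform (Procrustes alignment). The function names and the synthetic 14-joint data are assumptions of this example; details such as the joint set and root alignment in the paper's evaluation are not covered here.

```python
import numpy as np

def mpjpe(pred, gt):
    """Mean per-joint position error for (J, 3) arrays of joints."""
    return np.linalg.norm(pred - gt, axis=1).mean()

def procrustes_align(pred, gt):
    """Best similarity transform (scale, rotation, translation) of pred onto gt."""
    mu_p, mu_g = pred.mean(axis=0), gt.mean(axis=0)
    p, g = pred - mu_p, gt - mu_g
    u, s, vt = np.linalg.svd(p.T @ g)  # SVD of the 3x3 cross-covariance
    d = np.ones(3)
    if np.linalg.det(u @ vt) < 0:      # avoid reflections
        d[-1] = -1.0
    rot = u @ np.diag(d) @ vt          # row convention: p @ rot ~ g
    scale = (s * d).sum() / (p ** 2).sum()
    return scale * p @ rot + mu_g

def reconstruction_error(pred, gt):
    """MPJPE after Procrustes alignment of the prediction."""
    return mpjpe(procrustes_align(pred, gt), gt)

# Synthetic check: a rotated, scaled, shifted copy of the ground truth.
rng = np.random.default_rng(0)
gt = rng.normal(size=(14, 3))
rot_z = np.array([[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]])
pred = 2.0 * gt @ rot_z + np.array([0.5, -1.0, 3.0])

print(mpjpe(pred, gt) > 0.0)
print(reconstruction_error(pred, gt) < 1e-8)
```

In the synthetic case the prediction differs from the ground truth only by a similarity transform, so the raw error is nonzero while the aligned error vanishes up to floating-point precision. This is why Reconstruction Error isolates errors in articulated pose from errors in global orientation and scale.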

Implications and Future Work

Using a Graph-CNN for this task yields state-of-the-art results among model-based approaches and provides a basis for further comparison of nonparametric and parametric modeling in human pose estimation. Because the network predicts vertex locations rather than model parameters, the approach could in principle be extended to capture aspects of human appearance that existing parametric models do not represent well, such as facial expressions, clothing, and hair.

In conclusion, the paper presents a graph-based mesh regression framework that improves 3D pose and shape reconstruction over parameter-regression baselines while still allowing a model parametrization to be recovered from the predicted vertices when one is needed.
