Representation learning of vertex heatmaps for 3D human mesh reconstruction from multi-view images (2306.16615v1)
Abstract: This study addresses the problem of 3D human mesh reconstruction from multi-view images. Recently, approaches that directly estimate the vertices of a skinned multi-person linear model (SMPL)-based human mesh from input images, using a volumetric heatmap representation, have shown good performance. We show that representation learning of vertex heatmaps using an autoencoder improves the performance of such approaches. The vertex heatmap autoencoder (VHA) learns the manifold of plausible human meshes in the form of latent codes using AMASS, a large-scale motion capture dataset. The body code predictor (BCP) utilizes the body prior learned by VHA for human mesh reconstruction from multi-view images through latent code-based supervision and transfer of pretrained weights. In experiments on the Human3.6M and LightStage datasets, the proposed method outperforms previous methods and achieves state-of-the-art human mesh reconstruction performance.
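To make the volumetric heatmap representation mentioned in the abstract concrete, the following minimal sketch encodes a single 3D point as a normalized Gaussian heatmap over a voxel grid and decodes it again with a soft-argmax (the expected voxel coordinate). The grid size, the Gaussian width, the soft-argmax decoding, and the function names `gaussian_heatmap` and `soft_argmax` are illustrative assumptions; the abstract does not specify how the paper builds or decodes its vertex heatmaps.

```python
import math


def gaussian_heatmap(center, size=16, sigma=1.0):
    """Build a normalized size^3 heatmap (nested lists) peaked at `center`.

    `center` is given in voxel units; `size` and `sigma` are illustrative
    assumptions, not values taken from the paper.
    """
    cx, cy, cz = center
    heat = [
        [
            [
                math.exp(
                    -((x - cx) ** 2 + (y - cy) ** 2 + (z - cz) ** 2)
                    / (2.0 * sigma ** 2)
                )
                for z in range(size)
            ]
            for y in range(size)
        ]
        for x in range(size)
    ]
    # Normalize so the voxel values sum to 1 and act as a probability volume.
    total = sum(v for plane in heat for row in plane for v in row)
    return [[[v / total for v in row] for row in plane] for plane in heat]


def soft_argmax(heat):
    """Decode a heatmap into a continuous coordinate (expected voxel index)."""
    ex = ey = ez = 0.0
    for x, plane in enumerate(heat):
        for y, row in enumerate(plane):
            for z, p in enumerate(row):
                ex += x * p
                ey += y * p
                ez += z * p
    return ex, ey, ez


if __name__ == "__main__":
    # Hypothetical vertex position, in voxel units, well inside the grid.
    vertex = (7.3, 8.1, 6.6)
    heatmap = gaussian_heatmap(vertex)
    print(soft_argmax(heatmap))
```

In a mesh setting, one such volume would be produced per vertex (or per subsampled vertex), so the decoded coordinates stay sub-voxel accurate as long as the peak lies away from the grid boundary.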