ANIM: Accurate Neural Implicit Model for Human Reconstruction from a single RGB-D image (2403.10357v2)
Abstract: Recent progress in human shape learning shows that neural implicit models are effective in generating 3D human surfaces from a limited number of views, and even from a single RGB image. However, existing monocular approaches still struggle to recover fine geometric details such as faces, hands, or cloth wrinkles. They are also prone to depth ambiguities that result in distorted geometry along the camera's optical axis. In this paper, we explore the benefits of incorporating depth observations in the reconstruction process by introducing ANIM, a novel method that reconstructs arbitrary 3D human shapes from single-view RGB-D images with an unprecedented level of accuracy. Our model learns geometric details from both multi-resolution pixel-aligned and voxel-aligned features, which leverage depth information and capture spatial relationships, mitigating depth ambiguities. We further enhance the quality of the reconstructed shape by introducing a depth-supervision strategy, which improves the accuracy of the signed distance field estimated for points that lie on the reconstructed surface. Experiments demonstrate that ANIM outperforms state-of-the-art methods that use RGB, surface normals, point clouds, or RGB-D data as input. In addition, we introduce ANIM-Real, a new multi-modal dataset comprising high-quality scans paired with consumer-grade RGB-D camera captures, together with our protocol to fine-tune ANIM, enabling high-quality reconstruction from real-world human captures.
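Pixel-aligned features, as used in PIFu-style implicit models, are obtained by projecting a 3D query point into the image and bilinearly sampling a 2D feature map at that location; a multi-resolution variant repeats this for every level of a feature pyramid and concatenates the results. The sketch below illustrates that idea in plain Python, assuming a pinhole camera with intrinsics `fx`, `fy`, `cx`, `cy` and feature maps stored as nested lists. The function names, the half-pixel alignment convention and the choice to append the query depth are illustrative assumptions, not ANIM's actual encoder, which the abstract does not specify.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]
FeatureMap = List[List[List[float]]]  # indexed as fmap[row][col][channel]


def project(point: Point, fx: float, fy: float, cx: float, cy: float) -> Tuple[float, float]:
    """Pinhole projection of a camera-space point to full-resolution pixel coordinates."""
    x, y, z = point
    if z <= 0.0:
        raise ValueError("point must lie in front of the camera (z > 0)")
    return fx * x / z + cx, fy * y / z + cy


def bilinear_sample(fmap: FeatureMap, u: float, v: float) -> List[float]:
    """Bilinearly interpolate every channel of fmap at continuous pixel coords (u, v)."""
    h, w = len(fmap), len(fmap[0])
    # Clamp so queries outside the map reuse the border pixels.
    u = min(max(u, 0.0), w - 1.0)
    v = min(max(v, 0.0), h - 1.0)
    u0, v0 = int(u), int(v)
    u1, v1 = min(u0 + 1, w - 1), min(v0 + 1, h - 1)
    du, dv = u - u0, v - v0
    out = []
    for c in range(len(fmap[0][0])):
        top = (1 - du) * fmap[v0][u0][c] + du * fmap[v0][u1][c]
        bottom = (1 - du) * fmap[v1][u0][c] + du * fmap[v1][u1][c]
        out.append((1 - dv) * top + dv * bottom)
    return out


def pixel_aligned_feature(point: Point, pyramid: Sequence[FeatureMap], image_width: int,
                          fx: float, fy: float, cx: float, cy: float) -> List[float]:
    """Concatenate features sampled at the point's projection in every level, plus its depth."""
    u, v = project(point, fx, fy, cx, cy)
    feats: List[float] = []
    for fmap in pyramid:
        # Scale of this level relative to the input image; assumes the same
        # downsampling factor horizontally and vertically.
        s = len(fmap[0]) / image_width
        # Half-pixel convention keeps pixel centres aligned across resolutions.
        feats.extend(bilinear_sample(fmap, (u + 0.5) * s - 0.5, (v + 0.5) * s - 0.5))
    feats.append(point[2])  # depth of the query point, appended as in PIFu-style models
    return feats


# A 2x2 single-channel level and a 1x1 coarser level for a 2-pixel-wide image.
fine = [[[0.0], [1.0]], [[2.0], [3.0]]]
coarse = [[[7.0]]]
print(pixel_aligned_feature((0.0, 0.0, 2.0), [fine, coarse], 2, 1.0, 1.0, 0.5, 0.5))
```

In a learned model the nested lists would be CNN feature tensors and the sampling would be a batched tensor operation; the plain-Python loop only makes the projection-then-interpolate step explicit.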
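The depth-supervision strategy is described here only at a high level. One plausible reading, sketched below as an assumption and not as the paper's formulation, is an extra loss term that drives the predicted signed distance to zero at points back-projected from the input depth, since those points lie on the observed surface by construction, added to a standard SDF regression term on sampled query points. The L1 penalties, the weight `lam` and the analytic sphere SDF standing in for the trained network are all illustrative choices.

```python
import math
from typing import Callable, Sequence, Tuple

Point = Tuple[float, float, float]
SDF = Callable[[Point], float]  # stands in for the learned network: point -> signed distance


def sdf_regression_loss(sdf: SDF, samples: Sequence[Point], targets: Sequence[float]) -> float:
    """Mean L1 error between predicted and ground-truth SDF at sampled query points."""
    if len(samples) != len(targets) or not samples:
        raise ValueError("need equally many (and at least one) samples and targets")
    return sum(abs(sdf(p) - t) for p, t in zip(samples, targets)) / len(samples)


def depth_surface_loss(sdf: SDF, depth_points: Sequence[Point]) -> float:
    """Depth points lie on the visible surface, so their SDF target is 0."""
    if not depth_points:
        raise ValueError("need at least one depth point")
    return sum(abs(sdf(p)) for p in depth_points) / len(depth_points)


def total_loss(sdf: SDF, samples: Sequence[Point], targets: Sequence[float],
               depth_points: Sequence[Point], lam: float = 1.0) -> float:
    """Regression term plus the depth-surface term, balanced by a hypothetical weight lam."""
    return sdf_regression_loss(sdf, samples, targets) + lam * depth_surface_loss(sdf, depth_points)


def sphere_sdf(p: Point, radius: float = 1.0) -> float:
    """Toy analytic SDF (negative inside, positive outside) used in place of a network."""
    return math.sqrt(p[0] ** 2 + p[1] ** 2 + p[2] ** 2) - radius


# One on-surface depth point and one point 1 unit outside the surface.
print(total_loss(sphere_sdf, [(0.0, 0.0, 0.0)], [-1.0],
                 [(1.0, 0.0, 0.0), (0.0, 2.0, 0.0)], lam=2.0))
```

With consumer-grade sensors the depth is noisy, so a term like this would act as a soft constraint whose weight trades off fidelity to the depth observations against the learned shape prior.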
- Marco Pesavento
- Yuanlu Xu
- Nikolaos Sarafianos
- Robert Maier
- Ziyan Wang
- Chun-Han Yao
- Marco Volino
- Edmond Boyer
- Adrian Hilton
- Tony Tung