HiLo: Detailed and Robust 3D Clothed Human Reconstruction with High- and Low-Frequency Information of Parametric Models (2404.04876v2)
Abstract: Reconstructing a 3D clothed human involves creating detailed geometry of individuals in clothing, with applications ranging from virtual try-on to movies and games. To enable practical and widespread applications, recent methods generate a clothed human from an RGB image. However, they struggle to produce reconstructions that are both detailed and robust. We empirically find that the high-frequency (HF) and low-frequency (LF) information from a parametric model can enhance geometric detail and improve robustness to noise, respectively. Based on this, we propose HiLo, namely clothed human reconstruction with high- and low-frequency information, which contains two components. 1) To recover detailed geometry using HF information, we propose a progressive HF Signed Distance Function that enhances the detailed 3D geometry of a clothed human. Our analysis shows that this progressive learning scheme alleviates the large gradients that hinder model convergence. 2) To make reconstruction robust to inaccurate estimates of the parametric model using LF information, we propose a spatial interaction implicit function. This function exploits the complementary spatial information from a low-resolution voxel grid of the parametric model. Experimental results demonstrate that HiLo outperforms state-of-the-art methods by 10.43% and 9.54% in terms of Chamfer distance on the THuman2.0 and CAPE datasets, respectively. Additionally, HiLo is robust to noise from the parametric model, challenging poses, and various clothing styles.
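The abstract reports its gains in Chamfer distance without defining the metric. As a rough illustration only, the sketch below computes one common symmetric variant between two point sets and a relative improvement in percent. The function names `chamfer_distance` and `relative_improvement` are hypothetical, and both the averaging convention and the reading of "outperforms by X%" as a relative reduction are assumptions, not details confirmed by the paper.

```python
import numpy as np


def chamfer_distance(a: np.ndarray, b: np.ndarray) -> float:
    """Symmetric Chamfer distance between point sets a (N, 3) and b (M, 3).

    Assumption: mean nearest-neighbour Euclidean distance in each
    direction, averaged over the two directions. Other conventions
    (squared distances, sums instead of means) are also in use.
    """
    # Pairwise Euclidean distances, shape (N, M).
    d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)
    a_to_b = d.min(axis=1).mean()  # each point in a to its nearest in b
    b_to_a = d.min(axis=0).mean()  # each point in b to its nearest in a
    return float(0.5 * (a_to_b + b_to_a))


def relative_improvement(baseline: float, ours: float) -> float:
    """Percentage reduction of an error metric relative to a baseline."""
    return 100.0 * (baseline - ours) / baseline


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    gt = rng.normal(size=(500, 3))                    # stand-in for ground-truth surface samples
    pred = gt + 0.01 * rng.normal(size=gt.shape)      # stand-in for a noisy reconstruction
    print(chamfer_distance(pred, gt))
```

In practice the point sets would be sampled from the reconstructed and ground-truth meshes; the dense pairwise matrix used here is only suitable for small sets, and a k-d tree is the usual choice for larger ones.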