Weakly-Supervised 3D Reconstruction of Clothed Humans via Normal Maps (2311.16042v1)

Published 27 Nov 2023 in cs.CV

Abstract: We present a novel deep learning-based approach to the 3D reconstruction of clothed humans using weak supervision via 2D normal maps. Given a single RGB image or multiview images, our network infers a signed distance function (SDF) discretized on a tetrahedral mesh surrounding the body in a rest pose. Subsequently, inferred pose and camera parameters are used to generate a normal map from the SDF. A key aspect of our approach is the use of Marching Tetrahedra to (uniquely) compute a triangulated surface from the SDF on the tetrahedral mesh, which makes differentiation (and thus backpropagation) straightforward. Thus, given only ground truth normal maps (with no volumetric ground truth information), we can train the network to produce SDF values from corresponding RGB images. Optionally, an additional multiview loss further improves the results. We demonstrate the efficacy of our approach for both network inference and 3D reconstruction.
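
The abstract credits Marching Tetrahedra with making the SDF-to-surface step differentiable. A minimal sketch of why, on a single tetrahedron, follows; it is not the authors' code, and the function names (edge_crossing, crossing_grad, march_tet) and the sign convention (SDF < 0 inside, >= 0 outside) are illustrative assumptions. On every edge whose endpoint SDF values differ in sign, the surface vertex is placed at the zero crossing of the linearly interpolated SDF, p = (s_a x_b - s_b x_a) / (s_a - s_b). This is a smooth function of s_a and s_b, so a loss on surface geometry (such as a rendered normal map) can be backpropagated to the predicted SDF values.

```python
"""Marching Tetrahedra on one tetrahedron (illustrative sketch, not the paper's code).

Assumptions: SDF < 0 inside, >= 0 outside; 3-vectors are plain tuples.
"""
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]


def sub(a: Vec3, b: Vec3) -> Vec3:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def cross(a: Vec3, b: Vec3) -> Vec3:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dot(a: Vec3, b: Vec3) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def edge_crossing(xa: Vec3, xb: Vec3, sa: float, sb: float) -> Vec3:
    """Zero crossing of the linearly interpolated SDF on edge (a, b).

    p = (sa * xb - sb * xa) / (sa - sb); smooth in sa, sb whenever their signs differ.
    """
    d = sa - sb
    return tuple((sa * xb[i] - sb * xa[i]) / d for i in range(3))


def crossing_grad(xa: Vec3, xb: Vec3, sa: float, sb: float) -> Tuple[Vec3, Vec3]:
    """Analytic derivatives dp/dsa and dp/dsb of edge_crossing."""
    d2 = (sa - sb) ** 2
    dp_dsa = tuple(sb * (xa[i] - xb[i]) / d2 for i in range(3))
    dp_dsb = tuple(sa * (xb[i] - xa[i]) / d2 for i in range(3))
    return dp_dsa, dp_dsb


def march_tet(x: Sequence[Vec3], s: Sequence[float]) -> List[Tuple[Vec3, Vec3, Vec3]]:
    """Triangulate the zero level set inside one tetrahedron (0, 1 or 2 triangles)."""
    inside = [i for i in range(4) if s[i] < 0]
    outside = [i for i in range(4) if s[i] >= 0]
    if len(inside) in (0, 4):
        return []  # no sign change: the surface does not cross this tetrahedron
    if len(inside) == 1 or len(outside) == 1:
        # one isolated vertex: its three edges yield a single triangle
        lone = inside[0] if len(inside) == 1 else outside[0]
        others = [i for i in range(4) if i != lone]
        polys = [[edge_crossing(x[lone], x[j], s[lone], s[j]) for j in others]]
    else:
        # 2-2 split: crossings on edges a-c, a-d, b-d, b-c form a quad; split it in two
        (a, b), (c, d) = inside, outside
        q = [edge_crossing(x[i], x[j], s[i], s[j])
             for i, j in ((a, c), (a, d), (b, d), (b, c))]
        polys = [[q[0], q[1], q[2]], [q[0], q[2], q[3]]]
    # orient each triangle so its normal points from the inside vertices to the outside ones
    c_in = tuple(sum(x[i][k] for i in inside) / len(inside) for k in range(3))
    c_out = tuple(sum(x[i][k] for i in outside) / len(outside) for k in range(3))
    out_dir = sub(c_out, c_in)
    tris = []
    for p0, p1, p2 in polys:
        n = cross(sub(p1, p0), sub(p2, p0))
        tris.append((p0, p1, p2) if dot(n, out_dir) >= 0 else (p0, p2, p1))
    return tris
```

Because a tetrahedron has only the 0/4, 1/3 and 2/2 sign splits, each case has exactly one triangulation, unlike the ambiguous cases of Marching Cubes; this is presumably what "(uniquely)" refers to. In a full pipeline this would run over every tetrahedron under an autodiff framework, and the resulting mesh would be posed and rendered to a normal map; those steps are outside this sketch.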
