Dynamic Appearance Modeling of Clothed 3D Human Avatars using a Single Camera (2312.16842v1)

Published 28 Dec 2023 in cs.CV

Abstract: The appearance of a human in clothing is driven not only by the pose but also by its temporal context, i.e., motion. However, such context has been largely neglected by existing monocular human modeling methods whose neural networks often struggle to learn a video of a person with large dynamics due to the motion ambiguity, i.e., there exist numerous geometric configurations of clothes that are dependent on the context of motion even for the same pose. In this paper, we introduce a method for high-quality modeling of clothed 3D human avatars using a video of a person with dynamic movements. The main challenge comes from the lack of 3D ground truth data of geometry and its temporal correspondences. We address this challenge by introducing a novel compositional human modeling framework that takes advantage of both explicit and implicit human modeling. For explicit modeling, a neural network learns to generate point-wise shape residuals and appearance features of a 3D body model by comparing its 2D rendering results and the original images. This explicit model allows for the reconstruction of discriminative 3D motion features from UV space by encoding their temporal correspondences. For implicit modeling, an implicit network combines the appearance and 3D motion features to decode high-fidelity clothed 3D human avatars with motion-dependent geometry and texture. The experiments show that our method can generate a large variation of secondary motion in a physically plausible way.
