
PGAHum: Prior-Guided Geometry and Appearance Learning for High-Fidelity Animatable Human Reconstruction (2404.13862v1)

Published 22 Apr 2024 in cs.CV

Abstract: Recent techniques for implicit geometry representation learning and neural rendering have shown promising results for 3D clothed human reconstruction from sparse video inputs. However, it remains challenging to reconstruct detailed surface geometry, and even more difficult to synthesize photorealistic novel views with animated human poses. In this work, we introduce PGAHum, a prior-guided geometry and appearance learning framework for high-fidelity animatable human reconstruction. We thoroughly exploit 3D human priors in three key modules of PGAHum to achieve high-quality geometry reconstruction with intricate details and photorealistic view synthesis on unseen poses. First, a prior-based implicit geometry representation of the 3D human, which combines a delta SDF predicted by a tri-plane network with a base SDF derived from the prior SMPL model, is proposed to model the surface details and the body shape in a disentangled manner. Second, we introduce a novel prior-guided sampling strategy that fully leverages prior information about the human pose and body to sample query points within or near the body surface. By avoiding unnecessary learning in empty 3D space, the neural rendering can recover more appearance details. Last, we propose a novel iterative backward deformation strategy to progressively find the correspondence in observation space for each query point. A skinning weights prediction model is learned from the prior provided by the SMPL model to achieve the iterative backward LBS deformation. Extensive quantitative and qualitative comparisons on various datasets demonstrate the superiority of our framework, and ablation studies verify the effectiveness of each scheme for geometry and appearance learning.
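The two core mechanisms the abstract describes can be illustrated compactly. The following is a minimal, hypothetical numpy sketch, not the authors' implementation: the SMPL-derived base SDF is replaced by a unit sphere, the tri-plane delta SDF by a fixed smooth perturbation, and the skinning-weight prediction model by a user-supplied `weight_fn`; all names are illustrative.

```python
import numpy as np

# --- Prior-based implicit geometry (illustrative stand-ins) ---
# The paper composes a base SDF derived from SMPL with a delta SDF
# predicted by a tri-plane network. Here the base is a unit sphere and
# the delta is a fixed perturbation, just to show the composition.

def base_sdf(x):
    """Coarse body-shape prior: signed distance to a unit sphere."""
    return np.linalg.norm(x, axis=-1) - 1.0

def delta_sdf(x):
    """Stand-in for the learned tri-plane residual encoding surface detail."""
    return 0.05 * np.sin(4.0 * x).prod(axis=-1)

def composed_sdf(x):
    """Disentangled representation: prior shape plus learned detail."""
    return base_sdf(x) + delta_sdf(x)

# --- Iterative backward LBS deformation ---
# Forward LBS maps a canonical point x_c to observation space through a
# weight-blended bone transform T(x_c). Because the skinning weights are
# defined in canonical space, the backward map has no closed form; it can
# be approximated by fixed-point iteration: re-evaluate the weights at
# the current canonical estimate and invert the blended transform.

def backward_lbs(x_obs, bone_transforms, weight_fn, iters=5):
    """Find the canonical point whose forward LBS lands on x_obs.

    bone_transforms: (B, 4, 4) rigid per-bone transforms
    weight_fn: canonical point -> (B,) skinning weights summing to 1
    """
    x_c = np.asarray(x_obs, dtype=float).copy()
    x_h = np.append(x_obs, 1.0)  # homogeneous observed point
    for _ in range(iters):
        w = weight_fn(x_c)                              # (B,) weights
        T = np.einsum("b,bij->ij", w, bone_transforms)  # blended 4x4
        x_c = (np.linalg.inv(T) @ x_h)[:3]
    return x_c
```

With identity bone transforms the fixed point is reached immediately; in the paper, `weight_fn` would be the skinning-weight prediction model initialized from the SMPL prior, and the composed SDF would be queried at the recovered canonical points.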

Authors (4)
  1. Hao Wang (1120 papers)
  2. Qingshan Xu (27 papers)
  3. Hongyuan Chen (2 papers)
  4. Rui Ma (112 papers)
Citations (1)
