
HAVE-FUN: Human Avatar Reconstruction from Few-Shot Unconstrained Images (2311.15672v2)

Published 27 Nov 2023 in cs.CV

Abstract: Contemporary techniques for human avatar reconstruction commonly require costly data acquisition and struggle to achieve satisfactory results from a small number of casual images. In this paper, we investigate this task using a few-shot unconstrained photo album. Reconstructing human avatars from such data is challenging because of the limited amount of data and the dynamic articulated poses. To handle dynamic data, we integrate a skinning mechanism with deep marching tetrahedra (DMTet) to form a drivable tetrahedral representation, which drives arbitrary mesh topologies generated by the DMTet to adapt to unconstrained images. To effectively mine instructive information from few-shot data, we devise a two-phase optimization method with few-shot reference and few-shot guidance. The former focuses on aligning avatar identity with reference images, while the latter aims to generate plausible appearances for unseen regions. Overall, our framework, called HaveFun, can undertake avatar reconstruction, rendering, and animation. Extensive experiments on our developed benchmarks demonstrate that HaveFun exhibits substantially superior performance in reconstructing the human body and hand. Project website: https://seanchenxy.github.io/HaveFunWeb/.
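
The abstract mentions a skinning mechanism that drives the vertices of the DMTet-generated mesh, but gives no formulation. As a rough illustration only, the sketch below shows standard linear blend skinning in NumPy. It is an assumption that the paper's mechanism resembles this. The function name `linear_blend_skinning`, the array shapes, and the toy two-bone setup are all hypothetical and are not taken from the HaveFun code.

```python
import numpy as np


def linear_blend_skinning(vertices, weights, transforms):
    """Deform rest-pose vertices with per-bone transforms (generic LBS sketch).

    vertices:   (V, 3) rest-pose positions
    weights:    (V, B) skinning weights, each row assumed to sum to 1
    transforms: (B, 4, 4) homogeneous bone transforms
    """
    # Append a homogeneous coordinate of 1 to every vertex: (V, 4)
    homo = np.concatenate([vertices, np.ones((len(vertices), 1))], axis=1)
    # Blend the bone transforms per vertex using the skinning weights: (V, 4, 4)
    blended = np.einsum("vb,bij->vij", weights, transforms)
    # Apply each vertex's blended transform to that vertex: (V, 4)
    posed = np.einsum("vij,vj->vi", blended, homo)
    return posed[:, :3]


# Toy setup (hypothetical): bone 0 stays fixed, bone 1 translates by +2 along y.
rest_vertices = np.array([[0.0, 0.0, 0.0],
                          [1.0, 0.0, 0.0],
                          [2.0, 0.0, 0.0]])
skin_weights = np.array([[1.0, 0.0],
                         [0.5, 0.5],
                         [0.0, 1.0]])
bone_transforms = np.stack([np.eye(4), np.eye(4)])
bone_transforms[1, 1, 3] = 2.0  # translation of bone 1 along y

posed_vertices = linear_blend_skinning(rest_vertices, skin_weights, bone_transforms)
print(posed_vertices)
```

In this toy case the middle vertex is influenced equally by both bones, so it moves half as far as the vertex bound entirely to the translated bone. Because the blend is applied per vertex, the same routine works for any mesh topology, provided every vertex carries a weight row.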
