
PrimDiffusion: Volumetric Primitives Diffusion for 3D Human Generation (2312.04559v1)

Published 7 Dec 2023 in cs.CV and cs.GR

Abstract: We present PrimDiffusion, the first diffusion-based framework for 3D human generation. Devising diffusion models for 3D human generation is difficult due to the intensive computational cost of 3D representations and the articulated topology of 3D humans. To tackle these challenges, our key insight is operating the denoising diffusion process directly on a set of volumetric primitives, which models the human body as a number of small volumes with radiance and kinematic information. This volumetric primitives representation marries the capacity of volumetric representations with the efficiency of primitive-based rendering. Our PrimDiffusion framework has three appealing properties: 1) compact and expressive parameter space for the diffusion model, 2) flexible 3D representation that incorporates human prior, and 3) decoder-free rendering for efficient novel-view and novel-pose synthesis. Extensive experiments validate that PrimDiffusion outperforms state-of-the-art methods in 3D human generation. Notably, compared to GAN-based methods, our PrimDiffusion supports real-time rendering of high-quality 3D humans at a resolution of $512\times512$ once the denoising process is done. We also demonstrate the flexibility of our framework on training-free conditional generation such as texture transfer and 3D inpainting.
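As a rough illustration of what "operating the denoising diffusion process directly on a set of volumetric primitives" could look like, the sketch below flattens a few toy primitives into one parameter vector and applies the standard DDPM forward noising step to it. The `Primitive` layout, the primitive count, the voxel grid size, and the linear noise schedule are all assumptions made for illustration; the abstract does not specify them, and this is not the authors' implementation.

```python
import math
import random
from dataclasses import dataclass
from typing import List


@dataclass
class Primitive:
    """One small volume: kinematic pose plus a tiny radiance grid (hypothetical layout)."""
    center: List[float]    # 3 values: position
    rotation: List[float]  # 3 values: axis-angle rotation
    scale: List[float]     # 3 values: per-axis size
    voxels: List[float]    # side**3 * 4 values: RGB + density per voxel

    def flatten(self) -> List[float]:
        return self.center + self.rotation + self.scale + self.voxels


def linear_alpha_bars(num_steps: int, beta_start: float = 1e-4, beta_end: float = 0.02) -> List[float]:
    """Cumulative products of (1 - beta_t) for a linear beta schedule (assumed schedule)."""
    alpha_bars, running = [], 1.0
    for t in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * t / max(num_steps - 1, 1)
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars


def add_noise(x0: List[float], alpha_bar: float, rng: random.Random) -> List[float]:
    """Sample x_t = sqrt(alpha_bar) * x_0 + sqrt(1 - alpha_bar) * eps, with eps ~ N(0, 1)."""
    signal, noise = math.sqrt(alpha_bar), math.sqrt(1.0 - alpha_bar)
    return [signal * v + noise * rng.gauss(0.0, 1.0) for v in x0]


def make_toy_body(num_primitives: int, side: int, rng: random.Random) -> List[float]:
    """Build a random stand-in for a body and flatten all primitives into one vector."""
    flat: List[float] = []
    for _ in range(num_primitives):
        prim = Primitive(
            center=[rng.uniform(-1, 1) for _ in range(3)],
            rotation=[0.0, 0.0, 0.0],
            scale=[0.1, 0.1, 0.1],
            voxels=[rng.random() for _ in range(side ** 3 * 4)],
        )
        flat.extend(prim.flatten())
    return flat


rng = random.Random(0)
x0 = make_toy_body(num_primitives=4, side=2, rng=rng)  # toy sizes, not the paper's
alpha_bars = linear_alpha_bars(num_steps=1000)
x_early = add_noise(x0, alpha_bars[0], rng)    # almost identical to x0
x_late = add_noise(x0, alpha_bars[-1], rng)    # almost pure Gaussian noise
```

A denoising network would be trained to invert this corruption on the flattened primitive parameters. Because the generated sample is itself the set of primitives, it can in principle be rendered without a further decoding network, which is how the abstract's "decoder-free rendering" reads.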
