
AutoDecoding Latent 3D Diffusion Models (2307.05445v1)

Published 7 Jul 2023 in cs.CV

Abstract: We present a novel approach to the generation of static and articulated 3D assets that has a 3D autodecoder at its core. The 3D autodecoder framework embeds properties learned from the target dataset in the latent space, which can then be decoded into a volumetric representation for rendering view-consistent appearance and geometry. We then identify the appropriate intermediate volumetric latent space, and introduce robust normalization and de-normalization operations to learn a 3D diffusion model from 2D images or monocular videos of rigid or articulated objects. Our approach is flexible enough to use either existing camera supervision or no camera information at all; in the latter case, it learns the camera information efficiently during training. Our evaluations show that our generation results outperform state-of-the-art alternatives on various benchmark datasets and metrics, including multi-view image datasets of synthetic objects, real in-the-wild videos of moving people, and a large-scale, real video dataset of static objects.
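The abstract mentions robust normalization and de-normalization of the volumetric latents before a diffusion model is trained on them. The sketch below illustrates the general idea under an assumption: each latent channel is centered by its median and scaled by its interquartile range (IQR), statistics that a few extreme values cannot dominate. This is not presented as the paper's exact procedure. The function names and the flattened layout, where `latents[c]` holds every value of channel `c` gathered across the training set, are hypothetical.

```python
import statistics
from typing import List, Sequence, Tuple

Stats = Tuple[float, float]  # (median, iqr) for one latent channel


def channel_stats(channel: Sequence[float]) -> Stats:
    """Median and interquartile range of one channel's values."""
    # quantiles(n=4) returns the three quartile cut points [Q1, Q2, Q3].
    q1, _, q3 = statistics.quantiles(channel, n=4, method="inclusive")
    iqr = q3 - q1
    # Assumption: use a unit scale for constant channels to avoid dividing by zero.
    return statistics.median(channel), (iqr if iqr > 0 else 1.0)


def fit(latents: List[List[float]]) -> List[Stats]:
    """Compute per-channel statistics once, over the whole training set."""
    return [channel_stats(ch) for ch in latents]


def normalize(latents: List[List[float]], stats: List[Stats]) -> List[List[float]]:
    """Map latents to a robust, roughly unit scale before diffusion training."""
    return [[(v - med) / iqr for v in ch] for ch, (med, iqr) in zip(latents, stats)]


def denormalize(latents: List[List[float]], stats: List[Stats]) -> List[List[float]]:
    """Invert normalize() so that sampled latents can be passed to the decoder."""
    return [[v * iqr + med for v in ch] for ch, (med, iqr) in zip(latents, stats)]


if __name__ == "__main__":
    latents = [[0.0, 1.0, 2.0, 3.0, 100.0], [5.0, 5.0, 5.0, 5.0, 5.0]]
    stats = fit(latents)
    z = normalize(latents, stats)
    print(z[0])  # [-1.0, -0.5, 0.0, 0.5, 49.0]
    assert denormalize(z, stats) == latents
```

In the example, the outlier `100.0` leaves the median (2.0) and IQR (2.0) unchanged, so the remaining values keep a useful spread. A mean/standard-deviation scheme would let that single value shift the center to 21.2 and inflate the scale. De-normalization is the exact inverse, which is what a sampler needs before it hands generated latents back to the decoder.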
