Enhancing High-Resolution 3D Generation through Pixel-wise Gradient Clipping (2310.12474v4)

Published 19 Oct 2023 in cs.CV

Abstract: High-resolution 3D object generation remains a challenging task, primarily due to the limited availability of comprehensive annotated training data. Recent advancements have aimed to overcome this constraint by harnessing image generative models, pretrained on extensive curated web datasets, through knowledge transfer techniques such as Score Distillation Sampling (SDS). Efficiently meeting the requirements of high-resolution rendering often necessitates the adoption of latent representation-based models, such as the Latent Diffusion Model (LDM). In this framework, a significant challenge arises: to compute gradients for individual image pixels, it is necessary to backpropagate gradients from the designated latent space through the frozen components of the image model, such as the VAE encoder used within the LDM. However, this gradient propagation pathway has never been optimized and remains uncontrolled during training. We find that these unregulated gradients adversely affect the 3D model's capacity to acquire texture-related information from the image generative model, leading to poor-quality appearance synthesis. To address this overarching challenge, we propose an innovative operation termed Pixel-wise Gradient Clipping (PGC), designed for seamless integration into existing 3D generative models, thereby enhancing their synthesis quality. Specifically, we control the magnitude of stochastic gradients by clipping the pixel-wise gradients efficiently, while preserving crucial texture-related gradient directions. Despite its simplicity and minimal extra cost, extensive experiments demonstrate the efficacy of PGC in enhancing the performance of existing 3D generative models for high-resolution object rendering.
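
To make the clipping operation concrete, below is a minimal PyTorch-style sketch of pixel-wise gradient clipping as described in the abstract: each pixel's channel-wise gradient vector is rescaled so that its magnitude does not exceed a threshold, while its direction is preserved. The function name `pgc_clip`, the threshold `tau`, and the hook-based wiring are illustrative assumptions, not the paper's reference implementation.

```python
import torch

def pgc_clip(pixel_grad: torch.Tensor, tau: float = 0.1, eps: float = 1e-8) -> torch.Tensor:
    """Clip the per-pixel gradient magnitude to `tau` while keeping its direction.

    pixel_grad: (B, C, H, W) gradient of the distillation loss w.r.t. the
    rendered image. Each pixel's C-dimensional gradient vector is rescaled so
    its L2 norm does not exceed `tau`; smaller vectors pass through unchanged.
    """
    norm = pixel_grad.norm(dim=1, keepdim=True)    # per-pixel L2 norm, (B, 1, H, W)
    scale = (tau / (norm + eps)).clamp(max=1.0)    # <= 1 everywhere, so direction is kept
    return pixel_grad * scale

# Hypothetical wiring into an SDS-style loop: clip the gradient that arrives at
# the rendered image after backpropagating through the frozen VAE encoder.
# `renderer`, `encode_to_latents`, and `sds_loss` are placeholders for the
# user's own 3D renderer, LDM encoder, and score-distillation loss.
#
# rendered = renderer(camera)                              # (B, 3, H, W), requires grad
# rendered.register_hook(lambda g: pgc_clip(g, tau=0.1))   # applied during backward()
# latents = encode_to_latents(rendered)
# sds_loss(latents, text_embedding).backward()
```

In such a setup, the tensor hook clips the pixel-wise gradient after it has passed back through the frozen VAE encoder and before it reaches the parameters of the 3D representation, which matches the point in the pipeline the abstract identifies as unregulated.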

