ATT3D: Amortized Text-to-3D Object Synthesis (2306.07349v1)

Published 6 Jun 2023 in cs.LG, cs.AI, and cs.CV

Abstract: Text-to-3D modelling has seen exciting progress by combining generative text-to-image models with image-to-3D methods like Neural Radiance Fields. DreamFusion recently achieved high-quality results but requires a lengthy, per-prompt optimization to create 3D objects. To address this, we amortize optimization over text prompts by training on many prompts simultaneously with a unified model, instead of separately. With this, we share computation across a prompt set, training in less time than per-prompt optimization. Our framework - Amortized text-to-3D (ATT3D) - enables knowledge-sharing between prompts to generalize to unseen setups and smooth interpolations between text for novel assets and simple animations.
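
To make the amortization idea concrete, the two optimization settings can be written side by side. The notation below (mapping network m with weights w, differentiable renderer g, camera c, prompt set P, and score-distillation loss L_SDS) is introduced here for illustration and is not taken verbatim from the paper.

    % Per-prompt optimization (DreamFusion-style): a separate set of scene
    % parameters \theta_p is fit from scratch for every prompt p.
    % (\mathbb{E} assumes the amssymb package.)
    \[
      \theta_p^{\ast} = \arg\min_{\theta_p}\;
        \mathbb{E}_{c}\big[\mathcal{L}_{\mathrm{SDS}}\big(g(\theta_p, c),\, p\big)\big]
    \]
    % Amortized optimization (ATT3D-style): a single mapping network m(\cdot; w)
    % predicts scene parameters from the text prompt, and its weights w are
    % trained jointly over the whole prompt set \mathcal{P}, sharing compute
    % across prompts.
    \[
      w^{\ast} = \arg\min_{w}\;
        \mathbb{E}_{p \sim \mathcal{P}}\, \mathbb{E}_{c}\big[
          \mathcal{L}_{\mathrm{SDS}}\big(g(m(p; w), c),\, p\big)\big]
    \]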

Authors (10)
  1. Jonathan Lorraine (20 papers)
  2. Kevin Xie (13 papers)
  3. Xiaohui Zeng (28 papers)
  4. Chen-Hsuan Lin (17 papers)
  5. Towaki Takikawa (13 papers)
  6. Nicholas Sharp (20 papers)
  7. Tsung-Yi Lin (49 papers)
  8. Ming-Yu Liu (87 papers)
  9. Sanja Fidler (184 papers)
  10. James Lucas (24 papers)
Citations (70)

Summary

Overview of "LaTeX Author Guidelines for ICCV Proceedings"

The paper "LaTeX Author Guidelines for ICCV Proceedings" provides a comprehensive set of instructions for authors preparing manuscripts for submission to the International Conference on Computer Vision (ICCV). With a focus on proper formatting and adherence to the standards set forth by the IEEE Computer Society Press, the guidelines serve to ensure uniformity and quality across all conference submissions.

Manuscript Preparation

The document emphasizes several crucial aspects of manuscript preparation. Papers must be written in English and must not exceed eight pages of main content, with additional pages permitted only for references. The limit is strictly enforced: a paper whose main content runs beyond eight pages will not be reviewed, since the review process does not accommodate revising overlength submissions.

Dual Submission and Review Anonymity

Authors are cautioned against dual submissions, aligning with ICCV policies which mandate unique and original submissions to the conference. Furthermore, the paper clarifies the concept of blind review, explaining that anonymity does not necessitate removing references to the authors' previous work. Instead, it advises against self-referential language such as "my" or "our" when citing these works and emphasizes that technical details that uniquely identify authors must be handled with discretion.

Formatting Requirements

The paper delineates specific formatting guidelines, including a detailed description of margins, page numbering, type styles, and fonts. Main titles must be 14-point Times boldface, with author names and affiliations printed in 12-point non-boldface. Main text must appear in a two-column format, justified in 10-point Times, while figure captions and footnotes should adhere to 9-point Roman type.
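
As a rough sketch only (the official template ships its own style file that enforces these sizes, so the commands below are illustrative rather than the template's actual interface), the layout described above corresponds to a standard LaTeX setup along these lines:

    \documentclass[10pt,twocolumn,letterpaper]{article}
    \usepackage{times}        % body text set in Times
    \begin{document}
    \title{Paper Title}       % typeset by the template in 14-point boldface
    \author{Author Name\\Institution}   % 12-point, non-bold
    \maketitle
    % ... two-column, fully justified 10-point body text ...
    \end{document}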

Figures and Equations

For figures and equations, the guidelines stress numbering all sections and displayed equations so that they can be referenced precisely rather than described vaguely. Figures and other graphical elements should be properly aligned and sized to match the text body, since many readers will print hard copies of the paper.
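
For example, a displayed equation can be numbered with the standard equation environment and then cited by that number rather than by a vague phrase; the amsmath package's \eqref command is one common way to do this:

    \usepackage{amsmath}      % provides \eqref for equation references
    % ...
    \begin{equation}
      E = mc^{2}
      \label{eq:energy}
    \end{equation}
    As shown in Eq.~\eqref{eq:energy}, the quantity ...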

Reference and Citation Standards

The guidelines give explicit instructions for bibliographical references, advocating a numerical citation system in which citation numbers appear in square brackets and are listed in numerical rather than chronological order. Following these conventions keeps the presentation of references consistent and professional.
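
In standard LaTeX this amounts to numbered bibliography entries cited with \cite, which renders as a bracketed number in the text; the entry key and contents below are placeholders:

    This result builds on prior work~\cite{Authors23}.   % renders as "[1]"

    \begin{thebibliography}{9}
      \bibitem{Authors23} A.~Author and B.~Author. An earlier method. In ICCV, 2023.
    \end{thebibliography}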

Conclusion

In summary, this document provides a structured framework for authors contributing to ICCV proceedings, encapsulating guidelines that foster clarity, consistency, and scholarly rigor in scientific communication. Consistent formatting standards make research findings easier to disseminate and evaluate, and their strict enforcement helps uphold the quality and integrity of conference submissions.
