ATT3D: Amortized Text-to-3D Object Synthesis (2306.07349v1)
Published 6 Jun 2023 in cs.LG, cs.AI, and cs.CV
Abstract: Text-to-3D modelling has seen exciting progress by combining generative text-to-image models with image-to-3D methods like Neural Radiance Fields. DreamFusion recently achieved high-quality results but requires a lengthy, per-prompt optimization to create 3D objects. To address this, we amortize optimization over text prompts by training on many prompts simultaneously with a unified model, instead of separately. With this, we share computation across a prompt set, training in less time than per-prompt optimization. Our framework - Amortized text-to-3D (ATT3D) - enables knowledge-sharing between prompts to generalize to unseen setups and smooth interpolations between text for novel assets and simple animations.
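The amortization idea in the abstract can be illustrated with a toy sketch. This is not the ATT3D model: the quadratic `loss` is an assumed stand-in for the diffusion-guided 3D objective, the linear map `W` stands in for the unified text-conditioned network, and all names and sizes (`d_text`, `d_param`, `train_amortized`) are hypothetical. It only shows one shared model being trained across many prompts, then queried on an unseen prompt and on an interpolation between two prompts.

```python
import numpy as np

rng = np.random.default_rng(0)
d_text, d_param = 4, 6                      # hypothetical embedding / parameter sizes
A = rng.normal(size=(d_param, d_text))      # hidden toy target map (assumption, not from the paper)

def unit(v):
    """Scale a vector to unit length."""
    return v / np.linalg.norm(v)

def loss(theta, c):
    """Toy per-prompt objective; stands in for a diffusion-guided 3D loss."""
    return float(np.sum((theta - A @ c) ** 2))

def grad_theta(theta, c):
    """Gradient of the toy loss with respect to the object parameters theta."""
    return 2.0 * (theta - A @ c)

def train_amortized(prompts, steps=3000, lr=0.4):
    """Train ONE shared map W over all prompts instead of one theta per prompt."""
    W = np.zeros((d_param, d_text))
    for _ in range(steps):
        c = prompts[rng.integers(len(prompts))]   # sample a prompt embedding
        g = grad_theta(W @ c, c)                  # dL/dtheta at theta = W c
        W -= lr * np.outer(g, c)                  # chain rule: dL/dW = g c^T
    return W

# Unit-norm toy "text embeddings" for the training prompt set.
train_prompts = np.stack([unit(rng.normal(size=d_text)) for _ in range(16)])
W = train_amortized(train_prompts)

# Query the shared model on a prompt it never saw during training.
unseen = unit(rng.normal(size=d_text))
print("loss before training:", loss(np.zeros(d_param), unseen))
print("loss on unseen prompt:", loss(W @ unseen, unseen))

# Interpolate between two prompt embeddings to get in-between parameters.
c0, c1 = train_prompts[0], train_prompts[1]
for alpha in (0.0, 0.5, 1.0):
    c_mix = (1 - alpha) * c0 + alpha * c1
    print(alpha, loss(W @ c_mix, c_mix))
```

In this toy setting the shared map generalizes to the unseen prompt because all per-prompt objectives share structure; a per-prompt optimizer would instead rerun the full optimization for every new prompt.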
- Jonathan Lorraine
- Kevin Xie
- Xiaohui Zeng
- Chen-Hsuan Lin
- Towaki Takikawa
- Nicholas Sharp
- Tsung-Yi Lin
- Ming-Yu Liu
- Sanja Fidler
- James Lucas