ThemeStation: Generating Theme-Aware 3D Assets from Few Exemplars (2403.15383v2)
Abstract: Real-world applications often require a large gallery of 3D assets that share a consistent theme. While remarkable advances have been made in general 3D content creation from text or images, synthesizing customized 3D assets that follow the shared theme of input 3D exemplars remains an open and challenging problem. In this work, we present ThemeStation, a novel approach for theme-aware 3D-to-3D generation. ThemeStation synthesizes customized 3D assets from a few given exemplars with two goals: 1) unity, generating 3D assets that thematically align with the given exemplars, and 2) diversity, generating 3D assets with a high degree of variation. To this end, we design a two-stage framework that first draws a concept image and then performs reference-informed 3D modeling. We propose a novel dual score distillation (DSD) loss to jointly leverage priors from both the input exemplars and the synthesized concept image. Extensive experiments and user studies confirm that ThemeStation surpasses prior works in producing diverse, theme-aware 3D models of impressive quality. ThemeStation also enables various applications such as controllable 3D-to-3D generation.
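The core idea behind the dual score distillation (DSD) loss described above is to combine gradient signals from two priors: one derived from the synthesized concept image and one from the input 3D exemplars. The sketch below is a toy illustration only, not the paper's actual implementation: the real method distills from diffusion U-Nets over rendered views, whereas here each "denoiser" is a hypothetical stand-in callable, and the blending weight `lam` is an assumed parameter.

```python
import numpy as np

def sds_gradient(denoiser, x, noise, weight=1.0):
    """Score-distillation-style gradient: weight * (predicted noise - injected noise).

    `denoiser` is a stand-in for a diffusion model's noise predictor;
    `x` plays the role of a rendered view of the 3D asset being optimized.
    """
    eps_pred = denoiser(x + noise)  # denoiser sees the noised render
    return weight * (eps_pred - noise)

def dual_score_distillation_step(x, noise, concept_denoiser, exemplar_denoiser,
                                 lam=0.3, lr=0.1):
    """One toy optimization step blending concept and exemplar priors."""
    g_concept = sds_gradient(concept_denoiser, x, noise)
    g_exemplar = sds_gradient(exemplar_denoiser, x, noise)
    g = (1.0 - lam) * g_concept + lam * g_exemplar  # weighted blend of the two priors
    return x - lr * g

# Usage with trivial stand-in denoisers that pull x toward opposite directions,
# mimicking two priors that must be balanced.
concept = lambda z: z - 1.0    # hypothetical concept-image prior
exemplar = lambda z: z + 1.0   # hypothetical exemplar prior
x = np.zeros(3)
noise = np.zeros(3)
x_new = dual_score_distillation_step(x, noise, concept, exemplar, lam=0.3)
```

With `lam=0.3` the update leans toward the concept prior while still being pulled back by the exemplar prior; the paper's contribution lies in how the two diffusion priors are actually constructed and scheduled, which this toy blend does not capture.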