
ThemeStation: Generating Theme-Aware 3D Assets from Few Exemplars (2403.15383v2)

Published 22 Mar 2024 in cs.CV

Abstract: Real-world applications often require a large gallery of 3D assets that share a consistent theme. While remarkable advances have been made in general 3D content creation from text or image, synthesizing customized 3D assets following the shared theme of input 3D exemplars remains an open and challenging problem. In this work, we present ThemeStation, a novel approach for theme-aware 3D-to-3D generation. ThemeStation synthesizes customized 3D assets based on a few given exemplars with two goals: 1) unity for generating 3D assets that thematically align with the given exemplars and 2) diversity for generating 3D assets with a high degree of variations. To this end, we design a two-stage framework that draws a concept image first, followed by a reference-informed 3D modeling stage. We propose a novel dual score distillation (DSD) loss to jointly leverage priors from both the input exemplars and the synthesized concept image. Extensive experiments and user studies confirm that ThemeStation surpasses prior works in producing diverse theme-aware 3D models with impressive quality. ThemeStation also enables various applications such as controllable 3D-to-3D generation.

Summary

  • The paper introduces a two-stage 3D asset generation framework that first synthesizes a concept image before constructing the 3D model.
  • It employs a dual score distillation (DSD) loss to integrate guidance from both the 3D exemplars and the concept image, improving asset quality.
  • Experimental results demonstrate enhanced diversity and thematic consistency, underscoring its applicability in gaming, film, and VR.

Theme-Aware 3D Asset Generation from Few Exemplars with ThemeStation

Introduction to ThemeStation

Generating theme-consistent 3D assets has long been a challenging problem in computer graphics and AI research. Despite rapid progress in general 3D content creation, producing customized 3D models that follow a shared theme, particularly from only a handful of exemplars, remains an open problem. ThemeStation addresses this with a two-stage framework for theme-aware 3D-to-3D generation, built around a dual score distillation (DSD) loss that jointly exploits priors from the input 3D exemplars and a synthesized concept image. The result is a set of diverse yet theme-consistent 3D models, pointing to a new direction in automated 3D asset generation.

Key Contributions

The main contributions of ThemeStation include:

  • A Novel Two-Stage Framework: ThemeStation introduces a unique approach to generating theme-consistent 3D models by first synthesizing a concept image and then transforming this image into a 3D model, incorporating both unity and diversity in the generated assets.
  • Dual Score Distillation (DSD) Loss: A new loss that efficiently leverages priors from both the input exemplars and the concept image, mitigating conflicts between the two sources of guidance during 3D modeling (one plausible form is sketched after this list).
  • Theme-Aware 3D Generation: The paper addresses the challenge of theme-aware 3D-to-3D generation, a relatively unexplored area, demonstrating the potential to expand the capabilities of generative models in creating coherent sets of 3D assets.
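
To make the DSD idea concrete, one plausible SDS-style form of the loss gradient is written below. It reflects our reading that the concept prior acts at high noise levels (governing global layout) while the reference prior acts at low noise levels (refining details); the weighting w(t) and the threshold τ are illustrative assumptions, not values from the paper.

```latex
% Hedged reconstruction of a dual-score-distillation gradient.
% w(t) and \tau are illustrative assumptions, not values from the paper.
\nabla_\theta \mathcal{L}_{\mathrm{DSD}}
  = \mathbb{E}_{t,\epsilon}\!\left[ w(t)\,
      \bigl(\hat{\epsilon}(\mathbf{x}_t; t) - \epsilon\bigr)\,
      \frac{\partial \mathbf{x}}{\partial \theta} \right],
\qquad
\hat{\epsilon}(\mathbf{x}_t; t) =
\begin{cases}
  \epsilon_{\phi_c}(\mathbf{x}_t; t), & t > \tau \ \text{(concept prior)}\\
  \epsilon_{\phi_r}(\mathbf{x}_t; t), & t \le \tau \ \text{(reference prior)}
\end{cases}
```

Here x = g(θ) denotes a differentiable rendering of the 3D asset parameters θ, ε is the sampled Gaussian noise, and ε̂ is the noise predicted by whichever prior is active at noise level t.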

Methodology Overview

ThemeStation processes the generation task in two primary stages:

  1. Theme-Driven Concept Image Generation: Leveraging a pre-trained text-to-image diffusion model, ThemeStation customizes this model to produce various concept images that embody the theme carried by a set of 3D exemplars. This stage ensures that the generation process is anchored by the thematic essence of the input.
  2. Reference-Informed 3D Asset Modeling: Using the theme-infused concept images together with the reference 3D exemplars, ThemeStation synthesizes detailed 3D models. The DSD loss lets the optimization draw on both the global thematic layout provided by the concept image and the fine details captured in the 3D references (minimal sketches of both stages follow this list).
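
Stage 1 can be read as subject-driven customization of a pre-trained text-to-image model on renderings of the exemplars. The snippet below is a minimal sketch of that idea using the Hugging Face diffusers stack; the model id, the placeholder token "sks", and the bare training step are illustrative assumptions rather than the authors' recipe (which may differ in prompts, regularization, and schedule).

```python
import torch
import torch.nn.functional as F
from diffusers import StableDiffusionPipeline, DDPMScheduler

# Assumed base model; the paper only specifies "a pre-trained text-to-image
# diffusion model", not this particular checkpoint.
pipe = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5")
unet, vae = pipe.unet, pipe.vae
text_encoder, tokenizer = pipe.text_encoder, pipe.tokenizer
scheduler = DDPMScheduler.from_config(pipe.scheduler.config)

vae.requires_grad_(False)           # only the UNet is fine-tuned here
text_encoder.requires_grad_(False)
optimizer = torch.optim.AdamW(unet.parameters(), lr=1e-5)

def finetune_step(images, prompt="a 3D render of a sks asset"):
    # images: (B, 3, 512, 512) exemplar renderings, scaled to [-1, 1].
    with torch.no_grad():
        latents = vae.encode(images).latent_dist.sample() * vae.config.scaling_factor
        ids = tokenizer([prompt] * images.shape[0], padding="max_length",
                        truncation=True, max_length=tokenizer.model_max_length,
                        return_tensors="pt").input_ids
        emb = text_encoder(ids)[0]
    noise = torch.randn_like(latents)
    t = torch.randint(0, scheduler.config.num_train_timesteps, (latents.shape[0],))
    noisy = scheduler.add_noise(latents, noise, t)
    pred = unet(noisy, t, encoder_hidden_states=emb).sample
    loss = F.mse_loss(pred, noise)   # standard denoising objective
    optimizer.zero_grad(); loss.backward(); optimizer.step()
    return loss.item()

# After enough steps, sample fresh concept images carrying the exemplars' theme:
# concept = pipe("a 3D render of a sks asset, new variation").images[0]
```

For stage 2, the following self-contained PyTorch sketch shows how a DSD-style update could be wired up. Everything here is a stand-in: `TinyPrior` replaces the two fine-tuned diffusion priors, `add_noise` is a toy forward diffusion, and a 2D latent substitutes for a differentiable 3D representation. It illustrates the routing logic we read from the paper, high-noise steps guided by the concept prior (global layout) and low-noise steps by the reference prior (details), not the authors' implementation.

```python
import torch
import torch.nn as nn

class TinyPrior(nn.Module):
    """Illustrative stand-in for a fine-tuned diffusion prior that predicts noise.
    (A real prior would also condition on the timestep and on text/image inputs.)"""
    def __init__(self, ch=3):
        super().__init__()
        self.net = nn.Conv2d(ch, ch, 3, padding=1)

    def forward(self, x_t, t):
        return self.net(x_t)

def add_noise(x, eps, t):
    # Toy variance-preserving forward diffusion: x_t = sqrt(a)*x + sqrt(1-a)*eps.
    a = (1.0 - t).clamp(min=1e-4)
    return a.sqrt() * x + (1.0 - a).sqrt() * eps

def dsd_step(render_fn, concept_prior, reference_prior, optimizer, tau=0.5):
    """One DSD-style update: route high-noise steps to the concept prior
    (global layout) and low-noise steps to the reference prior (details)."""
    x = render_fn()                      # differentiable "rendering" of the asset
    t = torch.rand(())                   # noise level in [0, 1)
    eps = torch.randn_like(x)
    x_t = add_noise(x, eps, t)

    prior = concept_prior if t > tau else reference_prior
    with torch.no_grad():
        eps_hat = prior(x_t, t)          # predicted noise from the chosen prior

    optimizer.zero_grad()
    x.backward(gradient=eps_hat - eps)   # SDS-style surrogate gradient
    optimizer.step()

# Toy usage: a 2D latent stands in for the 3D asset parameters.
asset = torch.randn(1, 3, 64, 64, requires_grad=True)
opt = torch.optim.Adam([asset], lr=1e-2)
concept, reference = TinyPrior(), TinyPrior()
for _ in range(100):
    dsd_step(lambda: asset * 1.0, concept, reference, opt)
```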

Experimental Insights

Through extensive experimentation and a user study, ThemeStation was found to outperform existing approaches in creating diverse and detailed theme-aware 3D models. Key observations include:

  • Superior Quality and Diversity: ThemeStation's generated models exhibit greater thematic consistency, detail, and variation than those produced by competing techniques.
  • Effective Use of Dual Priors: The DSD loss effectively resolves conflicts between the guidance provided by the concept images and the 3D references, leading to improved generation quality.

Theoretical and Practical Implications

The introduction of ThemeStation has both theoretical and practical ramifications for the field of computer graphics and AI-driven content creation:

  • New Horizons in Generative AI: The research presents a novel method for leveraging diffusion models and dual score distillation in theme-aware generation tasks, expanding our understanding of these models' utility and adaptability.
  • Broad Applicability: The ability to generate theme-consistent 3D assets efficiently has significant implications for industries such as gaming, film production, and virtual reality, where cohesive thematic design is crucial.

Future Directions

While ThemeStation marks a significant advancement, it also opens avenues for further research. Potential directions include:

  • Improved Efficiency and Scalability: Techniques to reduce the computational demands of the two-stage generation process and to scale up the generation to even larger sets of 3D assets.
  • Extended Theme Interpretation: Developing mechanisms for the automatic interpretation and application of broader and more abstract themes in the generation of 3D assets.

Conclusion

ThemeStation offers a pioneering approach to the generation of theme-consistent 3D assets from a limited number of exemplars, combining conceptual innovation with practical applicability. Its success in generating diverse and detailed 3D models aligned with specified themes represents a significant step forward in the domain of generative AI and 3D content creation.
