Emergent Mind

Sketch3D: Style-Consistent Guidance for Sketch-to-3D Generation

Published Apr 2, 2024 in cs.CV


Recently, image-to-3D approaches have achieved significant results with a natural image as input. However, such color-rich inputs are not always available in practical applications, where only sketches may be at hand. Existing sketch-to-3D research suffers from limited applicability due to the lack of color information and multi-view content in sketches. To overcome these limitations, this paper proposes a novel generation paradigm, Sketch3D, which generates realistic 3D assets whose shape is aligned with the input sketch and whose color matches the textual description. Concretely, Sketch3D first instantiates the given sketch as a reference image through a shape-preserving generation process. Second, the reference image is leveraged to deduce a coarse 3D Gaussian prior, and multi-view style-consistent guidance images are generated from renderings of the 3D Gaussians. Finally, three strategies are designed to optimize the 3D Gaussians: structural optimization via a distribution transfer mechanism, color optimization with a straightforward MSE loss, and sketch similarity optimization with a CLIP-based geometric similarity loss. Extensive visual comparisons and quantitative analysis illustrate the advantage of Sketch3D in generating realistic 3D assets while preserving consistency with the input.


  • The ACM introduced a comprehensive, unified template for its publications in 2017 to ensure consistency and readability.

  • The template supports various publication stages and includes specific styles and parameters to cater to the different needs of ACM's publications.

  • Strict prohibitions are in place against modifying template elements such as margins and typeface sizes to maintain the integrity of ACM publications.

  • Future developments in publishing may include sophisticated templates that enhance accessibility and interactivity, highlighting the importance of standardized templates.


The ACM has developed a single, comprehensive template to ensure consistency and readability across its publications. This document provides an extensive overview of the ACM's consolidated article template, introduced in 2017. The template serves multiple functions, from formatting to facilitating metadata extraction and accessibility - crucial for future integration into the ACM Digital Library. The flexibility embedded in the design allows authors to prepare documents for various stages of publication, from submissions for review to camera-ready copies, across both conference proceedings and journal publications.

Templating Nuances

Template Styles and Parameters

The article outlines the different template styles (acmsmall, acmlarge, acmtog, sigconf, sigchi, sigchi-a, sigplan) designed to accommodate the diverse requirements of ACM's publications, including journals and conference proceedings. Each style is chosen based on the nature of the publication and the specific SIG governing the work. It also discusses template parameters such as anonymous, review, authorversion, and screen, which adjust the template to suit various publication stages and requirements, such as dual-anonymous conference submissions or screen-friendly versions.
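As a minimal sketch of how this works in practice (the class options below are standard acmart options, but the exact combination for a given venue should always be taken from that venue's instructions), the style and parameters are both selected when loading the class:

```latex
% Journal article in the single-column "small" journal format
\documentclass[acmsmall]{acmart}

% The same class handles a conference paper under dual-anonymous
% review by swapping the style and adding parameters:
% \documentclass[sigconf,anonymous,review]{acmart}

% "screen" produces a screen-friendly version with colored links:
% \documentclass[acmsmall,screen]{acmart}
```

Because style and parameters are plain class options, an author can move a paper between publication stages (review, author version, camera-ready) without touching the document body.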

Prohibited Modifications

A significant emphasis is placed on the strict prohibition against modifying the template. This includes altering fundamental elements such as margins, typeface sizes, and the usage of commands to manage vertical spacing. These restrictions are enforced to maintain the integrity and uniformity of ACM publications.

Typeface Requirements

The document stresses the mandatory use of the "Libertine" typeface family, barring substitutions to maintain a standard visual aesthetic across publications. The directive serves to unify the appearance of ACM works, contributing to a cohesive brand identity.

Title, Authors, and Affiliation Guidelines

Authors are advised on how to appropriately format titles, manage author information, and specify affiliations to ensure clarity and accuracy in the metadata. Precise instructions for handling long titles, multiple authors sharing affiliations, and the necessity of including e-mail addresses are provided to optimize the metadata extraction process.
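For illustration (author names, e-mail addresses, and the affiliation below are placeholders), a title with a short running-head form, two authors sharing one affiliation, and the required e-mail addresses might be entered as:

```latex
% The optional bracket argument gives a short form for running heads,
% used when the full title is too long
\title[Style-Consistent Guidance]{Sketch3D: Style-Consistent
  Guidance for Sketch-to-3D Generation}

% Authors who share an affiliation can be grouped: both \author
% commands are followed by a single shared \affiliation block
\author{First Author}
\email{first@example.edu}
\author{Second Author}
\email{second@example.edu}
\affiliation{%
  \institution{Example University}
  \city{Springfield}
  \country{United States}}
```

Keeping this metadata in the dedicated commands, rather than hand-formatting it, is what allows the ACM's tools to extract it reliably.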

Rights Information and CCS Concepts

The necessity of including rights management information and the use of the ACM Computing Classification System (CCS) for taxonomic classification of the work are discussed. These components are vital for the legal and academic categorization and discoverability of articles within the ACM ecosystem and beyond.
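A sketch of how these appear in the LaTeX source (the CCS concept shown is only illustrative; authors generate the actual XML with the tool at the ACM Digital Library CCS page and paste in the rights commands supplied by the ACM eRights system):

```latex
% Rights management: normally pasted verbatim from the eRights form
\setcopyright{acmlicensed}

% CCS concepts: the XML block carries the machine-readable taxonomy,
% the \ccsdesc commands render the human-readable version; the number
% is the relevance weight
\begin{CCSXML}
<ccs2012>
 <concept>
  <concept_id>10010147.10010178</concept_id>
  <concept_desc>Computing methodologies~Artificial intelligence</concept_desc>
  <concept_significance>500</concept_significance>
 </concept>
</ccs2012>
\end{CCSXML}
\ccsdesc[500]{Computing methodologies~Artificial intelligence}
```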

Formatting and Content Structure

The document provides detailed guidance on structuring the content, including adherence to standard LaTeX sectioning commands and the preparation of tables, math equations, and figures. Particular attention is given to the formatting and placement of tables and figures to enhance readability and accessibility. The imperative of providing accurate figure descriptions is highlighted to facilitate content comprehension for visually impaired readers and improve search engine optimization.
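The figure-description requirement, for example, is met with the \Description command (the file name and caption here are placeholders):

```latex
\begin{figure}
  \centering
  \includegraphics[width=\linewidth]{pipeline-overview.pdf}
  % \Description carries the accessibility text read aloud by screen
  % readers; acmart warns when a figure omits it
  \Description{A block diagram with three stages connected left to
    right: sketch input, intermediate reference image, and final
    3D output.}
  \caption{Overview of the generation pipeline.}
  \label{fig:pipeline}
\end{figure}
```

Unlike the caption, which may assume the reader can see the image, the description should convey what the figure shows to a reader who cannot.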

Citations, Acknowledgments, and Appendices

Clear instructions are given on the preparation of bibliographies using BibTeX, ensuring completeness and accuracy in citations. Guidelines for acknowledging contributions and support are also provided, demarcating a specific acks environment for this section. Lastly, the document delineates how to incorporate appendices effectively, ensuring they are correctly sectioned and integrated into the article.
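These three pieces might be sketched in the source as follows (the bibliography file name and appendix title are placeholders):

```latex
% Acknowledgments go in the dedicated environment,
% not in a hand-made \section
\begin{acks}
We thank the anonymous reviewers for their helpful comments.
\end{acks}

% BibTeX with the ACM reference style; entries should be kept
% complete and accurate in the .bib file
\bibliographystyle{ACM-Reference-Format}
\bibliography{references}

% Appendices follow \appendix and use normal sectioning commands
\appendix
\section{Additional Details}
```

Using the acks environment rather than an ordinary section lets the publisher identify and process the acknowledgments (for instance, for funding-agency reporting) automatically.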

Implications and Future Directions

The establishment of uniform formatting guidelines by the ACM plays a crucial role in the standardization of academic publications in the computing field. By enforcing a consistent structure and visual presentation, these guidelines not only enhance the readability and accessibility of research but also streamline the publication process. Looking ahead, as AI and automated tools become increasingly prevalent in research and publication workflows, the importance of standardized templates and metadata becomes even more pronounced. Future developments in this area may include more sophisticated templates that further ease the publication process while maintaining high standards of accessibility and interactivity. The continuous evolution of these guidelines will likely parallel advances in publishing technologies, with a sustained focus on improving the accessibility, discoverability, and usability of scholarly communications.

In summary, the ACM's consolidation effort in article templating showcases a forward-thinking approach to academic publishing - one that respects the traditions of scholarly communication while embracing the technological advancements that shape its future.


