
Text-to-Vector Generation with Neural Path Representation (2405.10317v2)

Published 16 May 2024 in cs.CV and cs.GR

Abstract: Vector graphics are widely used in digital art and highly favored by designers due to their scalability and layer-wise properties. However, the process of creating and editing vector graphics requires creativity and design expertise, making it a time-consuming task. Recent advancements in text-to-vector (T2V) generation have aimed to make this process more accessible. However, existing T2V methods directly optimize control points of vector graphics paths, often resulting in intersecting or jagged paths due to the lack of geometry constraints. To overcome these limitations, we propose a novel neural path representation by designing a dual-branch Variational Autoencoder (VAE) that learns the path latent space from both sequence and image modalities. By optimizing the combination of neural paths, we can incorporate geometric constraints while preserving expressivity in generated SVGs. Furthermore, we introduce a two-stage path optimization method to improve the visual and topological quality of generated SVGs. In the first stage, a pre-trained text-to-image diffusion model guides the initial generation of complex vector graphics through the Variational Score Distillation (VSD) process. In the second stage, we refine the graphics using a layer-wise image vectorization strategy to achieve clearer elements and structure. We demonstrate the effectiveness of our method through extensive experiments and showcase various applications. The project page is https://intchous.github.io/T2V-NPR.

Authors (3)
  1. Peiying Zhang (23 papers)
  2. Nanxuan Zhao (36 papers)
  3. Jing Liao (100 papers)
Citations (3)