SVGDreamer: Text Guided SVG Generation with Diffusion Model (2312.16476v6)

Published 27 Dec 2023 in cs.CV and cs.AI

Abstract: Recently, text-guided synthesis of scalable vector graphics (SVGs) has shown promise in domains such as iconography and sketching. However, existing text-to-SVG generation methods lack editability and struggle with visual quality and result diversity. To address these limitations, we propose a novel text-guided vector graphics synthesis method called SVGDreamer. SVGDreamer incorporates a semantic-driven image vectorization (SIVE) process that decomposes synthesis into foreground objects and background, thereby enhancing editability. Specifically, the SIVE process introduces attention-based primitive control and an attention-mask loss function for effective control and manipulation of individual elements. Additionally, we propose a Vectorized Particle-based Score Distillation (VPSD) approach that models SVGs as distributions of control points and colors, addressing the shape over-smoothing, color over-saturation, limited diversity, and slow convergence of existing text-to-SVG generation methods. Furthermore, VPSD leverages a reward model to re-weight vector particles, which improves aesthetic appeal and accelerates convergence. Extensive experiments validate the effectiveness of SVGDreamer, demonstrating its superiority over baseline methods in terms of editability, visual quality, and diversity. Project page: https://ximinng.github.io/SVGDreamer-project/
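The abstract names three mechanisms without giving formulas: SVGs modeled as distributions over control points and colors, an attention-mask loss, and reward-based re-weighting of vector particles. The Python sketch below illustrates one plausible reading of each under explicit assumptions; it is not the authors' implementation. All names (`VectorParticle`, `sample_particles`, `attention_mask_loss`, `reward_weights`, `reward_weighted_loss`), the masked mean-squared-error form of the loss, and the softmax weighting are hypothetical. Rendering is omitted: images are flat lists of pixel intensities that, in practice, would come from a differentiable rasterizer.

```python
import math
import random
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class VectorParticle:
    """Hypothetical container for one SVG sample ("particle").

    Following the abstract, an SVG is described by its path control points
    and its colors; this flat-list layout is an illustrative assumption.
    """
    control_points: List[Tuple[float, float]]  # (x, y) in normalized canvas units
    colors: List[Tuple[float, ...]]            # one RGBA fill per path, values in [0, 1]


def sample_particles(n_particles: int, n_points: int, n_paths: int,
                     seed: int = 0) -> List[VectorParticle]:
    """Draw independent random SVG particles (uniform random init is an assumption)."""
    rng = random.Random(seed)
    return [
        VectorParticle(
            control_points=[(rng.random(), rng.random()) for _ in range(n_points)],
            colors=[tuple(rng.random() for _ in range(4)) for _ in range(n_paths)],
        )
        for _ in range(n_particles)
    ]


def attention_mask_loss(rendered: Sequence[float], target: Sequence[float],
                        mask: Sequence[float]) -> float:
    """Mean squared error restricted to pixels where the attention mask is set.

    Assumed form: pixels outside the mask contribute nothing, so a group of
    primitives is only pulled toward its own object region.
    """
    num = sum(m * (r - t) ** 2 for r, t, m in zip(rendered, target, mask))
    den = sum(mask)
    return num / den if den > 0 else 0.0


def reward_weights(rewards: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Turn per-particle reward scores into normalized weights via a softmax (assumed)."""
    scaled = [r / temperature for r in rewards]
    peak = max(scaled)  # subtract the max before exponentiating for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def reward_weighted_loss(losses: Sequence[float], rewards: Sequence[float],
                         temperature: float = 1.0) -> float:
    """Aggregate per-particle losses so that higher-reward particles count more."""
    weights = reward_weights(rewards, temperature)
    return sum(w * l for w, l in zip(weights, losses))
```

For example, two particles with rewards 0 and ln 3 receive softmax weights 0.25 and 0.75, so per-particle losses of 1.0 and 3.0 aggregate to 2.5. The softmax is only one way to map reward scores to weights; the temperature controls how strongly the best-scoring particles dominate.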

Authors (6)
  1. Ximing Xing
  2. Haitao Zhou
  3. Chuang Wang
  4. Jing Zhang
  5. Dong Xu
  6. Qian Yu
Citations (17)