
OmniSVG: A Unified Scalable Vector Graphics Generation Model (2504.06263v2)

Published 8 Apr 2025 in cs.CV

Abstract: Scalable Vector Graphics (SVG) is an important image format widely adopted in graphic design because of its resolution independence and editability. The study of generating high-quality SVG has continuously drawn attention from both designers and researchers in the AIGC community. However, existing methods either produce unstructured outputs at huge computational cost or are limited to generating monochrome icons with over-simplified structures. To produce high-quality and complex SVG, we propose OmniSVG, a unified framework that leverages pre-trained Vision-Language Models (VLMs) for end-to-end multimodal SVG generation. By parameterizing SVG commands and coordinates into discrete tokens, OmniSVG decouples structural logic from low-level geometry for efficient training while maintaining the expressiveness of complex SVG structure. To further advance the development of SVG synthesis, we introduce MMSVG-2M, a multimodal dataset with two million richly annotated SVG assets, along with a standardized evaluation protocol for conditional SVG generation tasks. Extensive experiments show that OmniSVG outperforms existing methods and demonstrates its potential for integration into professional SVG design workflows.

Authors (10)
  1. Yiying Yang (15 papers)
  2. Wei Cheng (175 papers)
  3. Sijin Chen (12 papers)
  4. Xianfang Zeng (24 papers)
  5. Jiaxu Zhang (12 papers)
  6. Liao Wang (11 papers)
  7. Gang Yu (114 papers)
  8. Xingjun Ma (114 papers)
  9. Yu-Gang Jiang (223 papers)
  10. Fukun Yin (11 papers)

Summary

OmniSVG: Enhancements in Scalable Vector Graphics Generation

The paper introduces OmniSVG, a unified framework for generating Scalable Vector Graphics (SVG) that builds on pre-trained Vision-Language Models (VLMs). SVGs are valued in digital design for their resolution independence and editability, yet generating them automatically remains difficult: according to the authors, existing methods either produce unstructured outputs at high computational cost or are limited to monochrome icons with over-simplified structure.

Key Contributions

  1. Unified Framework: OmniSVG uses pre-trained VLMs for end-to-end multimodal SVG generation. By converting SVG commands and coordinates into discrete tokens, it decouples structural logic from low-level geometry, which preserves expressiveness while making training more efficient. This approach mitigates the "coordinate hallucination" problem often found in LLM-produced code and yields vivid, intricate SVGs.
  2. Multimodal Dataset: The authors present MMSVG-2M, a vast dataset comprising two million richly annotated SVG assets. In addition, they standardize an evaluation protocol for various SVG generation tasks, effectively establishing a baseline for future research.
  3. Autoregressive Model Enhancement: The approach leverages an autoregressive model adept at completing SVG tasks when provided with partial observations. It integrates visual and textual instructions to synthesize editable, high-fidelity SVGs across diverse domains, from simple icons to elaborate anime illustrations.
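
The idea of turning SVG commands and coordinates into discrete tokens can be illustrated with a minimal sketch. Everything below is an assumption made for illustration rather than the paper's actual scheme: the command set (`M`, `L`, `C`, `Z`), the 200-unit grid, the vocabulary layout, and the helper names `coord_token`, `tokenize_path`, and `detokenize` are all hypothetical.

```python
import re

# Hypothetical vocabulary layout (an assumption, not taken from the paper):
# ids 0..3 are command tokens, followed by GRID * GRID coordinate tokens.
COMMANDS = ["M", "L", "C", "Z"]
ARITY = {"M": 1, "L": 1, "C": 3, "Z": 0}  # (x, y) points consumed per command
GRID = 200                                # assumed canvas size in integer units
COORD_OFFSET = len(COMMANDS)


def coord_token(x, y):
    """Map one (x, y) point to a single discrete token id."""
    xi = min(max(int(round(x)), 0), GRID - 1)  # quantize and clamp to the grid
    yi = min(max(int(round(y)), 0), GRID - 1)
    return COORD_OFFSET + xi * GRID + yi


def tokenize_path(d):
    """Tokenize a path string that uses explicit, absolute M/L/C/Z commands.

    Implicit command repetition (e.g. "L 1 2 3 4") is not handled here.
    """
    parts = re.findall(r"[MLCZ]|-?\d+(?:\.\d+)?", d)
    tokens, i = [], 0
    while i < len(parts):
        cmd = parts[i]
        if cmd not in ARITY:
            raise ValueError(f"expected a command, got {cmd!r}")
        tokens.append(COMMANDS.index(cmd))
        i += 1
        for _ in range(ARITY[cmd]):
            tokens.append(coord_token(float(parts[i]), float(parts[i + 1])))
            i += 2
    return tokens


def detokenize(tokens):
    """Invert tokenize_path, up to the quantization applied to coordinates."""
    out = []
    for t in tokens:
        if t < COORD_OFFSET:
            out.append(COMMANDS[t])
        else:
            xi, yi = divmod(t - COORD_OFFSET, GRID)
            out.append(f"{xi} {yi}")
    return " ".join(out)


if __name__ == "__main__":
    ids = tokenize_path("M 10 20 L 30 40 Z")
    print(ids)
    print(detokenize(ids))
```

In this sketch each point costs one token instead of several digit characters, which is one way a tokenized representation could shorten sequences and keep a model from emitting malformed numbers.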

Experimental Findings

The experimental results show that OmniSVG outperforms existing baselines on both quantitative metrics and qualitative comparisons. For text-to-SVG generation, OmniSVG improves FID, CLIP, and Aesthetic scores, indicating that its outputs match the textual description in semantic content and are aesthetically pleasing. This is achieved while maintaining computational efficiency through reduced token usage.

In image-conditioned SVG creation, OmniSVG achieves high DINO and SSIM scores, demonstrating the effectiveness of its multimodal generative models in producing SVGs that closely align with the visual input.
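
For readers unfamiliar with SSIM, the following is a minimal sketch of the metric computed over a whole image at once. It is a simplification: the standard metric averages SSIM over local windows, and the summary does not state the paper's exact evaluation settings. The function name `global_ssim` and the list-of-rows image format are assumptions for illustration.

```python
def global_ssim(img_a, img_b, max_value=255.0):
    """Single-window SSIM for two equal-sized grayscale images (lists of rows)."""
    a = [float(p) for row in img_a for p in row]
    b = [float(p) for row in img_b for p in row]
    if not a or len(a) != len(b):
        raise ValueError("images must be non-empty and the same size")

    n = len(a)
    mu_a = sum(a) / n
    mu_b = sum(b) / n
    var_a = sum((p - mu_a) ** 2 for p in a) / n
    var_b = sum((q - mu_b) ** 2 for q in b) / n
    cov = sum((p - mu_a) * (q - mu_b) for p, q in zip(a, b)) / n

    # Stabilizing constants from the original SSIM formulation.
    c1 = (0.01 * max_value) ** 2
    c2 = (0.03 * max_value) ** 2

    luminance = (2 * mu_a * mu_b + c1) / (mu_a ** 2 + mu_b ** 2 + c1)
    structure = (2 * cov + c2) / (var_a + var_b + c2)
    return luminance * structure


if __name__ == "__main__":
    target = [[0, 255], [255, 0]]
    inverted = [[255, 0], [0, 255]]
    print(global_ssim(target, target))    # identical images score close to 1
    print(global_ssim(target, inverted))  # inverted structure scores much lower
```

In an image-to-SVG evaluation, the generated SVG would first be rasterized and then compared against the input image; a higher score indicates closer structural agreement.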

Implications and Future Directions

OmniSVG has implications for both AI research and digital design. Building SVG generation on pre-trained VLMs combines text and visual understanding in a single model. Beyond improving the generation of complex vector graphics, the work points toward integrating AI-driven SVG generation into professional design workflows.

Future developments could involve refining the VLM architectures to further decrease SVG generation times and enhance scalability. There are promising avenues for utilizing multi-token prediction and KV-cache compression to make generative processes even more efficient. Additionally, the paper indicates potential for future exploration in areas such as in-context learning and multi-turn interleaved generation, which could offer users more versatility and control.

In summary, OmniSVG advances SVG generation by combining pre-trained VLMs with a tokenized SVG representation and a large multimodal dataset, with potential applications in digital art and interactive design.
