DeepSVG: A Hierarchical Generative Network for Vector Graphics Animation (2007.11301v3)

Published 22 Jul 2020 in cs.CV

Abstract: Scalable Vector Graphics (SVG) are ubiquitous in modern 2D interfaces due to their ability to scale to different resolutions. However, despite the success of deep learning-based models applied to rasterized images, the problem of vector graphics representation learning and generation remains largely unexplored. In this work, we propose a novel hierarchical generative network, called DeepSVG, for complex SVG icons generation and interpolation. Our architecture effectively disentangles high-level shapes from the low-level commands that encode the shape itself. The network directly predicts a set of shapes in a non-autoregressive fashion. We introduce the task of complex SVG icons generation by releasing a new large-scale dataset along with an open-source library for SVG manipulation. We demonstrate that our network learns to accurately reconstruct diverse vector graphics, and can serve as a powerful animation tool by performing interpolations and other latent space operations. Our code is available at https://github.com/alexandre01/deepsvg.


DeepSVG: Advancements in Vector Graphics Representation and Generation

Scalable Vector Graphics (SVG) are essential for creating resolution-independent digital assets, widely used in interfaces and animations. The paper "DeepSVG: A Hierarchical Generative Network for Vector Graphics Animation" addresses the relatively unexplored domain of vector graphics generation using deep learning, proposing the novel DeepSVG model specifically designed for generating complex SVG icons.

Hierarchical Generative Model

The DeepSVG model introduces a hierarchical Transformer-based architecture that exploits the inherent structure of SVG images. An SVG is a collection of paths, each defined by a sequence of draw commands such as lines and Bézier curves. Because the shapes in an SVG can be listed in an arbitrary order, a model of vector graphics has to handle permutation invariance, a concern that does not arise for raster images. DeepSVG disentangles high-level shape representations from low-level draw commands by using two stages in both the encoder and the decoder. The encoder first encodes each path individually and then captures the relations between paths to form a single latent representation. The decoder reverses this process and predicts the output shapes non-autoregressively.
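The two-stage encoding can be illustrated with a small sketch. The code below is a toy stand-in and not the DeepSVG architecture: it replaces the Transformer encoders with plain mean pooling, and the names Command, command_features, encode_path, and encode_svg are hypothetical. It only shows how paths can be encoded one at a time and then combined so that the result does not depend on the order in which the paths are listed.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Assumed command vocabulary for illustration: move, line, cubic Bezier, close.
COMMAND_TYPES = ("m", "l", "c", "z")
MAX_ARGS = 6  # a cubic Bezier takes two control points and one end point


@dataclass(frozen=True)
class Command:
    kind: str
    args: Tuple[float, ...] = ()


Path = List[Command]


def command_features(cmd: Command) -> List[float]:
    """One-hot command type followed by zero-padded coordinates."""
    one_hot = [1.0 if cmd.kind == k else 0.0 for k in COMMAND_TYPES]
    padded = list(cmd.args) + [0.0] * (MAX_ARGS - len(cmd.args))
    return one_hot + padded


def mean_pool(vectors: List[List[float]]) -> List[float]:
    """Average a list of equal-length vectors element by element."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def encode_path(path: Path) -> List[float]:
    """Stage 1: summarize a single path from its own commands only."""
    return mean_pool([command_features(c) for c in path])


def encode_svg(paths: List[Path]) -> List[float]:
    """Stage 2: combine path summaries with an order-independent pooling."""
    return mean_pool([encode_path(p) for p in paths])


triangle = [
    Command("m", (0.0, 0.0)),
    Command("l", (10.0, 0.0)),
    Command("l", (5.0, 8.0)),
    Command("z"),
]
curve = [
    Command("m", (0.0, 0.0)),
    Command("c", (1.0, 2.0, 3.0, 4.0, 5.0, 6.0)),
]

latent_ab = encode_svg([triangle, curve])
latent_ba = encode_svg([curve, triangle])
print(latent_ab)
print(latent_ba)
```

Swapping the two paths leaves the pooled vector unchanged, which is the property the hierarchical design has to respect. Mean pooling also discards the order of commands inside a path, which a real model would need to keep, so this sketch demonstrates only the path-level invariance.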

Key Contributions and Results

The paper offers three significant contributions:

DeepSVG Model: A hierarchical approach enhances the ability to encode and generate complex vector images. It leverages Transformers to learn deep representations capable of producing diverse and high-quality SVGs.
SVG-Icons8 Dataset: The authors introduce a substantial dataset containing 100,000 SVG icons across 56 categories, addressing the limitations of existing datasets by focusing on consistency and diversity in real-world graphics.
Experimental Validation: Experiments demonstrate that DeepSVG can interpolate between and manipulate vector graphics. The model achieves lower reconstruction error and smoother interpolations than its autoregressive counterparts.
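This summary does not state how reconstruction error is computed. As an illustration only, one common way to compare an original shape with its reconstruction is a symmetric Chamfer distance between points sampled from the two shapes. The sketch below assumes the shapes have already been sampled into 2D point lists; the function name chamfer_distance and the example points are hypothetical.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def chamfer_distance(points_a: List[Point], points_b: List[Point]) -> float:
    """Symmetric Chamfer distance: mean nearest-neighbor distance, both ways."""

    def one_way(src: List[Point], dst: List[Point]) -> float:
        # For every source point, find the closest destination point.
        return sum(min(math.dist(p, q) for q in dst) for p in src) / len(src)

    return one_way(points_a, points_b) + one_way(points_b, points_a)


square = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
shifted = [(x + 3.0, y) for x, y in square]

print(chamfer_distance(square, square))
print(chamfer_distance(square, shifted))
```

A value of zero means every point has an exact match in the other set, and larger values indicate a worse reconstruction. The metric ignores point order, which suits shapes whose commands or paths may be listed differently.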

In a human study on interpolation quality, DeepSVG received a 44.8% first-rank preference, ahead of the other models compared. Its architecture also enables the generation of visually consistent fonts and complex icons directly from latent representations, which suggests the model could serve as a versatile tool for digital artists.
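Interpolation in latent space can be sketched as follows, assuming plain linear blending between two latent vectors. The functions lerp and interpolation_latents are hypothetical names, and the step that would turn each blended vector back into an SVG is left out because it depends on the trained model.

```python
from typing import List


def lerp(z_a: List[float], z_b: List[float], t: float) -> List[float]:
    """Blend two latent vectors: t=0 returns z_a, t=1 returns z_b."""
    return [(1.0 - t) * a + t * b for a, b in zip(z_a, z_b)]


def interpolation_latents(
    z_a: List[float], z_b: List[float], n_frames: int
) -> List[List[float]]:
    """Evenly spaced latent vectors from z_a to z_b, endpoints included."""
    if n_frames < 2:
        raise ValueError("n_frames must be at least 2")
    return [lerp(z_a, z_b, i / (n_frames - 1)) for i in range(n_frames)]


# Made-up two-dimensional latents; a real model would use many more dimensions.
z_start = [0.0, 2.0]
z_end = [4.0, -2.0]

frames = interpolation_latents(z_start, z_end, n_frames=5)
for frame in frames:
    print(frame)
```

Each blended vector would be passed through the decoder to obtain one frame of the animation. The smoothness of the resulting sequence depends on how well the latent space is organized, which is what the human study evaluates.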

Implications and Future Directions

This research has practical implications for creative workflows in digital art and animation, where it could automate parts of the generation and manipulation of vector graphics. The released dataset and open-source code provide a benchmark for subsequent research.

On the theoretical side, similar hierarchical architectures could be applied to other domains with sequence-based data, such as audio or motion trajectory generation.

Further research could extend these methods to global SVG attributes such as color and stroke width, which would make the model more useful in practice. Improving interpolation stability and exploring other latent space operations and applications, such as style transfer or SVG vectorization, also remain promising directions.

In conclusion, DeepSVG marks a significant step forward in applying deep learning to vector graphics. It combines architectural innovation with practical applications and provides a starting point for future research in the field.