DiffSketcher: Text Guided Vector Sketch Synthesis through Latent Diffusion Models (2306.14685v4)

Published 26 Jun 2023 in cs.CV and cs.AI

Abstract: Even though trained mainly on images, we discover that pretrained diffusion models show impressive power in guiding sketch synthesis. In this paper, we present DiffSketcher, an innovative algorithm that creates vectorized free-hand sketches using natural language input. DiffSketcher is developed based on a pre-trained text-to-image diffusion model. It performs the task by directly optimizing a set of Bézier curves with an extended version of the score distillation sampling (SDS) loss, which allows us to use a raster-level diffusion model as a prior for optimizing a parametric vectorized sketch generator. Furthermore, we explore attention maps embedded in the diffusion model for effective stroke initialization to speed up the generation process. The generated sketches demonstrate multiple levels of abstraction while maintaining recognizability, underlying structure, and essential visual details of the subject drawn. Our experiments show that DiffSketcher achieves greater quality than prior work. The code and demo of DiffSketcher can be found at https://ximinng.github.io/DiffSketcher-project/.

DiffSketcher: Text Guided Vector Sketch Synthesis through Latent Diffusion Models

The paper introduces DiffSketcher, an approach for creating vectorized sketches from natural language descriptions. Unlike traditional methods that depend on dedicated sketch datasets or supervised training pairs, DiffSketcher leverages a pre-trained text-to-image diffusion model, so no sketch-specific training data is required. The technique directly optimizes the parameters of a set of Bézier curves, producing abstract yet recognizable vector sketches.
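
To make the stroke representation concrete, the following is a minimal illustrative sketch, not the paper's code, of the cubic Bézier parameterization such an optimizer would update; in the full pipeline a differentiable rasterizer (e.g., diffvg) renders these curves to pixels so gradients can flow back into the control points.

```python
# Illustrative only: one stroke = four 2D control points of a cubic
# Bezier curve; a sketch is a set of such strokes whose control points
# (plus width and opacity) are the parameters being optimized.
import numpy as np

def cubic_bezier(p0, p1, p2, p3, t):
    """Evaluate a cubic Bezier curve at an array of parameters t in [0, 1]."""
    t = np.asarray(t)[:, None]
    return ((1 - t) ** 3 * p0
            + 3 * (1 - t) ** 2 * t * p1
            + 3 * (1 - t) * t ** 2 * p2
            + t ** 3 * p3)

ctrl = np.random.rand(4, 2)                            # randomly placed control points
points = cubic_bezier(*ctrl, t=np.linspace(0, 1, 32))  # 32 samples along the curve
```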

The methodology underpinning DiffSketcher is an adaptation of the score distillation sampling (SDS) loss, which allows a raster-level diffusion model to serve as the prior for optimizing a parametric vector sketch. By utilizing attention maps from the diffusion model, the algorithm achieves effective stroke initialization, improving both the quality and the speed of generation. The resulting sketches remain coherent with the input textual semantics while offering varying levels of abstraction.
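
A hedged sketch of one SDS-style update step may help clarify the mechanism. All names passed in below (`render`, `encode`, `unet`, `scheduler`) are placeholders assumed for illustration, not the paper's actual API: `render` is the differentiable rasterizer over the Bézier strokes, `encode` maps the raster image into the diffusion model's latent space, and `unet` is the frozen noise predictor.

```python
import torch

def sds_step(stroke_params, text_emb, render, encode, unet, scheduler, optimizer):
    """One update: nudge stroke parameters so the rendered sketch looks
    plausible to the frozen text-conditioned diffusion prior."""
    latents = encode(render(stroke_params))          # differentiable: strokes -> latents
    t = torch.randint(0, scheduler.num_train_timesteps, (1,))
    noise = torch.randn_like(latents)
    noisy = scheduler.add_noise(latents, noise, t)   # forward-diffuse the latents
    with torch.no_grad():                            # no gradients through the U-Net
        noise_pred = unet(noisy, t, text_emb)
    grad = noise_pred - noise                        # SDS gradient (timestep weight w(t) omitted)
    latents.backward(gradient=grad)                  # route the gradient into stroke_params
    optimizer.step()
    optimizer.zero_grad()
```

The paper's extended loss further mixes in CLIP and LPIPS terms; under this sketch's assumptions, those would simply contribute additional gradients in the same backward pass.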

Key Contributions and Methodology

  1. Latent Diffusion Model Utilization: DiffSketcher capitalizes on pre-existing text-to-image diffusion models to generate sketches without requiring direct sketch datasets. It employs a differentiable rasterizer for optimizing curve parameters, effectively transferring image synthesis knowledge to the sketch generation domain.
  2. Extended SDS Loss: Building on the SDS framework, the paper introduces an enhanced version that integrates with CLIP and LPIPS losses, enabling diverse and controlled vector sketch synthesis. This modification supports greater fidelity to textual prompts.
  3. Attention-based Stroke Initialization: By exploiting attention maps within the diffusion model, the research presents a refined initialization strategy for stroke placement. This is critical for non-convex optimization landscapes, enhancing both convergence speed and final sketch quality (an illustrative snippet follows this list).
  4. Opacity and Stylistic Variability: The integration of opacity controls within the optimization process adds stylistic depth, mimicking human sketch styles by varying brushstroke weights, thus achieving more natural sketches.
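
As referenced in point 3, the following is an illustrative snippet of attention-guided stroke initialization under assumed inputs: a cross-attention map from the diffusion U-Net is treated as a probability distribution, and initial stroke positions are sampled where the model attends most. The function name and the `(H, W)` map format are assumptions for illustration.

```python
import numpy as np

def init_stroke_positions(attn_map, num_strokes, temperature=1.0):
    """Sample stroke anchor pixels with probability proportional to
    the softmax-normalized attention mass over the image."""
    h, w = attn_map.shape
    logits = attn_map.ravel() / temperature
    probs = np.exp(logits - logits.max())            # numerically stable softmax
    probs /= probs.sum()
    idx = np.random.choice(h * w, size=num_strokes, replace=False, p=probs)
    return np.stack(np.unravel_index(idx, (h, w)), axis=1)  # (num_strokes, 2) row/col
```

Starting strokes on high-attention regions gives the non-convex optimization a semantically meaningful starting point, which is what drives the reported gains in convergence speed.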

Experimental Results and Implications

The experimental evaluations illustrate that DiffSketcher surpasses existing methods in generating high-quality and diverse sketches from textual descriptions. The comparisons with methods like CLIPasso reveal significant improvements in visual coherence and semantic alignment. These advancements underscore DiffSketcher's ability to translate textual abstractions into visually compelling sketches.

The implications of this research are manifold. On a practical level, it provides a tool for designers and artists to swiftly generate conceptual sketches from textual ideas, reducing manual effort and time. Theoretically, it demonstrates the potential of diffusion models in domains beyond traditional image synthesis, bridging the gap between natural language processing and computer graphics.

Future Directions

While the introduction of DiffSketcher marks a significant stride in text-to-sketch synthesis, the paper identifies several avenues for future research. Enhancing the capability to control the abstraction level directly through textual prompts could offer more personalized sketch generation. Additionally, extending the model's capacity to incorporate stylistic variations and multi-object scenes could further expand its applicability. Investigating integration with advanced neural architectures or alternative diffusion frameworks might yield further improvements in efficiency and quality.

In sum, DiffSketcher represents a promising convergence of language understanding and visual synthesis, presenting a robust method for automatic sketch generation that can be a foundational tool in creative and design-oriented AI applications.

Authors (6)
  1. Ximing Xing (8 papers)
  2. Chuang Wang (36 papers)
  3. Haitao Zhou (11 papers)
  4. Jing Zhang (730 papers)
  5. Qian Yu (116 papers)
  6. Dong Xu (167 papers)
Citations (30)