Differentiable Vector Graphics Pipeline
- Differentiable vector graphics pipelines are neural frameworks that make vector representations fully differentiable through analytic formulations and automatic differentiation.
- They integrate rasterization, anti-aliasing, and alpha compositing with GPU acceleration to jointly optimize geometric and appearance parameters.
- These pipelines enable diverse applications such as image synthesis, text-driven editing, and robot drawing by converting vector primitives into raster images.
Differentiable Vector Graphics (DiffVG) pipelines comprise a class of neural optimization frameworks in which vector graphic representations (e.g., Bézier curves, B-splines, circles, lines) and their rendering into raster images are made fully differentiable, enabling the direct application of gradient-based optimization methods for image synthesis, abstraction, editing, and robot path planning. Central to this approach is a differentiable rasterizer that computes gradients of image-space loss functions with respect to the geometric and appearance parameters of vector primitives. DiffVG pipelines combine analytic formulations for geometry, anti-aliasing, and compositing with automatic differentiation and GPU acceleration, yielding tools that are adaptable to both low-level robot control and high-level text/image-guided abstraction.
1. Parameterization of Vector Primitives
In DiffVG pipelines, every vector-shape instance is parameterized according to its primitive type, most commonly cubic Béziers, B-splines, polylines, circles, and rectangles. For a cubic Bézier curve, the parameter set is

$$\theta = \{\, p_1, p_2, p_3, p_4,\; c,\; \alpha \,\},$$

where $p_1, \dots, p_4$ are the control points and $c$ and $\alpha$ define color and opacity. Stroked curves include an additional width parameter. Control points and all per-primitive attributes are stored and manipulated as differentiable Tensors (e.g., via PyTorch), permitting joint optimization of geometry and appearance (Song et al., 2022).
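A minimal PyTorch sketch of this parameterization, assuming a single filled-and-stroked cubic Bézier path; the tensor names and optimizer grouping are illustrative rather than DiffVG's exact API:

```python
import torch

# One filled-and-stroked cubic Bézier path as differentiable tensors.
# Names (points, color, alpha, stroke_width) mirror the parameter set above.
points = torch.tensor([[0.1, 0.1],   # p1: start point
                       [0.4, 0.9],   # p2: first control point
                       [0.6, 0.9],   # p3: second control point
                       [0.9, 0.1]],  # p4: end point
                      requires_grad=True)
color = torch.tensor([0.2, 0.4, 0.8], requires_grad=True)   # RGB fill color c
alpha = torch.tensor([0.9], requires_grad=True)             # opacity
stroke_width = torch.tensor([2.0], requires_grad=True)      # stroked curves only

# Geometry and appearance are optimized jointly, typically with
# separate step sizes for shape and color parameters.
optimizer = torch.optim.Adam([
    {"params": [points, stroke_width], "lr": 1e-2},
    {"params": [color, alpha], "lr": 1e-1},
])
```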
To achieve more structured or physically plausible traces, alternative parameterizations are employed, such as:
- Human motor parameterizations via the sigma-lognormal model for robot drawing (Berio et al., 3 Jul 2025)
- Uniform (cardinal) B-splines, enabling long, high-order-continuity curves suitable for stylization and abstraction (Berio et al., 7 Nov 2025)
2. Differentiable Rasterization and Gradient Flow
The differentiable rasterizer is the cornerstone of the DiffVG pipeline. The rasterization process proceeds in two stages:
2.1 Coverage Computation
Pixel coverage for each primitive is determined from its signed-distance function (SDF) $d(\mathbf{x})$, the signed distance from pixel center $\mathbf{x}$ to the primitive boundary (negative inside the shape). Coverage is soft-assigned using a sigmoidal or Hermite "smoothstep" transfer function, e.g.

$$\mathrm{cov}(\mathbf{x}) = \mathrm{smoothstep}\!\left( \tfrac{1}{2} - \frac{d(\mathbf{x})}{w} \right), \qquad \mathrm{smoothstep}(t) = 3t^{2} - 2t^{3} \ \text{ for } t \in [0,1] \text{ (clamped outside)},$$

where $w$ is the anti-aliasing band width in pixels. This ensures the coverage is differentiable with respect to both control points and rendering parameters.
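A minimal sketch of the coverage computation, assuming a circle SDF for simplicity (DiffVG itself rasterizes arbitrary paths); the smoothstep band width and all names are illustrative:

```python
import torch

def smoothstep(t: torch.Tensor) -> torch.Tensor:
    """Cubic Hermite smoothstep 3t^2 - 2t^3, clamped to [0, 1]."""
    t = t.clamp(0.0, 1.0)
    return t * t * (3.0 - 2.0 * t)

def soft_coverage(sdf: torch.Tensor, band: float = 1.0) -> torch.Tensor:
    """Map a signed distance (negative inside) to a differentiable coverage in
    [0, 1] over an anti-aliasing band of `band` pixels around the boundary."""
    return smoothstep(0.5 - sdf / band)

# Toy example: soft coverage of a circle, the simplest SDF primitive.
ys, xs = torch.meshgrid(torch.arange(64.0), torch.arange(64.0), indexing="ij")
center = torch.tensor([31.5, 32.5], requires_grad=True)
radius = torch.tensor(20.0, requires_grad=True)
sdf = torch.sqrt((xs - center[0]) ** 2 + (ys - center[1]) ** 2) - radius
coverage = soft_coverage(sdf)     # (64, 64) map, differentiable w.r.t. center/radius
coverage.sum().backward()         # gradients flow back to the shape parameters
```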
2.2 Alpha Compositing
The scene is composited front-to-back with the standard "over" operator,

$$I(\mathbf{x}) = \sum_{i} c_i\, \alpha_i(\mathbf{x}) \prod_{j<i} \bigl(1 - \alpha_j(\mathbf{x})\bigr) \;+\; c_{\mathrm{bg}} \prod_{i} \bigl(1 - \alpha_i(\mathbf{x})\bigr),$$

where $\alpha_i(\mathbf{x})$ is the opacity-weighted coverage of primitive $i$ (ordered front to back) and $c_{\mathrm{bg}}$ is the background color. This blending is entirely differentiable, and the rasterizer implements the analytic gradients of $I(\mathbf{x})$ with respect to control points, colors, and opacities via chain-rule differentiation, making both geometry and color parameters accessible to optimization through standard autodiff systems (Song et al., 2022).
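A minimal sketch of the blend as a differentiable PyTorch function; it mirrors the equation above on per-primitive coverage maps and flat colors, whereas the production rasterizer fuses this with GPU-accelerated coverage computation:

```python
import torch

def composite_front_to_back(coverages, colors, background):
    """Differentiable front-to-back 'over' compositing.

    coverages:  list of (H, W) opacity-weighted coverage maps, front first
    colors:     list of (3,) RGB tensors, one per primitive
    background: (H, W, 3) background image
    """
    image = torch.zeros_like(background)
    transmittance = torch.ones(background.shape[:2])  # light not yet absorbed
    for a, c in zip(coverages, colors):
        image = image + (transmittance * a).unsqueeze(-1) * c
        transmittance = transmittance * (1.0 - a)
    return image + transmittance.unsqueeze(-1) * background
```

Because the blend consists only of sums and products, gradients reach every coverage map and every color tensor.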
3. Curve and Motion Parameterization Strategies
3.1 Sigma-Lognormal Model for Human-like Robot Strokes
To match the physicality of skilled human motion, trajectories can be parameterized by the sigma-lognormal model. A drawing motion is planned as $N$ submovements defined by targets $\mathbf{v}_i$ and turning angles $\theta_i$, with each stroke's speed following a lognormal profile

$$\Lambda_i(t) = \frac{1}{\sigma_i \sqrt{2\pi}\,(t - t_{0,i})} \exp\!\left( -\frac{\bigl(\ln(t - t_{0,i}) - \mu_i\bigr)^2}{2\sigma_i^2} \right),$$

where $t - t_{0,i}$ is passed through a ReLU plus a small $\epsilon$ for numerical safety ahead of the submovement onset. The cumulative fraction $\Phi_i(t) = \int_0^t \Lambda_i(\tau)\, d\tau$ and per-stroke spatial transformations determine the instantaneous position, and the total trajectory is built as the superposition

$$\mathbf{x}(t) = \mathbf{x}_0 + \sum_{i=1}^{N} \mathbf{h}_i\bigl(\Phi_i(t)\bigr),$$

where $\mathbf{h}_i$ maps the completed fraction of submovement $i$ onto a path toward target $\mathbf{v}_i$ with angle $\theta_i$. The lognormal timing parameters $(t_{0,i}, \mu_i, \sigma_i)$ are in turn reparameterized into differentiable duration, offset, and shape/skewness variables, yielding intuitive and physically informed optimization variables (Berio et al., 3 Jul 2025).
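A minimal sketch of the lognormal speed profile and its cumulative fraction in PyTorch, assuming the ReLU-plus-epsilon onset handling described above; the per-stroke spatial mapping toward targets is omitted, and parameter values are illustrative:

```python
import torch

def lognormal_speed(t, t0, mu, sigma, eps=1e-6):
    """Lognormal speed profile of one submovement; the ReLU + eps keeps
    log(t - t0) defined (and the profile ~0) before the submovement onset."""
    dt = torch.relu(t - t0) + eps
    return torch.exp(-(torch.log(dt) - mu) ** 2 / (2.0 * sigma ** 2)) \
           / (sigma * dt * (2.0 * torch.pi) ** 0.5)

def cumulative_fraction(t, t0, mu, sigma, eps=1e-6):
    """Fraction of the submovement completed at time t (lognormal CDF)."""
    dt = torch.relu(t - t0) + eps
    z = (torch.log(dt) - mu) / (sigma * 2.0 ** 0.5)
    return 0.5 * (1.0 + torch.erf(z))

# Dense time grid for one submovement; timing parameters stay differentiable.
t = torch.linspace(0.0, 1.0, 200)
t0 = torch.tensor(0.05, requires_grad=True)
mu = torch.tensor(-1.5, requires_grad=True)
sigma = torch.tensor(0.3, requires_grad=True)
phi = cumulative_fraction(t, t0, mu, sigma)   # monotone 0 -> 1 progress curve
```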
3.2 Smoothing B-Splines and Linear Bézier Conversion
Replacing or augmenting Béziers, uniform (cardinal) B-splines of degree $d$ are defined over control points $\mathbf{c}_0, \dots, \mathbf{c}_n$ with evaluation

$$\mathbf{s}(t) = \sum_{j=0}^{n} \mathbf{c}_j\, B_{j,d}(t),$$

where $B_{j,d}$ are the uniform B-spline basis functions, and are converted to piecewise cubic Béziers through an exact linear mapping. For each B-spline, a block-Toeplitz matrix performs the B-spline-to-Bézier mapping, preserving differentiability for subsequent rasterization and optimization (Berio et al., 7 Nov 2025).
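A minimal sketch of the exact linear conversion for the uniform cubic (degree-3) case, using the standard per-segment B-spline-to-Bézier weights; this stands in for one block of the block-Toeplitz formulation, and general degree or end conditions are omitted:

```python
import torch

def bspline_to_bezier(ctrl: torch.Tensor) -> torch.Tensor:
    """Convert uniform cubic B-spline control points (n, 2) into per-segment
    cubic Bézier control points (n - 3, 4, 2) via an exact linear map, so
    gradients flow from rasterized Béziers back to the spline."""
    c0, c1, c2, c3 = ctrl[:-3], ctrl[1:-2], ctrl[2:-1], ctrl[3:]
    b0 = (c0 + 4.0 * c1 + c2) / 6.0   # segment start point
    b1 = (2.0 * c1 + c2) / 3.0        # first Bézier handle
    b2 = (c1 + 2.0 * c2) / 3.0        # second Bézier handle
    b3 = (c1 + 4.0 * c2 + c3) / 6.0   # segment end point
    return torch.stack([b0, b1, b2, b3], dim=1)

# Usage: an open spline with 10 control points yields 7 Bézier segments.
spline = torch.randn(10, 2, requires_grad=True)
bezier_segments = bspline_to_bezier(spline)   # (7, 4, 2)
```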
4. Loss Functions and Optimization Objectives
DiffVG pipelines admit generic composable loss functions. Principal forms include:
- Image Matching Loss (e.g., MSE across multi-scale Gaussian blurs $G_s$): $\mathcal{L}_{\mathrm{img}} = \sum_{s} \| G_s * I - G_s * I_{\mathrm{ref}} \|_2^2$
- CLIP-guided Text or Style Loss: cosine distance between the CLIP embedding of the rendered image and that of a text prompt or style exemplar, e.g. $\mathcal{L}_{\mathrm{CLIP}} = 1 - \cos\bigl(E_I(I),\, E_T(\text{prompt})\bigr)$
- Directional CLIP Loss for text-driven manipulations: $\mathcal{L}_{\mathrm{dir}} = 1 - \dfrac{\Delta I \cdot \Delta T}{\|\Delta I\|\,\|\Delta T\|}$, with $\Delta T$ and $\Delta I$ the shifts in CLIP-encoded text and image features between source and edited versions (Song et al., 2022).
- Smoothing Penalties:
  - For robot motion, minimum-time and isochrony penalties on the submovement timing, e.g. a total-duration term $\sum_i T_i$ (Berio et al., 3 Jul 2025)
  - For B-splines, the $L^2$-norm of the $k$-th-order derivative, $\mathcal{L}_{\mathrm{smooth}} = \int \|\mathbf{s}^{(k)}(t)\|^2\, dt = \mathbf{c}^{\top} K\, \mathbf{c}$, computed in closed form via the Gram matrix $K$ of basis-function derivatives
- Repulsion/Bounding Box/Palette Regularization: Geometric or color constraints, including Gumbel-softmax palette assignment for limited-color abstraction (Berio et al., 7 Nov 2025).
Losses are composed according to the application and balanced via scalar hyperparameters, as in the sketch below.
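A hedged PyTorch sketch of two of these losses, using average pooling as a stand-in for the multi-scale blur and assuming CLIP image/text features have already been computed elsewhere; names, scales, and weights are illustrative:

```python
import torch
import torch.nn.functional as F

def multiscale_mse(img, ref, scales=(1, 2, 4, 8)):
    """Image-matching loss: MSE between progressively downsampled copies of the
    rendered image and the reference (average pooling stands in for the blur).
    img, ref: (1, 3, H, W) tensors."""
    loss = 0.0
    for s in scales:
        loss = loss + F.mse_loss(F.avg_pool2d(img, s), F.avg_pool2d(ref, s))
    return loss

def directional_clip_loss(f_img, f_img_src, f_txt, f_txt_src):
    """Directional CLIP loss: align the shift between edited and source image
    embeddings with the shift between target and source text embeddings.
    All inputs are precomputed CLIP feature vectors."""
    d_img = f_img - f_img_src
    d_txt = f_txt - f_txt_src
    return 1.0 - F.cosine_similarity(d_img, d_txt, dim=-1).mean()

# Composition with scalar weights, chosen per application:
# total_loss = w_img * multiscale_mse(render, target) + w_dir * directional_clip_loss(...)
```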
5. End-to-End Optimization and Pipeline Execution
The full optimization loop follows the pattern below; a runnable toy sketch appears after the list:
- Initialization: Vector primitives or stroke motor plans are seeded, e.g., by polyline simplification, TSP on saliency for robot drawing, or multi-round vectorization for image abstraction.
- Parameter Encoding: Physical/human parameters are reparameterized (e.g., via durations/offsets/skewness) for the sigma-lognormal model (Berio et al., 3 Jul 2025); splines are padded and mapped to Béziers as needed.
- Forward Pass: Trajectories or curve geometries are sampled, converted to rasterizable segments, and DiffVG's differentiable rasterizer computes the rendered image.
- Loss Evaluation: Application-specific loss functions are computed, e.g., image fit, text-guidance, smoothing penalties.
- Backward Pass: Automatic differentiation propagates gradients from scalar loss to all primitive parameters (geometry, color, duration, skewness, etc.).
- Optimizer Update: Adam (or similar) is used to update parameters, often with separate step sizes for shape and color where applicable (Song et al., 2022).
- Post-Processing: For robot applications, optimized trajectories are mapped to metric coordinates, resampled and time-parameterized under maximum velocity and acceleration constraints, and subjected to inverse kinematics (e.g., Gauss–Newton iLQR), yielding executable controls for robotic arms (Berio et al., 3 Jul 2025).
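A self-contained toy version of this loop that fits a single soft circle (center, radius, color) to a synthetic target with the coverage model from Section 2; it is illustrative only, since the real pipelines render full Bézier/spline scenes through the differentiable rasterizer and add the post-processing steps above:

```python
import torch

def smoothstep(t):
    t = t.clamp(0.0, 1.0)
    return t * t * (3.0 - 2.0 * t)

def render_circle(center, radius, color, size=64):
    """Render one soft circle over a black background; stands in for the
    differentiable rasterizer in this toy loop."""
    ys, xs = torch.meshgrid(torch.arange(float(size)), torch.arange(float(size)),
                            indexing="ij")
    dist = torch.sqrt((xs - center[0]) ** 2 + (ys - center[1]) ** 2 + 1e-8)
    coverage = smoothstep(0.5 - (dist - radius))     # soft SDF coverage
    return coverage.unsqueeze(-1) * color            # (size, size, 3) image

# Target rendered from known parameters; in practice this is a reference image.
target = render_circle(torch.tensor([40.0, 24.0]), torch.tensor(12.0),
                       torch.tensor([1.0, 0.5, 0.0])).detach()

# Initialization, then joint optimization of geometry and appearance
# with separate step sizes, following the pattern described above.
center = torch.tensor([32.0, 32.0], requires_grad=True)
radius = torch.tensor(20.0, requires_grad=True)
color = torch.tensor([0.5, 0.5, 0.5], requires_grad=True)
optimizer = torch.optim.Adam([{"params": [center, radius], "lr": 0.5},
                              {"params": [color], "lr": 0.05}])

for step in range(300):                 # forward, loss, backward, update
    optimizer.zero_grad()
    rendered = render_circle(center, radius, color)
    loss = torch.mean((rendered - target) ** 2)
    loss.backward()
    optimizer.step()
```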
The following table summarizes key sub-pipeline mappings:
| Step | Input Parameters | Output |
|---|---|---|
| SLM parameterization | durations, offsets, shape/skewness; targets $\mathbf{v}_i$, angles $\theta_i$ | lognormal parameters $(t_{0,i}, \mu_i, \sigma_i)$ |
| Trajectory sampling | lognormal parameters, targets, angles | sampled trajectory points $\mathbf{x}(t_k)$ |
| Bézier construction | sampled trajectory points or B-spline control points | Bézier control points |
| Rasterization | Bézier control points, colors, widths | Rendered image $I$ |
| Loss & backward | $I$, reference(s), weights | scalar loss and gradients w.r.t. all parameters |
6. Demonstration Applications
Image-driven Robot Drawing
Robot trajectories are optimized to produce physically feasible, human-like rapid drawing movements, matching either reference images or text prompts via CLIP and image-space loss functions. The system generates feasible end-effector trajectories and control commands for 7-DoF robotic arms, demonstrated with tasks including synthetic graffiti generation and image abstraction (Berio et al., 3 Jul 2025).
Neural Image Abstraction with Smooth B-Splines
By using very long smoothing B-splines integrated into the DiffVG framework, pipelines enable stylized space-filling paths, stroke-based or closed-area abstraction, and stylized text (calligrams). Fidelity/smoothness trade-offs are controlled via analytic $L^2$-norm smoothing of spline derivatives, and compositional stylization uses multiscale MSE, CLIP-based guidance, SDS/ISM with diffusion models, and palette quantization (Berio et al., 7 Nov 2025).
Text- and Region-Guided Editing
In CLIPVG, vectorized representations initialized via multi-scale tracing are edited according to text prompts, region-of-interest guidance, and random-crop augmentation, all via differentiable CLIP-guided objectives. Decoupled shape and color optimization permits pure recoloring, geometric manipulation, or combined editing within a unified framework (Song et al., 2022).
7. Pipeline Extensibility and Compatibility
All pipeline stages are implemented such that new curve parameterizations, loss terms, and rasterization modes are fully differentiable and compatible with the DiffVG scene graph. Feature additions such as B-spline–to–Bézier conversion and per-point variable stroke widths are transparent to downstream rasterization and optimization routines. The pipelines admit diverse applications in image abstraction, stylization, physical robot drawing, and text-guided vector editing. All analytic and loss formulations are constructed to enable automatic differentiation from raster or abstract objectives to vector graphics parameters, facilitating wide applicability in both computer vision and robotics domains (Berio et al., 3 Jul 2025, Berio et al., 7 Nov 2025, Song et al., 2022).