Closed-form characterization of the velocity field in LPIPS-regularized flow-matching decoders

Derive a closed-form characterization of the non-straight, training-dependent velocity field learned by pixel-space diffusion decoders trained with flow matching augmented by an LPIPS loss, i.e., by optimizing $L_\lambda(\hat{\nu} \mid \nu) = \|\hat{\nu}-\nu\|^2 + \lambda L(x_0, \hat{x}_0)$. The effective target field is $\nu - (\lambda t/2)\,\nabla L$, which depends on the noise level $t$ and the LPIPS gradient $\nabla L$. The goal is to obtain an explicit analytic expression for this velocity field that explains its shift during training and enables principled sampling schedule design.

Background

The SSDD paper (Vallaeys et al., 2025) analyzes the training dynamics of diffusion decoders that combine flow matching with an LPIPS perceptual loss. By differentiating the combined objective, the authors show that optimizing the mixed loss is equivalent to optimizing a flow-matching loss with a modified target: the target velocity is shifted by a term proportional to the LPIPS gradient and the noise level t.
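The equivalence stated above can be sketched in a few lines. This is a reconstruction under one assumption that the text here does not spell out: the predicted clean image is obtained from the noisy sample as \hat{x}_0 = x_t + t \hat{\nu}, which is the parameterization that reproduces the sign of the shift quoted in the problem statement. The symbol \nu' for the effective target is introduced here for illustration only.

```latex
% Assumed parameterization (not stated in the text above):
%   \hat{x}_0 = x_t + t\,\hat{\nu}
\begin{align}
L_\lambda(\hat{\nu} \mid \nu)
  &= \|\hat{\nu}-\nu\|^2 + \lambda\, L(x_0, \hat{x}_0),
  \qquad \hat{x}_0 = x_t + t\,\hat{\nu} \\
\nabla_{\hat{\nu}} L_\lambda
  &= 2(\hat{\nu}-\nu) + \lambda t\, \nabla_{\hat{x}_0} L(x_0, \hat{x}_0) \\
  &= 2\Bigl(\hat{\nu} - \bigl[\nu - \tfrac{\lambda t}{2}\,
       \nabla_{\hat{x}_0} L(x_0, \hat{x}_0)\bigr]\Bigr) \\
  &= \nabla_{\hat{\nu}} \|\hat{\nu}-\nu'\|^2,
  \qquad \nu' = \nu - \tfrac{\lambda t}{2}\,\nabla_{\hat{x}_0} L
  \quad \text{(held fixed when differentiating)}
\end{align}
```

Because \nu' is evaluated at the current prediction \hat{x}_0, the effective target moves as the model trains, which is consistent with the training-dependent shift described in this section.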

This modification implies the learned velocity field is non-straight and varies with training and t, which affects sampling behavior and the perception–distortion trade-off. However, the authors note they lack a closed-form expression for this field, leaving its precise form and properties analytically uncharacterized.
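The dependence of the shifted target on t and on the current prediction can be checked numerically. The sketch below is illustrative only: it replaces LPIPS with a hypothetical quadratic feature-space loss (`toy_perceptual_loss`), assumes the parameterization \hat{x}_0 = x_t + t \hat{\nu} with x_t = (1 - t) x_0 + t \epsilon and \nu = x_0 - \epsilon, and uses function names invented for this example. It compares a finite-difference gradient of the mixed loss against the gradient of a plain squared error toward the shifted target \nu - (\lambda t/2) \nabla L.

```python
import numpy as np


def toy_perceptual_loss(x0, x0_hat, A):
    """Hypothetical stand-in for LPIPS: squared distance in a linear feature space."""
    d = A @ (x0_hat - x0)
    return float(d @ d)


def toy_perceptual_grad(x0, x0_hat, A):
    """Gradient of the toy loss with respect to x0_hat."""
    return 2.0 * A.T @ (A @ (x0_hat - x0))


def mixed_loss(v_hat, v, x0, x_t, t, lam, A):
    """Flow-matching squared error plus lam times the toy perceptual loss."""
    x0_hat = x_t + t * v_hat  # assumed parameterization of the predicted image
    return float(np.sum((v_hat - v) ** 2)) + lam * toy_perceptual_loss(x0, x0_hat, A)


def effective_target(v_hat, v, x0, x_t, t, lam, A):
    """Shifted target v - (lam * t / 2) * grad L, evaluated at the current prediction."""
    x0_hat = x_t + t * v_hat
    return v - 0.5 * lam * t * toy_perceptual_grad(x0, x0_hat, A)


def numerical_grad(f, v_hat, eps=1e-6):
    """Central finite-difference gradient of a scalar function f at v_hat."""
    g = np.zeros_like(v_hat)
    for i in range(v_hat.size):
        e = np.zeros_like(v_hat)
        e[i] = eps
        g[i] = (f(v_hat + e) - f(v_hat - e)) / (2.0 * eps)
    return g


def check_equivalence(t, lam=0.5, dim=4, seed=0):
    """Return (max gradient mismatch, norm of the target shift) at noise level t."""
    rng = np.random.default_rng(seed)
    A = rng.normal(size=(dim, dim))
    x0 = rng.normal(size=dim)
    noise = rng.normal(size=dim)
    x_t = (1.0 - t) * x0 + t * noise
    v = x0 - noise                          # chosen so that x0 = x_t + t * v
    v_hat = v + 0.1 * rng.normal(size=dim)  # an imperfect prediction
    g_mixed = numerical_grad(
        lambda u: mixed_loss(u, v, x0, x_t, t, lam, A), v_hat
    )
    target = effective_target(v_hat, v, x0, x_t, t, lam, A)
    g_shifted = 2.0 * (v_hat - target)      # gradient of ||v_hat - target||^2, target fixed
    return float(np.max(np.abs(g_mixed - g_shifted))), float(np.linalg.norm(target - v))


if __name__ == "__main__":
    for t in (0.0, 0.3, 0.9):
        err, shift = check_equivalence(t)
        print(f"t={t:.1f}  max gradient mismatch={err:.2e}  shift norm={shift:.4f}")
```

In this toy setup the shift vanishes at t = 0 and grows with t, and it changes whenever `v_hat` changes. The quadratic surrogate says nothing about the actual form of the field under LPIPS, which remains the open question.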

References

"This analysis reveals that LPIPS-regularized Flow-Matching decoders learn a non-straight velocity field that shifts during their training, and for which we do not have a closed form."

— SSDD: Single-Step Diffusion Decoder for Efficient Image Tokenization (2510.04961 - Vallaeys et al., 6 Oct 2025) in Appendix: Sampling from LPIPS-regularized models (Section app_sub:lpips_theory)
