Variance-Stabilized Velocity Matching

Updated 1 December 2025
  • The paper introduces a variance-stabilized velocity matching objective that analytically normalizes target magnitudes to mitigate gradient instabilities.
  • It employs a conditional normalization factor based on endpoint statistics to balance training signals across all timesteps.
  • Empirical evaluations demonstrate smoother convergence and improved metrics like SSIM and PSNR in large-scale Bridge Models.

A variance-stabilized velocity-matching objective is a loss formulation used in training conditional generative models—particularly large-scale Bridge Models such as the Vision Bridge Transformer (ViBT)—that addresses severe gradient instabilities present in standard velocity-matching approaches. By analytically normalizing the velocity targets according to their conditional variance, this objective ensures well-conditioned gradients and balanced training signal across the entire time interval, which is critical for robust and scalable learning in data-to-data translation and instruction-based image/video editing tasks (Tan et al., 28 Nov 2025).

1. Foundation: Brownian Bridge Models and Velocity Matching

Conditional Bridge Models describe a stochastic process $X_t$ defined on $t \in [0,1]$ via an SDE:

$$dX_t = v(X_t, t)\,dt + \sigma(t)\,dW_t,$$

with initial ($X_0 \sim p_0$) and terminal ($X_1 \sim p_1$) endpoint constraints. In velocity-matching, a neural parameterization $v_\theta$ is trained to approximate a “teacher” instantaneous velocity:

$$u_t(X_t \mid x_0, x_1) = \partial_t x_t^{(\text{teacher})}$$

For the Brownian bridge case ($\sigma(t) \equiv 1$), synthesis proceeds via:

$$X_t = (1-t)x_0 + t x_1 + \sqrt{t(1-t)}\,\epsilon, \quad \epsilon \sim \mathcal{N}(0, I)$$

The canonical target velocity is:

$$u_t(X_t \mid x_0, x_1) = \frac{x_1 - X_t}{1-t}$$

The naive velocity-matching loss,

$$\mathcal{L}_{\mathrm{vm}}(\theta) = \mathbb{E}\Big[\|v_\theta(X_t, t) - u_t(X_t \mid x_0, x_1)\|^2\Big],$$

anchors the learning process.
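
As a concrete reference point, the sketch below implements the bridge sampler and raw velocity target exactly as defined above; the function names and batched-tensor interface are illustrative assumptions, not the paper's implementation.

```python
import torch

def sample_bridge(x0: torch.Tensor, x1: torch.Tensor, t: torch.Tensor):
    """Draw X_t = (1 - t) x0 + t x1 + sqrt(t (1 - t)) eps for a batch of times t."""
    tb = t.view(-1, *([1] * (x0.dim() - 1)))      # broadcast t over data dims
    eps = torch.randn_like(x0)
    xt = (1 - tb) * x0 + tb * x1 + torch.sqrt(tb * (1 - tb)) * eps
    return xt, eps

def raw_velocity_target(x1: torch.Tensor, xt: torch.Tensor, t: torch.Tensor):
    """Canonical target u_t = (x1 - X_t) / (1 - t); note the divergence as t -> 1."""
    tb = t.view(-1, *([1] * (x1.dim() - 1)))
    return (x1 - xt) / (1 - tb)
```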

2. Instabilities in Standard Objectives

The unnormalized velocity target diverges as $t \to 1$:

$$u_t(X_t \mid x_0, x_1) \sim \mathcal{O}\big((1-t)^{-1}\big),$$

producing gradient explosions at late timesteps. This results in numerical instability, loss contributions dominated by the region near $t = 1$, and severe undertraining of the model elsewhere. Displacement-based alternatives,

$$d_t(X_t \mid x_0, x_1) = x_1 - X_t,$$

suffer from vanishing targets as $t \to 1$, over-weighting early timesteps and yielding the converse imbalance (Tan et al., 28 Nov 2025).
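
The imbalance is easy to see numerically. The check below uses the closed-form second moment of the raw velocity target given in Section 3, plus the displacement target's second moment $\mathbb{E}\|d_t\|^2 = (1-t)^2\|x_1-x_0\|^2 + t(1-t)D$, which is derived here from the bridge construction rather than quoted from the paper; $D$ and $\|x_1 - x_0\|^2$ take hypothetical values.

```python
# Illustrative target magnitudes from the closed-form second moments.
D, delta_sq = 4096, 100.0  # hypothetical dimensionality and ||x1 - x0||^2
for t in (0.5, 0.9, 0.99, 0.999):
    raw = delta_sq + t / (1 - t) * D                  # E||u_t||^2: explodes as t -> 1
    disp = (1 - t) ** 2 * delta_sq + t * (1 - t) * D  # E||d_t||^2: vanishes as t -> 1
    print(f"t={t:5.3f}   E||u_t||^2 = {raw:12.1f}   E||d_t||^2 = {disp:8.2f}")
```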

3. Derivation of the Variance-Stabilized Velocity-Matching Objective

To correct these pathologies, the stabilized objective normalizes $u_t$ by its conditional root-mean-square magnitude:

  • Compute the conditional second moment:

$$\mathbb{E}_\epsilon\big[\|u_t\|^2\big] = \|x_1 - x_0\|^2 + \frac{t}{1-t}\,D,$$

where $D$ is the data dimensionality.

  • Define the time- and endpoint-dependent normalization factor:

$$\alpha(x_0, x_1, t)^2 = 1 + \frac{t D}{(1-t)\|x_1 - x_0\|^2}$$

  • The stabilized velocity target:

$$\tilde{u}_t(X_t \mid x_0, x_1) = u_t(X_t \mid x_0, x_1) / \alpha(x_0, x_1, t)$$

  • The normalized model output:

$$\tilde{v}_\theta(X_t, t) = v_\theta(X_t, t) / \alpha(x_0, x_1, t)$$

  • The variance-stabilized loss:

$$\mathcal{L}_{\mathrm{stab}}(\theta) = \mathbb{E}_{x_0, x_1, t, \epsilon}\Big[\big\|\tilde{v}_\theta(x_t, t) - \tilde{u}_t(x_t \mid x_0, x_1)\big\|^2\Big]$$

In all cases, the model continues to predict unnormalized velocities; normalization is applied only inside the loss function for training (Tan et al., 28 Nov 2025).
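
A minimal sketch of this loss, assuming `v_theta` is a callable taking $(x_t, t)$ and predicting unnormalized velocities over batched tensors; names and shapes are illustrative.

```python
import torch

def stabilized_vm_loss(v_theta, x0: torch.Tensor, x1: torch.Tensor, t: torch.Tensor):
    """Variance-stabilized velocity-matching MSE for one batch (sketch)."""
    shape = (-1,) + (1,) * (x0.dim() - 1)
    tb = t.view(shape)
    eps = torch.randn_like(x0)
    xt = (1 - tb) * x0 + tb * x1 + torch.sqrt(tb * (1 - tb)) * eps
    u = (x1 - xt) / (1 - tb)                               # raw velocity target
    D = x0[0].numel()                                      # data dimensionality D
    delta_sq = (x1 - x0).flatten(1).pow(2).sum(dim=1)      # ||x1 - x0||^2 per sample
    alpha = torch.sqrt(1 + t * D / ((1 - t) * delta_sq)).view(shape)
    v = v_theta(xt, t)                                     # model predicts raw velocity
    # Normalization enters only through the loss: ||(v - u) / alpha||^2.
    return ((v - u) / alpha).pow(2).flatten(1).sum(dim=1).mean()
```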

4. Analysis: Advantages of Variance Stabilization

Variance normalization offers multiple concrete benefits:

  • Uniform gradient magnitudes: Gradient norms are stabilized across $t$, precluding numerical blowup as $t \to 1$.
  • Equalized training signal: The expected squared norm of the stabilized target, $S(t) = \mathbb{E}[\|\tilde{u}_t\|^2]$, is nearly flat in $t$ (a one-line check follows this list), ensuring balanced coverage of early, mid, and late timesteps.
  • Scalability: In deep Transformer models, the elimination of large-magnitude targets prevents excessive gradient clipping and optimizer pathologies, supporting robust training at the 1.3–20B parameter scale (Tan et al., 28 Nov 2025).
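
The flatness claim follows directly from the second moment in Section 3: conditioned on an endpoint pair, the normalizer is exactly the conditional RMS scale of the target, so

$$S(t) = \frac{\mathbb{E}_\epsilon\big[\|u_t\|^2\big]}{\alpha(x_0, x_1, t)^2} = \frac{\|x_1 - x_0\|^2 + \frac{t}{1-t}D}{1 + \frac{tD}{(1-t)\|x_1 - x_0\|^2}} = \|x_1 - x_0\|^2,$$

which is independent of $t$, so no timestep dominates the loss.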

5. Implementation Details

The stabilized velocity objective is implemented with the following procedure (a runnable toy sketch follows the list):

  1. Sample endpoint pairs $(x_0, x_1)$.
  2. Draw $t \sim \mathrm{Uniform}[0,1]$ and $\epsilon \sim \mathcal{N}(0, I)$.
  3. Generate $x_t = (1-t)x_0 + t x_1 + \sqrt{t(1-t)}\,\epsilon$.
  4. Compute the raw velocity, $u_t = (x_1 - x_t)/(1-t)$.
  5. Compute $\alpha^2 = 1 + tD / \big[(1-t)\|x_1 - x_0\|^2\big]$.
  6. Compute $\tilde{v}_\theta = v_\theta(x_t, t)/\alpha$ and $\tilde{u}_t = u_t/\alpha$.
  7. Formulate the MSE loss, $\|\tilde{v}_\theta - \tilde{u}_t\|^2$.
  8. Update $\theta$ using the batch mean of the above loss.
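
A runnable toy version of steps 1–8 is sketched below. The two-layer MLP over flat vectors is a stand-in for a real Bridge Model backbone, and the clamp on $t$ is a numerical guard added here, not part of the paper's recipe.

```python
import torch
import torch.nn as nn

D = 64                                               # data dimensionality (toy)
model = nn.Sequential(nn.Linear(D + 1, 256), nn.SiLU(), nn.Linear(256, D))
opt = torch.optim.Adam(model.parameters(), lr=1e-4)

x0, x1 = torch.randn(32, D), torch.randn(32, D)      # step 1: endpoint pairs
t = torch.rand(32).clamp(1e-3, 1 - 1e-3)             # step 2 (clamped away from {0,1})
eps = torch.randn(32, D)
xt = (1 - t[:, None]) * x0 + t[:, None] * x1 \
     + (t * (1 - t)).sqrt()[:, None] * eps           # step 3
u = (x1 - xt) / (1 - t)[:, None]                     # step 4: raw velocity
alpha2 = 1 + t * D / ((1 - t) * (x1 - x0).pow(2).sum(dim=1))  # step 5
v = model(torch.cat([xt, t[:, None]], dim=1))        # v_theta(x_t, t)
loss = ((v - u).pow(2).sum(dim=1) / alpha2).mean()   # steps 6-7: ||(v - u)/alpha||^2
opt.zero_grad(); loss.backward(); opt.step()         # step 8
```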

At inference, Euler–Maruyama integration uses a variance-corrected noise term:

$$x_{k+1} = x_k + \Delta t_k\, v_\theta(x_k, t_k) + \sqrt{\Delta t_k \cdot \frac{1 - t_{k+1}}{1 - t_k}}\,\epsilon_k, \qquad \epsilon_k \sim \mathcal{N}(0, I)$$

(Tan et al., 28 Nov 2025).
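
A minimal sketch of this sampler is given below, assuming a `model(x, t)` callable that returns $v_\theta$; the uniform time grid and step count are illustrative choices, not the paper's settings.

```python
import torch

@torch.no_grad()
def sample_bridge_sde(model, x0: torch.Tensor, num_steps: int = 50) -> torch.Tensor:
    """Euler-Maruyama integration with the variance-corrected noise term."""
    ts = torch.linspace(0.0, 1.0, num_steps + 1)
    x = x0.clone()
    for k in range(num_steps):
        tk, tk1 = ts[k].item(), ts[k + 1].item()
        dt = tk1 - tk
        t = torch.full((x.shape[0],), tk)
        x = x + dt * model(x, t)                     # drift step
        # Noise scale sqrt(dt * (1 - t_{k+1}) / (1 - t_k)); zero at the final step.
        x = x + (dt * (1 - tk1) / (1 - tk)) ** 0.5 * torch.randn_like(x)
    return x
```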

6. Empirical Evaluation and Ablations

Empirical results demonstrate clear superiority over unnormalized alternatives:

  • Metrics: The stabilized objective outperforms displacement and raw velocity objectives on SSIM, PSNR, NIQE, CLIP Score, and VBench for image editing and depth-to-video tasks (Table 7).
  • Smoother convergence: Training curves exhibit markedly reduced loss variance and improved stability (Fig. 7a).
  • Balanced loss contributions: The loss profile $S(t)$ is flat across $t$ for the stabilized objective, preventing endpoint dominance (Fig. 2).
  • Objective ablation: Stabilized velocity achieves the highest average image-edit score (3.55) and VBench score (0.709), compared with displacement (3.50, 0.695) and raw velocity (3.36, 0.698) objectives.
  • Robustness to the global noise scale $s$ is observed when combining stabilization with noise-scale tuning (Table 8) (Tan et al., 28 Nov 2025).

7. Comparison with Related Objectives

| Objective | Pathology | Loss dominance |
|---|---|---|
| Raw velocity | Exploding as $t \to 1$ | Endpoints, late $t$ |
| Displacement | Vanishing as $t \to 1$ | Early $t$ |
| Variance-stabilized velocity | None (by construction) | Uniform in $t$ |
| Denoising score matching | Vanishing/exploding at ends | Time-dependent, needs reweighting |

Variance stabilization directly addresses the pathologies of both displacement and raw velocity approaches. Score-matching objectives in diffusion (notably DSM) similarly suffer from time-dependent magnitude issues unless properly reweighted, which the stabilized velocity-matching loss resolves analytically (Tan et al., 28 Nov 2025).

8. Broader Context and Connections

Variance stabilization in velocity-matching addresses a subclass of variance pathologies that also arise in broader score-matching and minimum-velocity learning settings (e.g., DSM, CD-1, Wasserstein minimum velocity) (Wang et al., 2020). The analytic normalization parallels the control variate strategies used in Wasserstein minimum velocity estimation, underscoring the general importance of variance control for stable and scalable training in generative models. A plausible implication is that analogous normalization can benefit related objectives where target magnitudes are analytically tractable and amenable to similar stabilizing transformations.

The variance-stabilized velocity-matching objective is essential for scaling conditional generative models with trajectory-based training to billion-parameter regimes, serving as a principled correction to gradient imbalances endemic in prior, unnormalized velocity-matching and displacement-matching approaches (Tan et al., 28 Nov 2025).
