Variance-Stabilized Velocity Matching
- The paper introduces a variance-stabilized velocity matching objective that analytically normalizes target magnitudes to mitigate gradient instabilities.
- It employs a conditional normalization factor based on endpoint statistics to balance training signals across all timesteps.
- Empirical evaluations demonstrate smoother convergence and improved metrics like SSIM and PSNR in large-scale Bridge Models.
A variance-stabilized velocity-matching objective is a loss formulation used in training conditional generative models—particularly large-scale Bridge Models such as the Vision Bridge Transformer (ViBT)—that addresses severe gradient instabilities present in standard velocity-matching approaches. By analytically normalizing the velocity targets according to their conditional variance, this objective ensures well-conditioned gradients and balanced training signal across the entire time interval, which is critical for robust and scalable learning in data-to-data translation and instruction-based image/video editing tasks (Tan et al., 28 Nov 2025).
1. Foundation: Brownian Bridge Models and Velocity Matching
Conditional Bridge Models describe a stochastic process $\{x_t\}$ defined on $t \in [0, 1]$ via an SDE:
$$dx_t = v(x_t, t)\,dt + \sigma\,dW_t,$$
with initial ($x_0$, the source) and terminal ($x_1$, the target) endpoint constraints. In velocity matching, a neural parameterization $v_\theta(x_t, t)$ is trained to approximate a "teacher" instantaneous velocity $v_t$, the drift of the bridge conditioned on its endpoints.
For the Brownian bridge case (constant diffusion coefficient $\sigma$), synthesis of intermediate states proceeds via:
$$x_t = (1-t)\,x_0 + t\,x_1 + \sigma\sqrt{t(1-t)}\,\epsilon, \qquad \epsilon \sim \mathcal{N}(0, I).$$
The canonical target velocity is:
$$v_t = \frac{x_1 - x_t}{1 - t}.$$
The naive velocity-matching loss,
$$\mathcal{L}_{\mathrm{vel}} = \mathbb{E}_{x_0, x_1, t, \epsilon}\!\left[\left\| v_\theta(x_t, t) - v_t \right\|^2\right],$$
anchors the learning process.
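For concreteness, the following is a minimal NumPy sketch of the bridge interpolation and the raw velocity target written above; the function names and the default $\sigma = 1$ are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def sample_bridge_state(x0, x1, t, sigma=1.0, rng=np.random):
    """Sample x_t from the Brownian bridge between endpoints x0 and x1."""
    eps = rng.standard_normal(x0.shape)
    xt = (1.0 - t) * x0 + t * x1 + sigma * np.sqrt(t * (1.0 - t)) * eps
    return xt, eps

def raw_velocity_target(x1, xt, t):
    """Canonical (unnormalized) velocity target v_t = (x1 - x_t) / (1 - t)."""
    return (x1 - xt) / (1.0 - t)
```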
2. Instabilities in Standard Objectives
The unnormalized velocity target diverges as $t \to 1$: substituting the bridge interpolation into $v_t$ gives
$$v_t = (x_1 - x_0) - \sigma\sqrt{\frac{t}{1-t}}\,\epsilon,$$
whose noise term grows without bound, producing gradient explosions at late timesteps. This results in numerical instability, loss contributions dominated by timesteps near $t = 1$, and severe undertraining of the model elsewhere. Displacement-based alternatives, which regress the target
$$d_t = x_1 - x_t = (1-t)\,v_t,$$
suffer from vanishing targets as $t \to 1$, over-weighting early timesteps and yielding the converse imbalance (Tan et al., 28 Nov 2025).
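To make the imbalance concrete, the sketch below evaluates the closed-form per-dimension RMS of both targets across $t$; the dimensionality $d$, noise scale $\sigma$, and the assumption $\|x_1 - x_0\|^2 \approx d$ are illustrative values, not numbers from the paper.

```python
import numpy as np

# Illustrative check of the two pathologies using the closed-form second moments.
d, sigma = 1024, 1.0
delta_sq = float(d)   # assume ||x1 - x0||^2 ~ d for unit-scale data (illustrative)

for t in [0.1, 0.5, 0.9, 0.99, 0.999]:
    vel_rms = np.sqrt(delta_sq / d + sigma**2 * t / (1 - t))                 # raw velocity target
    disp_rms = np.sqrt((1 - t)**2 * delta_sq / d + sigma**2 * t * (1 - t))   # displacement x1 - x_t
    print(f"t={t:5.3f}  velocity RMS={vel_rms:8.2f}  displacement RMS={disp_rms:6.3f}")
```

The raw-velocity RMS grows without bound as $t \to 1$, while the displacement RMS collapses toward zero, reproducing the two pathologies described above.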
3. Derivation of the Variance-Stabilized Velocity-Matching Objective
To correct these pathologies, the stabilized objective normalizes the target $v_t$ by its conditional root-mean-square magnitude:
- Compute the conditional second moment:
$$\mathbb{E}\!\left[\|v_t\|^2 \mid x_0, x_1\right] = \|x_1 - x_0\|^2 + \sigma^2 d\,\frac{t}{1-t},$$
where $d$ is the data dimensionality.
- Define the time- and endpoint-dependent normalization factor:
$$\lambda_t(x_0, x_1) = \sqrt{\tfrac{1}{d}\,\mathbb{E}\!\left[\|v_t\|^2 \mid x_0, x_1\right]} = \sqrt{\frac{\|x_1 - x_0\|^2}{d} + \sigma^2\,\frac{t}{1-t}}.$$
- The stabilized velocity target:
$$\tilde{v}_t = \frac{v_t}{\lambda_t(x_0, x_1)}.$$
- The normalized model output:
$$\tilde{v}_\theta(x_t, t) = \frac{v_\theta(x_t, t)}{\lambda_t(x_0, x_1)}.$$
- The variance-stabilized loss:
$$\mathcal{L}_{\mathrm{stab}} = \mathbb{E}_{x_0, x_1, t, \epsilon}\!\left[\left\|\tilde{v}_\theta(x_t, t) - \tilde{v}_t\right\|^2\right].$$
In all cases, the model continues to predict unnormalized velocities; normalization is applied only inside the loss function for training (Tan et al., 28 Nov 2025).
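A minimal PyTorch sketch of the loss under the reconstruction above follows; the batch layout, the helper name, and the reduction (mean over the batch of per-sample squared norms) are assumptions rather than details from the paper.

```python
import torch

def variance_stabilized_velocity_loss(v_pred, x0, x1, x_t, t, sigma):
    """Variance-stabilized velocity-matching loss (sketch).

    v_pred : model output v_theta(x_t, t), shape (B, d)
    x0, x1 : endpoint pairs, shape (B, d)
    t      : timesteps in (0, 1), shape (B, 1)
    """
    d = x0.shape[-1]
    v_target = (x1 - x_t) / (1.0 - t)                     # raw velocity target
    # Conditional per-dimension RMS of the target: lambda_t(x0, x1).
    lam = torch.sqrt(
        (x1 - x0).pow(2).sum(dim=-1, keepdim=True) / d
        + sigma**2 * t / (1.0 - t)
    )
    v_target_tilde = v_target / lam                       # stabilized target
    v_pred_tilde = v_pred / lam                           # normalized model output
    return (v_pred_tilde - v_target_tilde).pow(2).sum(dim=-1).mean()
```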
4. Analysis: Advantages of Variance Stabilization
Variance normalization offers multiple concrete benefits:
- Uniform gradient magnitudes: Gradient norms are stabilized across $t \in [0, 1]$, precluding numerical blowup as $t \to 1$.
- Equalized training signal: The expected squared norm of the stabilized target, $\mathbb{E}\!\left[\|\tilde{v}_t\|^2\right]$, is nearly flat in $t$, ensuring balanced coverage of early, mid, and late timesteps (see the identity after this list).
- Scalability: In deep Transformer models, the elimination of large-magnitude targets prevents excessive gradient clipping and optimizer pathologies, supporting robust training at the 1.3–20B parameter scale (Tan et al., 28 Nov 2025).
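The equalized-signal claim can be checked directly from the definition of $\lambda_t$ in the reconstruction above: dividing by the conditional RMS makes the conditional second moment of the stabilized target exactly constant,
$$\mathbb{E}\!\left[\|\tilde{v}_t\|^2 \mid x_0, x_1\right] = \frac{\mathbb{E}\!\left[\|v_t\|^2 \mid x_0, x_1\right]}{\lambda_t^2(x_0, x_1)} = d \qquad \text{for all } t \in (0, 1).$$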
5. Implementation Details
The stabilized velocity objective is implemented with the following procedure:
- Sample endpoint pairs $(x_0, x_1)$ from the paired training data.
- Draw $t \sim \mathcal{U}(0, 1)$ and $\epsilon \sim \mathcal{N}(0, I)$.
- Generate $x_t = (1-t)\,x_0 + t\,x_1 + \sigma\sqrt{t(1-t)}\,\epsilon$.
- Compute the raw velocity target, $v_t = (x_1 - x_t)/(1 - t)$.
- Compute the normalization factor $\lambda_t(x_0, x_1)$.
- Compute $\tilde{v}_t = v_t / \lambda_t$ and $\tilde{v}_\theta = v_\theta(x_t, t) / \lambda_t$.
- Formulate the per-sample MSE loss, $\|\tilde{v}_\theta - \tilde{v}_t\|^2$.
- Update $\theta$ using the batch mean of the above loss (a minimal sketch of this loop follows below).
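The following PyTorch sketch mirrors the steps above in a single update; the model interface `velocity_model(x_t, t)`, batch shapes, and optimizer handling are illustrative assumptions, not details from the paper.

```python
import torch

def training_step(velocity_model, optimizer, x0, x1, sigma=1.0):
    """One variance-stabilized velocity-matching update (sketch).

    x0, x1 : endpoint pairs of shape (B, d); velocity_model maps (x_t, t) -> v_theta.
    """
    B, d = x0.shape
    t = torch.rand(B, 1, device=x0.device)                     # t ~ U(0, 1)
    eps = torch.randn_like(x0)                                  # eps ~ N(0, I)
    x_t = (1 - t) * x0 + t * x1 + sigma * torch.sqrt(t * (1 - t)) * eps
    v_target = (x1 - x_t) / (1 - t)                             # raw velocity target
    lam = torch.sqrt((x1 - x0).pow(2).sum(-1, keepdim=True) / d
                     + sigma**2 * t / (1 - t))                  # normalization factor
    v_pred = velocity_model(x_t, t)                             # unnormalized prediction
    loss = ((v_pred / lam - v_target / lam) ** 2).sum(-1).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```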
At inference, synthesis proceeds by Euler–Maruyama integration of the learned drift $v_\theta$, with a variance-corrected noise scale applied at each discretization step.
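As one concrete but assumed instantiation, the sampler below integrates the learned drift with Euler–Maruyama and scales the injected noise by the Brownian bridge's conditional transition variance, $\sigma^2\,\Delta t\,(1 - t - \Delta t)/(1 - t)$; the paper's exact variance correction may differ.

```python
import torch

@torch.no_grad()
def sample_bridge(velocity_model, x0, sigma=1.0, n_steps=50):
    """Euler-Maruyama integration of the learned bridge drift (sketch).

    The noise scale below uses the Brownian bridge's conditional transition
    variance, sigma^2 * dt * (1 - t - dt) / (1 - t); this is an assumed form
    of the variance correction, not necessarily the paper's exact expression.
    """
    x = x0.clone()
    dt = 1.0 / n_steps
    for k in range(n_steps):
        t = k * dt
        t_tensor = torch.full((x.shape[0], 1), t, device=x.device)
        drift = velocity_model(x, t_tensor)                     # learned v_theta(x_t, t)
        noise_std = sigma * ((dt * max(1.0 - t - dt, 0.0) / (1.0 - t)) ** 0.5)
        x = x + drift * dt + noise_std * torch.randn_like(x)
    return x
```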
6. Empirical Evaluation and Ablations
Empirical results demonstrate clear superiority over unnormalized alternatives:
- Metrics: The stabilized objective outperforms displacement and raw velocity objectives on SSIM, PSNR, NIQE, CLIP Score, and VBench for image editing and depth-to-video tasks (Table 7).
- Smoother convergence: Training curves exhibit markedly reduced loss variance and improved stability (Fig. 7a).
- Balanced loss contributions: The loss profile is flat across $t$ for the stabilized objective, preventing endpoint dominance (Fig. 2).
- Objective ablation: Stabilized velocity achieves the highest average image-edit score ($3.55$) and VBench score ($0.709$), compared with displacement ($3.50$, $0.695$) and raw velocity ($3.36$, $0.698$) objectives.
- Robustness to the global noise scale is observed when stabilization is combined with noise-scale tuning (Table 8) (Tan et al., 28 Nov 2025).
7. Comparison to Related Objectives
| Objective | Pathology | Loss dominance |
|---|---|---|
| Raw velocity | Exploding as $t \to 1$ | Late timesteps (near $t = 1$) |
| Displacement | Vanishing as $t \to 1$ | Early timesteps |
| Variance-stabilized velocity | None (by construction) | Uniform in $t$ |
| Denoising score matching | Vanishing/exploding at ends | Time-dependent, needs reweighting |
Variance stabilization directly addresses the pathologies of both displacement and raw velocity approaches. Score-matching objectives in diffusion (notably DSM) similarly suffer from time-dependent magnitude issues unless properly reweighted, which the stabilized velocity-matching loss resolves analytically (Tan et al., 28 Nov 2025).
8. Broader Context and Connections
Variance stabilization in velocity-matching addresses a subclass of variance pathologies that also arise in broader score-matching and minimum-velocity learning settings (e.g., DSM, CD-1, Wasserstein minimum velocity) (Wang et al., 2020). The analytic normalization parallels the control variate strategies used in Wasserstein minimum velocity estimation, underscoring the general importance of variance control for stable and scalable training in generative models. A plausible implication is that analogous normalization can benefit related objectives where target magnitudes are analytically tractable and amenable to similar stabilizing transformations.
The variance-stabilized velocity-matching objective is essential for scaling conditional generative models with trajectory-based training to billion-parameter regimes, serving as a principled correction to gradient imbalances endemic in prior, unnormalized velocity-matching and displacement-matching approaches (Tan et al., 28 Nov 2025).