- The paper demonstrates that using a time-dependent weighting function in the loss is mathematically equivalent to reweighting the time distribution under mild regularity conditions.
- It extends the Generator Matching framework to incorporate Bregman divergence-based losses that depend on both state and time, ensuring consistency with Markov process dynamics.
- The findings justify empirical practices in generative modeling, paving the way for advanced applications in flow matching, diffusion, edit flows, and jump models.
Time-Dependent Loss Reweighting: Theoretical Foundations for Flow Matching and Diffusion Models
Overview
This work rigorously formalizes the theoretical underpinnings of time-dependent loss reweighting in the training of generative models based on flow matching and diffusion frameworks. It demonstrates that loss functions in generator matching can depend explicitly on both time and state, and that weighting the loss by an arbitrary time-dependent factor is theoretically justified as long as technical regularity conditions are satisfied. The characterizations apply to a broad class of flow, diffusion, and edit-flow models over continuous, manifold, and discrete spaces.
Generator Matching Losses with Time and State Dependency
The paper extends the Generator Matching (GM) framework to encompass linear parameterizations and Bregman divergence-based losses that explicitly depend on both the current state $X_t$ and time $t$. This generalization captures the practical reality that modelers often adjust time-dependent scaling in loss functions empirically to stabilize or accelerate training, and it shows that such modifications are theoretically innocuous as long as the weighting remains positive almost everywhere with respect to Lebesgue measure.
Given a conditional Markov process parameterized by a latent variable $z$ and an infinitesimal generator $\mathcal{L}_t^{z}$, the GM objective can be written as:

$$\mathcal{L}_{\mathrm{cgm}}(\theta) = \mathbb{E}_{t \sim \mathcal{D},\, Z,\, X_t \sim p_{t \mid Z}}\!\left[ w(t)\, D_{t, X_t}\!\big(F_t^{Z}(X_t),\, F_t^{\theta}(X_t)\big) \right]$$

where $w(t)$ is a nonnegative weighting function, $D_{t, X_t}$ is a Bregman divergence that may vary with $t$ and $X_t$, $F_t^{Z}$ and $F_t^{\theta}$ are the linear parameterizations of the conditional and model generators, and $\mathcal{D}$ is a time distribution dominating Lebesgue measure.
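To make the objective concrete, below is a minimal PyTorch sketch of one Monte Carlo estimate of this loss, specialized to flow matching: squared-error Bregman divergence, the standard affine conditional path, and the latent $z$ taken to be the endpoint pair. The network `velocity_net(x, t)` and the helper name `weighted_cgm_loss` are hypothetical stand-ins, not the paper's code.

```python
import torch

def weighted_cgm_loss(velocity_net, x0, x1, weight_fn):
    """One Monte Carlo estimate of the weighted conditional GM loss, specialized
    to flow matching: squared-error Bregman divergence, affine path, z = (x0, x1)."""
    batch = x0.shape[0]
    t = torch.rand(batch, device=x0.device)        # t ~ D = Uniform[0, 1]
    t_ = t.view(-1, *([1] * (x0.dim() - 1)))       # reshape time for broadcasting
    xt = (1.0 - t_) * x0 + t_ * x1                 # conditional path X_t | z
    target = x1 - x0                               # conditional velocity F_t^Z(X_t)
    pred = velocity_net(xt, t)                     # model parameterization F_t^theta(X_t)
    per_sample = ((pred - target) ** 2).flatten(1).sum(dim=1)
    return (weight_fn(t) * per_sample).mean()      # w(t) * D_{t, X_t}(F^Z, F^theta)
```

Per the theorem, any `weight_fn` that is positive almost everywhere on $[0, 1]$ leaves the set of minimizers unchanged.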
Importantly, the regularity conditions ensure:
- The time- and state-varying parameterizations and divergences are well-defined.
- The expectation over the weighted time distribution preserves the Markov process dynamics.
- Both marginal and conditional generator matching can be addressed using this general formulation.
Theoretical Justification for Loss Reweighting
A central result is that time-dependent reweighting, realized by multiplying the loss by $w(t)$, is mathematically equivalent to taking expectations under an alternative time distribution. The paper proves that for any absolutely continuous time distribution $\mathcal{D}$ and any nonnegative weighting function $w(t)$ that is positive almost everywhere, the reweighted loss

$$\mathcal{L}(\theta) = \mathbb{E}_{t \sim \mathcal{D},\, X_t}\!\left[ w(t)\, D_{t, X_t}(\cdot) \right]$$

is equivalent (up to a positive scaling constant) to optimizing the same generator under the reweighted time distribution.
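Writing $f(t)$ for the inner expectation over $X_t$ of the Bregman term, the equivalence is a change of measure; a one-line derivation, assuming the normalizer $Z_w$ is finite:

$$\mathbb{E}_{t \sim \mathcal{D}}\!\left[ w(t)\, f(t) \right] = \int w(t)\, f(t)\, \mathrm{d}\mathcal{D}(t) = Z_w \int f(t)\, \mathrm{d}\mathcal{D}_w(t) = Z_w\, \mathbb{E}_{t \sim \mathcal{D}_w}\!\left[ f(t) \right]$$

where $\mathrm{d}\mathcal{D}_w = w\, \mathrm{d}\mathcal{D} / Z_w$ and $Z_w = \int w(s)\, \mathrm{d}\mathcal{D}(s)$. Since $Z_w$ is a positive constant independent of $\theta$, the weighted objective and the objective under $\mathcal{D}_w$ share the same minimizers.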
This result places on rigorous footing the longstanding empirical practice of discarding or adjusting prescribed time-dependent scaling, a technique used widely in diffusion and flow matching works (e.g., Denoising Diffusion Probabilistic Models (Ho et al., 2020), Score-Based Generative Modeling (Song et al., 2020), and the Flow Matching Guide (Lipman et al., 2024)).
If the GM loss achieves zero, the model-parameterized generator solves the Kolmogorov Forward Equation for $p_t$. As a consequence, the Markov process defined by the trained generator yields samples from the target data distribution at $t = 1$, regardless of the time-dependent scaling used during training.
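For reference (notation assumed here, not quoted from the paper), the Kolmogorov Forward Equation says that the marginals $p_t$ evolve under the adjoint $\mathcal{L}_t^{*}$ of the marginal generator:

$$\partial_t\, p_t = \mathcal{L}_t^{*}\, p_t$$

so a zero-loss generator reproduces exactly the marginal evolution that transports the source distribution at $t = 0$ to the data distribution at $t = 1$.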
Extension to Edit Flows, Endpoint Prediction, and Jump Models
The theoretical framework is extended to cover additional variants:
- Edit Flows: Analogous results apply to models parameterized by edit operations rather than a continuous drift, showing that time-dependent reweighting carries over to marginal generators by the same argument.
- $X_1$-Prediction Schemes: Flow matching objectives that predict endpoints (often preferred for practical or computational reasons) are also covered by the reweighting theory: both affine endpoint prediction and trajectory-based losses with time-dependent scaling are theoretically valid, since endpoint prediction is itself a time-reweighted velocity loss (see the sketch after this list).
- Discrete and Jump Models: For models with atomic jumps and time-dependent hazards, the conditional generator matching loss is shown to be theoretically justified without explicit inclusion of hazard terms in the loss.
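The $X_1$-prediction claim can be checked numerically for the affine path $X_t = (1-t)X_0 + tX_1$: predicting the endpoint is exactly velocity matching with the weight $w(t) = (1-t)^{-2}$. A self-contained sketch, with random tensors standing in for a network's endpoint prediction:

```python
import torch

torch.manual_seed(0)
x0, x1 = torch.randn(4, 8), torch.randn(4, 8)   # noise and data endpoints
t = torch.rand(4, 1) * 0.9                      # stay away from the t = 1 singularity
xt = (1 - t) * x0 + t * x1                      # affine conditional path
x1_hat = x1 + 0.1 * torch.randn_like(x1)        # stand-in for an endpoint prediction

u_target = x1 - x0                              # conditional velocity target
u_implied = (x1_hat - xt) / (1 - t)             # velocity implied by the endpoint prediction

velocity_loss = ((u_implied - u_target) ** 2).sum(dim=1)
endpoint_loss = ((x1_hat - x1) ** 2).sum(dim=1)
# Endpoint prediction = velocity matching reweighted by w(t) = (1 - t)^{-2}:
assert torch.allclose(velocity_loss, endpoint_loss / (1 - t.squeeze(1)) ** 2, rtol=1e-4)
```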
Advanced Treatment of Bregman Divergences
The paper rigorously generalizes Bregman divergence theory to allow losses whose associated convex functions are differentiable only on the relative interior of the domain (as in binary cross entropy or Poisson losses). This allows the use of more expressive loss functions in discrete and manifold spaces, covering a broad range of practical modeling choices.
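As an illustration of the kind of loss this covers, binary cross entropy is the Bregman divergence generated by the negative binary entropy, whose gradient (the logit) diverges at 0 and 1, so differentiability holds only on the relative interior $(0, 1)$. A small sketch:

```python
import math

def phi(p):
    """Negative binary entropy: convex on [0, 1], differentiable only on (0, 1)."""
    return p * math.log(p) + (1 - p) * math.log(1 - p)

def bregman_phi(p, q):
    """Bregman divergence generated by phi; phi'(q) = logit(q) exists only on (0, 1)."""
    grad_q = math.log(q / (1 - q))
    return phi(p) - phi(q) - grad_q * (p - q)

def bce(p, q):
    """Binary cross entropy of prediction q against target probability p."""
    return -(p * math.log(q) + (1 - p) * math.log(1 - q))

p, q = 0.3, 0.7
# The divergence equals BCE shifted by the constant phi(p), which does not
# depend on the prediction q, so both losses induce identical gradients in q.
assert abs(bregman_phi(p, q) - (bce(p, q) + phi(p))) < 1e-12
```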
Practical and Theoretical Implications
Practical Ramifications
- The theoretical findings justify a wide range of empirical practices in efficient generative modeling, such as non-uniform time sampling (see the sketch after this list), singularity avoidance via endpoint loss scaling (e.g., smoothing near $t = 1$ in flow matching), and flexible loss function design.
- The results guarantee that reweighting does not alter the learned mapping in the infinite-data, infinite-compute limit, provided the regularity conditions are maintained.
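As a quick sanity check of the sampling-side reading (the integrand below is an arbitrary stand-in for the inner expected loss at time $t$), weighting by $w(t)$ under uniform time sampling matches unweighted sampling from the tilted time density; a minimal Monte Carlo sketch:

```python
import torch

torch.manual_seed(0)
n = 1_000_000
f = lambda t: torch.sin(t) ** 2   # stand-in for the expected loss at time t
w = lambda t: 2.0 * t             # weight with Z_w = 1 under Uniform[0, 1]

u = torch.rand(n)
weighted = (w(u) * f(u)).mean()               # E_{t ~ Uniform}[w(t) f(t)]
tilted = f(torch.sqrt(torch.rand(n))).mean()  # t ~ density 2t via inverse CDF
print(weighted.item(), tilted.item())         # agree up to Monte Carlo error
```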
Theoretical Consequences
- The equivalence demonstrated between various loss reweighting schemes and expectations under different time distributions provides unifying mathematical clarity for future extensions of flow and diffusion models.
- The explicit accommodation of time and state dependency in both linear parameterizations and Bregman divergences enables generalizations to more complex stochastic processes, such as those on manifolds or with non-Euclidean structure.
Future Directions
- Further exploration could target high-dimensional or singular state spaces, such as submanifold diffusion or event-driven graphical models (Lu et al., 2023).
- Adaptive or learned time-weighting distributions may become a focus for balancing sample efficiency and training convergence.
- Extensions toward reinforcement learning or energy-based modeling frameworks could leverage the generalized loss reweighting principle for broader classes of stochastic processes.
Conclusion
This paper establishes robust and comprehensive theoretical justification for time-dependent loss reweighting in flow matching and diffusion-based generative models, subsuming a wide variety of practical modeling schemes within the generator matching framework. By showing that explicit time and state dependence in both linear parameterization and Bregman divergence-based losses maintains validity under weak regularity conditions, the work unifies diverse approaches in the area and enables principled further development of generative models over arbitrary state spaces.
Reference: "Time dependent loss reweighting for flow matching and diffusion models is theoretically justified" (arXiv:2511.16599)