
FORT: Forward-Only Regression Training of Normalizing Flows (2506.01158v1)

Published 1 Jun 2025 in cs.LG, cs.AI, and stat.ML

Abstract: Simulation-free training frameworks have been at the forefront of the generative modelling revolution in continuous spaces, leading to neural dynamical systems that encompass modern large-scale diffusion and flow matching models. Despite the scalability of training, the generation of high-quality samples and their corresponding likelihood under the model requires expensive numerical simulation -- inhibiting adoption in numerous scientific applications such as equilibrium sampling of molecular systems. In this paper, we revisit classical normalizing flows as one-step generative models with exact likelihoods and propose a novel, scalable training objective that does not require computing the expensive change of variable formula used in conventional maximum likelihood training. We propose Forward-Only Regression Training (FORT), a simple $\ell_2$-regression objective that maps prior samples under our flow to specifically chosen targets. We demonstrate that FORT supports a wide class of targets, such as optimal transport targets and targets from pre-trained continuous-time normalizing flows (CNF). We further demonstrate that by using CNF targets, our one-step flows allow for larger-scale training that exceeds the performance and stability of maximum likelihood training, while unlocking a broader class of architectures that were previously challenging to train. Empirically, we elucidate that our trained flows can perform equilibrium conformation sampling in Cartesian coordinates of alanine dipeptide, alanine tripeptide, and alanine tetrapeptide.

Summary

  • The paper introduces FORT, a novel framework using an $\ell_2$-regression objective for training normalizing flows, eliminating the expensive inverse Jacobian computations required by traditional maximum likelihood estimation.
  • FORT trains flows by regressing to target maps, such as Optimal Transport or Reflow from Continuous Normalizing Flows, which enables more stable and efficient forward-only optimization.
  • Experiments on molecular systems demonstrate that FORT-trained models achieve improved metrics and higher sample quality while retaining exact likelihoods, making classical normalizing flows more viable for scientific applications.

Forward-Only Regression Training of Normalizing Flows

The paper presents a novel approach called Forward-Only Regression Training (FORT) for training normalizing flows, a framework that enhances the scalability and performance of generative models while enabling exact likelihood computation. The work revisits classical normalizing flows, proposing an $\ell_2$-regression objective that eliminates the need to compute the expensive change of variables formula typically required in maximum likelihood estimation (MLE). This approach offers not only theoretical insights but also practical improvements in the generative modeling landscape, particularly in scientific applications such as molecular equilibrium sampling.

Background and Motivation

Generative models have garnered significant attention due to their ability to simulate complex distributions across various domains. In scientific applications, precise likelihood computation and efficient sample generation are paramount, especially in domains requiring high-fidelity samples such as molecular biology. Classical normalizing flows are notable for their invertibility and exact likelihoods, but they struggle to scale because maximum likelihood training requires computing the Jacobian determinant of the inverse map at every step. The impetus behind FORT is to surmount these computational challenges with a scalable regression-based training paradigm.
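For context, the cost being avoided is the standard change-of-variables identity used in MLE training of a flow $f_\theta$ that maps prior samples to data; the notation below follows common convention rather than the paper's exact statement:

```latex
% Log-likelihood of a data point x under a normalizing flow f_theta
% with prior density p_0, via the change of variables formula.
% The log-determinant term is what makes MLE training expensive.
\log p_\theta(x) = \log p_0\!\left(f_\theta^{-1}(x)\right)
  + \log \left| \det \frac{\partial f_\theta^{-1}(x)}{\partial x} \right|
```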

Forward-Only Regression Training (FORT)

FORT diverges from MLE by assuming access to an invertible map $f^\star$, which supplies known sample-correspondence pairs $(x_0, x_1)$ with $x_1 = f^\star(x_0)$. This lets training leverage sample pairs directly, reducing optimization to a forward mapping problem without simultaneously estimating inverse Jacobian determinants.

The FORT objective function is articulated as:

$$\mathcal{L}(\theta) = \mathbb{E}_{x_0, x_1} \left[ \| f_\theta(x_0) - x_1 \|^2 \right] + \lambda_r,$$

where $\lambda_r$ is a regularization term that mitigates potential numerical instabilities. The model thus learns via forward-only passes, optimizing the regression onto the selected map $f^\star$, a strategy found to be significantly more stable and efficient than MLE training.
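A minimal PyTorch-style sketch of one such training step, assuming precomputed pairs `(x0, x1)` from a chosen target map and an invertible-flow module `flow`; all names here are illustrative and the regularizer is a stand-in, since the paper's $\lambda_r$ term is not specified in this summary:

```python
import torch

def fort_step(flow, optimizer, x0, x1, reg_weight=0.0):
    """One FORT update: l2-regress flow(x0) onto the target x1.

    flow       -- invertible network mapping prior samples to data space
    x0, x1     -- paired batches with x1 = f_star(x0) under the target map
    reg_weight -- weight for a stabilizing regularizer (hypothetical stand-in)
    """
    optimizer.zero_grad()
    pred = flow(x0)                               # forward pass only
    loss = ((pred - x1) ** 2).sum(dim=-1).mean()  # l2 regression objective
    if reg_weight > 0:
        # Simple weight penalty as an illustrative proxy for lambda_r.
        loss = loss + reg_weight * sum(p.pow(2).sum() for p in flow.parameters())
    loss.backward()
    optimizer.step()
    return loss.item()
```

Note that no inverse pass or log-determinant evaluation appears anywhere in the step; this is the source of the method's stability and speed advantages over MLE.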

Instantiation and Application

To instantiate FORT, two principal target classes are proposed:

  1. Optimal Transport (OT) Targets: Uses pre-computed OT maps as the invertible target functions, so training itself incurs no additional overhead, though computing the OT coupling can become expensive as the number of samples grows.
  2. Reflow Targets: Leverages a large pretrained Continuous Normalizing Flow (CNF) to create target pairs by integrating its vector field from prior samples; the resulting ODE solution map serves as the invertible target for training the one-step flow (a sketch of constructing such targets follows this list).
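A hedged sketch of how reflow pairs might be generated, assuming a pretrained vector field `v(t, x)` and the `torchdiffeq` ODE solver; the function and parameter names are illustrative, not the paper's code:

```python
import torch
from torchdiffeq import odeint  # assumes the torchdiffeq package is installed

@torch.no_grad()
def make_reflow_pairs(vector_field, n, dim, device="cpu"):
    """Build (x0, x1) training pairs for FORT from a pretrained CNF.

    vector_field -- callable v(t, x) giving dx/dt, e.g. a pretrained
                    flow-matching model (illustrative interface)
    Returns prior samples x0 and their images x1 under the CNF's
    time-1 solution map, which plays the role of f_star.
    """
    x0 = torch.randn(n, dim, device=device)      # samples from the prior
    t = torch.tensor([0.0, 1.0], device=device)  # integrate t from 0 to 1
    traj = odeint(vector_field, x0, t, rtol=1e-5, atol=1e-5)
    x1 = traj[-1]                                # ODE endpoint = f_star(x0)
    return x0, x1
```

The expensive CNF integration happens once, offline, when building the dataset of pairs; the one-step flow is then trained purely by regression.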

Experimental Validation

Substantial experiments were conducted on molecular systems including alanine dipeptide, tripeptide, and tetrapeptide. Compared to MLE training, FORT significantly improved metrics such as Wasserstein distances on energy and dihedral-angle distributions. Normalizing flow architectures trained with FORT matched or outperformed their MLE-trained counterparts in effective sample size and sample quality, without the mode-coverage failures typical of MLE-trained models.

The paper's results indicate that FORT-trained models can not only generate high-fidelity samples but also retain an exact and computationally feasible way to evaluate the likelihoods of these samples. Practically, such advancements fortify classical normalizing flows as viable contenders in applications necessitating exact likelihoods, such as equilibrium sampling.

Theoretical and Practical Implications

FORT advances the theoretical framework of regression-based training for generative models, offering a fresh perspective and a robust alternative to conventional MLE training. Practically, the implications span improved training stability, scalability to larger datasets, and applicability in scientific computations that demand both accuracy and efficiency.

Future Directions

Looking forward, FORT may inspire hybrid training frameworks that combine forward-only regression with classical learning approaches. Further research into automating the selection of target mappings, or extending OT methods to broader settings, could add robustness and flexibility. Additionally, architectural refinements that exploit FORT could reduce its computational footprint and generalize the approach to other domains within AI and machine learning.