Forward-Only Regression Training (FORT)

Updated 30 June 2025
  • Forward-Only Regression Training (FORT) is a family of algorithms that trains regression models using only forward passes and localized regression objectives.
  • It reduces computational complexity by replacing backpropagation with a direct ℓ2-regression objective, making high-dimensional model training more scalable.
  • The method enhances training stability and supports one-step sample generation with exact likelihoods, crucial for advanced scientific and generative modeling applications.

Forward-Only Regression Training (FORT) encompasses a family of algorithms and methodologies that enable the efficient training of regression models by exclusively using forward passes, omitting the need for backward passes, and often leveraging layerwise or local objectives. These procedures have become central to scalable, robust, and interpretable regression and generative modeling, particularly in scenarios where backpropagation is infeasible, costly, or biologically implausible.

1. Theoretical and Algorithmic Principles

FORT substitutes traditional global or backward-driven optimization with regression-based or local approaches that only require information from the current forward computation. In the context of classical normalizing flows, FORT operates by minimizing a direct $\ell_2$-regression objective between the output of the model, applied to a prior sample $x_0$, and a specifically chosen target $x_1$, formally:

$$\mathcal{L}(\theta) = \mathbb{E}_{x_0, x_1}\left[\, \|f_\theta(x_0) - x_1\|^2 \,\right] + \lambda_r \mathcal{R},$$

where $f_\theta$ is the invertible neural network and $\mathcal{R}$ is a regularization term (typically involving the log-determinant to ensure invertibility).
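This objective yields a simple training loop. Below is a minimal PyTorch-style sketch, assuming a hypothetical invertible model whose forward pass returns both $f_\theta(x_0)$ and the log-determinant of its Jacobian; the regularizer shown is illustrative, and the exact choice of $\mathcal{R}$ may differ:

```python
import torch

def fort_loss(model, x0, x1, lambda_r: float = 0.01):
    """FORT objective: a forward-only l2 regression plus regularization.

    Assumes a hypothetical interface model(x0) -> (y, log_det), where
    y = f_theta(x0) and log_det = log |det J_{f_theta}(x0)|.
    """
    y, log_det = model(x0)
    regression = (y - x1).pow(2).sum(dim=-1).mean()  # E[ ||f_theta(x0) - x1||^2 ]
    # Illustrative regularizer R: discourage extreme log-determinants so the
    # learned map stays well-conditioned (an assumption, not the paper's exact R).
    reg = log_det.pow(2).mean()
    return regression + lambda_r * reg
```

The loss is minimized with any standard stochastic optimizer; crucially, no inverse of $f_\theta$ and no density evaluation appear anywhere in the objective.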

Unlike maximum likelihood estimation (MLE), which relies on the change-of-variables formula and its log-determinant Jacobian (a computation that is expensive and numerically delicate in expressive or high-dimensional architectures), FORT decouples the learning dynamics from these simulation and inversion costs. The model is trained to match the input-output target coupling but does not require explicit density evaluation or backward-in-time computation during learning.
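For contrast, the MLE objective that FORT sidesteps evaluates the model density through the change-of-variables formula, whose log-determinant Jacobian term is the bottleneck noted above:

$$\log p_\theta(x_1) = \log p_0\!\left(f_\theta^{-1}(x_1)\right) + \log\left|\det J_{f_\theta^{-1}}(x_1)\right|$$

Maximizing this requires inverting $f_\theta$ and evaluating a Jacobian determinant at every training step, which is exactly the cost the regression objective above removes.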

2. Target Selection Strategies

The practical effectiveness and generality of FORT are strongly dependent on the choice of training targets, $x_1$, for each latent input $x_0$:

  • Optimal Transport (OT) Targets: The optimal transport map $T^*$ provides a pairing between prior samples and data samples by minimizing a global cost. These maps are invertible and can be computed offline for moderate dataset sizes, but their computation scales cubically with the number of samples (see the sketch after this list).
  • Pre-trained Continuous-Time Normalizing Flow (CNF) Targets: Using pre-trained flow models, the solution paths generated by integrating neural ODE flows serve as target couplings. This approach, referred to as reflow, allows massive sample generation and is especially practical for large-scale or high-dimensional problems.
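As a concrete illustration of the OT option, the sketch below computes a discrete coupling with a Hungarian assignment solver under squared Euclidean cost; this is one plausible instantiation of the offline pairing step, not necessarily the solver used in the original work:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def ot_targets(x0: np.ndarray, x1: np.ndarray) -> np.ndarray:
    """Pair each prior sample x0[i] with a data sample from x1 via discrete OT.

    Solves the assignment problem under squared Euclidean cost; the Hungarian
    solver runs in O(n^3) time, matching the cubic scaling noted above.
    """
    cost = ((x0[:, None, :] - x1[None, :, :]) ** 2).sum(axis=-1)  # (n, n) costs
    row_idx, col_idx = linear_sum_assignment(cost)
    targets = np.empty_like(x1)
    targets[row_idx] = x1[col_idx]  # targets[i] is the OT target for x0[i]
    return targets
```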

The targets must define a pushforward (or coupling) between the model’s prior and the data distribution, but need not provide explicit density values. This flexibility extends FORT beyond standard MLE and makes it compatible with architectures otherwise ill-suited to standard flow training.

3. Methodological and Practical Advantages

Compared to MLE-based normalizing flow training, FORT offers several distinct advantages:

  • Forward-Only Computation: No need to compute determinants or inverses, or to perform backward passes; only forward propagation is required throughout training.
  • Scalability: The computational and memory demands are dramatically reduced, enabling training of highly expressive invertible architectures (including those with residual connections, neural spline flows, or transformer-based flows) that may be intractable under MLE due to unstable or slow Jacobian computations.
  • Training Stability and Flexibility: By decoupling the training objective from density estimation, FORT is less susceptible to the mode collapse and stagnation often seen in MLE-based flows on challenging, multimodal data.
  • Retention of Exact Likelihoods: At inference, a FORT-trained flow computes exact likelihoods via the change-of-variables formula, unlike shortcut-based or consistency models, keeping the generative model's statistical interpretability and utility intact (see the sketch after this list).
  • Broader Applicability: Since FORT does not require backward or global objective information, training is possible in settings where gradients are ill-defined or difficult to compute, as in some scientific and molecular applications.
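A minimal sketch of the inference-time likelihood computation referenced above, assuming the trained flow exposes a hypothetical inverse pass returning both the latent and the log-determinant of the inverse Jacobian:

```python
import torch

def exact_log_likelihood(model, prior, x):
    """Exact density of x under a trained flow x = f_theta(z).

    Assumes a hypothetical interface model.inverse(x) -> (z, log_det_inv),
    with log_det_inv = log |det J_{f_theta^{-1}}(x)|; then, by the
    change-of-variables formula:
        log p(x) = log p_prior(z) + log_det_inv
    """
    z, log_det_inv = model.inverse(x)
    return prior.log_prob(z) + log_det_inv
```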

4. Empirical Demonstrations

FORT has been empirically validated on tasks central to scientific computing, particularly equilibrium conformational sampling in molecular systems (such as alanine dipeptide, tripeptide, and tetrapeptide):

  • On these challenging benchmarks, FORT-trained normalizing flows consistently outperform MLE-trained counterparts in Wasserstein distance and mode coverage. For instance, on alanine dipeptide, a neural spline flow (NSF) trained with FORT on 10.4 million reflow targets achieves a Wasserstein-1 energy distance of 0.519, whereas the MLE baseline reaches 13.80.
  • FORT is able to "rescue" architectures—such as RealNVP or invertible transformers—that completely fail under MLE, demonstrating robust mode coverage and faithful reproduction of physical distributions.
  • Stability under training is also improved. FORT allows for consistent scaling to larger molecules and higher sample counts, while MLE training often becomes numerically unstable or exhibits catastrophic mode loss.
  • FORT models support one-step sample generation with exact likelihoods (illustrated below), critical for high-throughput applications in computational physics and biology where simulation-based or iterative sampling is prohibitively expensive.
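Under the same hypothetical interface as in the earlier sketches, one-step generation amounts to a single forward pass, with exact sample likelihoods available as a by-product:

```python
import torch

def generate(model, prior, n: int = 4096):
    """One-step sampling from a FORT-trained flow: no ODE integration.

    For x = f_theta(z), the change-of-variables formula gives
    log p(x) = log p_prior(z) - log |det J_{f_theta}(z)|.
    """
    z = prior.sample((n,))
    x, log_det = model(z)  # single forward pass through the flow
    return x, prior.log_prob(z) - log_det
```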

5. Impact on Generative Modeling and Scientific Applications

The FORT training regime fundamentally expands the practical deployment of normalizing flows and invertible generative modeling:

  • Efficient One-Step Sampling: Enables deployment in domains where rapid and exact sampling is mandatory (e.g., physics-based molecular sampling, targeted free energy calculations).
  • Architectural Innovation: Removes technical barriers associated with MLE (e.g., Jacobian bottlenecks), thus facilitating experimentation with highly expressive, modern invertible architectures.
  • Practical Workflows in Science: Supports new workflows such as simulation-free training and conditional density estimation without resorting to simulation pipelines or hand-tuned surrogates.
  • Generalizability: While validated in molecular equilibrium sampling, FORT is not limited to a specific data type or architecture and is applicable to any domain with a source-target sample pairing.

6. Comparison with Other Training Regimes

A categorical comparison situates FORT among leading generative learning paradigms:

| Property | FORT | MLE for NFs | CNFs/Flow-Matching | Shortcut/Few-Step |
|---|---|---|---|---|
| One-step generation | Yes | Yes | No | Yes |
| Exact likelihoods | Yes | Yes | No/Approximate | No |
| Training scalability | High | Limited | High (but slow inference) | High |
| Training stability | High | Low | High | High |
| Applicable architectures | Broad | Limited | Broad | Broad |

A plausible implication is that FORT’s design resolves several of the classic limitations of normalizing flows—stability, scalability, and expressivity—positioning it as a method of choice for modern invertible generative modeling where paired data or privileged access to matchings is available.

7. Future Directions

Current findings suggest several avenues for further development:

  • Algorithmic advancements for automated or adaptive target selection (potentially via learned or self-supervised couplings).
  • Extensions to conditional and structured generative tasks, leveraging FORT’s flexibility.
  • Further applications in other scientific or engineering domains, such as materials design, physics simulation, or density estimation in controlled settings.
  • Coupling FORT with other forward-only or biologically plausible architectures to unify invertible modeling and robust learning.

The FORT methodology, as established for normalizing flows, exemplifies a shift in regression training: from simulation- and feedback-intensive objectives to highly efficient, stable, and expressive frameworks grounded in forward-only computation and regression-based learning.