Bidirectional Normalizing Flow (BiFlow)
- Bidirectional Normalizing Flow (BiFlow) is a generative modeling framework that decouples forward and reverse processes by learning an approximate inverse mapping.
- It leverages advanced transformer architectures and coupling blocks to achieve efficient, single-pass sampling and reduced computational complexity.
- BiFlow demonstrates state-of-the-art performance in large-scale image synthesis and semi-supervised anomaly detection with significant speedups over traditional methods.
A Bidirectional Normalizing Flow (BiFlow) is a generative modeling framework that extends classical normalizing flows by decoupling the forward and reverse processes. Unlike standard NFs, which require the reverse transformation to be the exact analytic inverse of the forward mapping, BiFlow learns an approximate reverse model, thereby enabling more flexible architecture designs and accelerating sampling. BiFlow has demonstrated state-of-the-art generation quality and efficiency on large-scale image synthesis tasks and has enabled new semi-supervised approaches to anomaly detection in network traffic (Lu et al., 11 Dec 2025, Dang et al., 13 Mar 2024).
1. Mathematical Foundations
BiFlow builds upon the theory of Normalizing Flows (NFs), which construct a bijection $f = f_K \circ \cdots \circ f_1$ via composition of simple invertible functions. The log-density under the model is evaluated through the change-of-variables formula:

$$\log p_X(x) = \log p_Z\!\left(f(x)\right) + \sum_{k=1}^{K} \log \left| \det \frac{\partial f_k}{\partial x_{k-1}} \right|,$$

where $x_0 = x$ and $x_k = f_k(x_{k-1})$.
BiFlow alters this paradigm by introducing an independently learned reverse mapping $g_\phi \approx f_\theta^{-1}$ that approximates the inverse but is not constrained to be perfectly invertible. The forward process $f_\theta: x \mapsto z$ and the learned reverse $g_\phi: z \mapsto \hat{x}$ enable maximum-likelihood training on the forward pass and supervised hidden-state alignment on the reverse pass, removing Jacobian constraints on $g_\phi$. This generalizes to domains beyond image synthesis, such as anomaly detection (Dang et al., 13 Mar 2024), where BiFlow constructs a bijection in latent space for normal traffic data,

$$z = f_\theta(h), \qquad z \sim \mathcal{N}(0, I),$$

and log-density computations follow standard NF formulations.
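As a concrete illustration, below is a minimal PyTorch sketch of one RealNVP-style affine coupling block and the change-of-variables log-density. Shapes and hyperparameters are placeholders, not the papers' exact configurations.

```python
# Minimal sketch: affine coupling + change-of-variables log-density.
import torch
import torch.nn as nn

class AffineCoupling(nn.Module):
    def __init__(self, dim: int, hidden: int = 64):
        super().__init__()
        self.half = dim // 2
        # A small MLP predicts per-dimension log-scale and shift from x1.
        self.net = nn.Sequential(
            nn.Linear(self.half, hidden), nn.ReLU(),
            nn.Linear(hidden, 2 * (dim - self.half)),
        )

    def forward(self, x):
        x1, x2 = x[:, :self.half], x[:, self.half:]
        log_s, t = self.net(x1).chunk(2, dim=-1)
        log_s = torch.tanh(log_s)           # bound scales for stability
        z2 = x2 * torch.exp(log_s) + t      # affine transform of x2
        # Triangular Jacobian: log|det J| is just the sum of log-scales.
        return torch.cat([x1, z2], dim=-1), log_s.sum(dim=-1)

def log_density(blocks, x):
    # log p_X(x) = log p_Z(f(x)) + sum_k log|det J_{f_k}|
    log_det = torch.zeros(x.shape[0])
    for block in blocks:                    # a full flow would also permute
        x, ld = block(x)                    # halves between blocks
        log_det = log_det + ld
    base = torch.distributions.Normal(0.0, 1.0)
    return base.log_prob(x).sum(dim=-1) + log_det

blocks = nn.ModuleList([AffineCoupling(dim=8) for _ in range(4)])
print(log_density(blocks, torch.randn(16, 8)).shape)  # torch.Size([16])
```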
2. Training Objectives
Forward (Data to Noise)
The forward NF $f_\theta$ is optimized by maximum likelihood over data samples $x \sim p_{\text{data}}$:

$$\mathcal{L}_{\text{fwd}} = -\,\mathbb{E}_{x \sim p_{\text{data}}}\!\left[\log p_X(x)\right].$$

In network anomaly detection scenarios, $f_\theta$ is trained only on normal latent representations (Dang et al., 13 Mar 2024) with affine-coupling blocks.
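Continuing the coupling sketch above, a minimal forward maximum-likelihood loop could look as follows; the synthetic data loader and optimizer settings are illustrative assumptions.

```python
# Hedged sketch of forward MLE training, reusing `blocks` and
# `log_density` from the coupling sketch above.
import torch

data_loader = [torch.randn(16, 8) for _ in range(100)]   # placeholder data
opt = torch.optim.Adam(blocks.parameters(), lr=1e-4)

for x in data_loader:                        # in the anomaly setting, these
    loss = -log_density(blocks, x).mean()    # are normal latent samples only
    opt.zero_grad()
    loss.backward()
    opt.step()
```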
Reverse (Noise to Data)
Upon freezing the forward model, BiFlow optimizes the reverse $g_\phi$ by aligning its internal hidden states $\hat{h}_k$ with the forward trajectories $h_k$:

$$\mathcal{L}_{\text{align}} = \sum_{k} d\!\left(\hat{h}_k,\, h_k\right),$$

where $d$ may combine MSE and perceptual distances (e.g., LPIPS-VGG, ConvNeXt-V2). The total reverse objective combines the per-block alignment terms with the perceptual losses (Lu et al., 11 Dec 2025).
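A hedged sketch of this alignment objective is given below, assuming the forward trajectory is exposed as a list of per-block states; the function names are illustrative.

```python
# Sketch: per-block hidden-state alignment between the frozen forward
# trajectory h_k and the reverse model's states \hat{h}_k.
import torch
import torch.nn.functional as F

def alignment_loss(forward_states, reverse_states):
    # Sum of per-block distances; a perceptual term (e.g. LPIPS) could
    # additionally be applied on decoded states.
    loss = torch.tensor(0.0)
    for h_k, h_hat_k in zip(forward_states, reverse_states):
        loss = loss + F.mse_loss(h_hat_k, h_k)
    return loss
```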
Adaptive Weighting and Norm Control
Adaptive-weighted MSE terms reweight per-block errors inversely with their magnitude, smoothing learning dynamics. Intermediate-state outputs in the forward flow are clipped to a fixed range, while reverse states are RMS-normalized prior to alignment (Lu et al., 11 Dec 2025).
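The sketch below illustrates plausible forms of these mechanisms; the clip bound and the exact weighting function are assumptions, since the source does not specify them.

```python
# Illustrative norm-control utilities (assumed forms, not the paper's
# exact choices).
import torch

def clip_state(h: torch.Tensor, bound: float = 5.0) -> torch.Tensor:
    # Clip forward-flow intermediate states to a fixed range to
    # prevent exploding norms.
    return h.clamp(-bound, bound)

def rms_normalize(h: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    # RMS-normalize reverse states before alignment so MSE scales stay
    # balanced across blocks.
    rms = h.pow(2).mean(dim=-1, keepdim=True).sqrt()
    return h / (rms + eps)

def adaptive_mse(pred: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
    # Downweight large per-sample errors so outliers do not dominate
    # the gradient (one plausible form of adaptive weighting).
    err = (pred - target).pow(2).mean(dim=-1)
    w = 1.0 / (err.detach() + 1.0)
    return (w * err).mean()
```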
3. Architecture Design
Forward Model
Image synthesis deployments use improved TARFlow (iTARFlow) variants—autoregressive flows built from Transformer blocks. Each block alternates self-attention directions to realize bidirectional context; the Jacobian remains tractable due to autoregressive masking (Lu et al., 11 Dec 2025). In anomaly detection, BiFlow employs stacks of RealNVP-style affine-coupling blocks with triangular Jacobian structures (Dang et al., 13 Mar 2024).
Reverse Model
The learned inverse in BiFlow is a feedforward Vision Transformer (ViT) in which each block applies non-causal multi-headed attention, RMSNorm, residual connections, and projection heads. The final block performs denoising for direct reconstruction, eliminating the score-based steps typical of autoregressive flows.
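A compact sketch of one such non-causal block follows (PyTorch >= 2.4 for `nn.RMSNorm`); the width, head count, and MLP ratio are placeholders.

```python
# Sketch: one reverse-model block with non-causal multi-head attention,
# RMSNorm, and residual connections.
import torch
import torch.nn as nn

class ReverseBlock(nn.Module):
    def __init__(self, dim: int = 256, heads: int = 8):
        super().__init__()
        self.norm1 = nn.RMSNorm(dim)
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm2 = nn.RMSNorm(dim)
        self.mlp = nn.Sequential(
            nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim)
        )

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        q = self.norm1(h)
        attn_out, _ = self.attn(q, q, q)    # no causal mask: fully parallel
        h = h + attn_out                    # residual connection
        return h + self.mlp(self.norm2(h))  # residual over the MLP
```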
Classifier-free guidance is embedded at training time via the CFG trick, enabling single-pass guided sampling (Lu et al., 11 Dec 2025).
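The standard CFG trick drops conditioning labels at random during training; the sketch below shows this, together with a guided combination computed at training time so that sampling can remain a single pass. The null-label convention, dropout rate, and guidance scale are assumptions, not the paper's exact recipe.

```python
# Sketch of the standard classifier-free-guidance (CFG) training trick.
import torch

NULL_CLASS = 1000  # hypothetical extra "unconditional" label index

def drop_labels(y: torch.Tensor, p_uncond: float = 0.1) -> torch.Tensor:
    # Randomly replace class labels with the null label so the model
    # learns both conditional and unconditional behavior.
    mask = torch.rand(y.shape, device=y.device) < p_uncond
    return torch.where(mask, torch.full_like(y, NULL_CLASS), y)

def guided_target(cond: torch.Tensor, uncond: torch.Tensor, w: float = 1.5):
    # Standard CFG blend; if the reverse model is trained to reproduce
    # this target directly, guided sampling needs only one forward pass.
    return uncond + w * (cond - uncond)
```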
The core algorithmic steps are summarized below:
| Procedure | Description | Key Steps |
|---|---|---|
| Training | Perturb $x$, compute forward states $h_k$, reverse states $\hat{h}_k$, align projections | Loss summed over all blocks $k$ |
| 1-NFE Sampling | Sample $z \sim \mathcal{N}(0, I)$, return $\hat{x} = g_\phi(z)$ | Single forward pass |
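The following sketch mirrors the table, assuming hypothetical `trajectory` accessors that return each model's per-block states and reusing `alignment_loss` from the earlier sketch.

```python
# Sketch of the two procedures: alignment training and 1-NFE sampling.
import torch

def train_step(forward_flow, reverse_model, x, opt):
    with torch.no_grad():                    # forward model is frozen
        h = forward_flow.trajectory(x)       # states h_1, ..., h_K
    h_hat = reverse_model.trajectory(h[-1])  # reverse states from z
    loss = alignment_loss(h, h_hat)
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss

@torch.no_grad()
def sample_1nfe(reverse_model, decoder, n: int, dim: int):
    z = torch.randn(n, dim)        # z ~ N(0, I)
    x_hat = reverse_model(z)       # single parallel transformer pass
    return decoder(x_hat)          # decode to image space
```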
4. Sampling Complexity and Efficiency
Classical TARFlow sampling involves sequential autoregressive steps and supplemental score-based denoising, incurring heavy computational demand. BiFlow achieves sampling in a single non-causal, parallel transformer pass (1-NFE). Empirical benchmarks show significant efficiency improvements:
- BiFlow-B/2 samples and decodes in $0.29$ ms on 8×TPU-v4
- iTARFlow-B/2: $65$ ms (a roughly $224\times$ speedup for BiFlow)
- Larger configurations reach even greater acceleration over previous NF architectures on both TPU and CPU (Lu et al., 11 Dec 2025).
In anomaly detection, BiFlow's inference cost totals $3.91$M parameters and $0.02$ GFLOPs, outperforming comparable flows and GAN-based approaches in model size and computational cost (Dang et al., 13 Mar 2024).
5. Empirical Performance and Applications
Image Synthesis
Key metrics on ImageNet:
- BiFlow-B/2 (learned inverse): FID = $2.39$, IS = $303.0$
- iTARFlow-B/2 (exact inverse): FID = $6.83$, IS = $226.2$
- With ConvNeXt-V2 perceptual loss and CFG, FID = $2.46$ (Lu et al., 11 Dec 2025)
BiFlow sets a new state of the art in NF-based synthesis and compares favorably with single-evaluation (1-NFE) diffusion/flow-matching models at substantially lower compute.
| Model | Params (M) | FID | IS |
|---|---|---|---|
| BiFlow-B/2 (1-NFE) | 133 | 2.39 | 303.0 |
| STARFlow-XL/1 | 1400 | 2.40 | – |
| MeanFlow-XL/2 | 676 | 3.43 | – |
Anomaly Detection
BiFlow forms a core module in a three-stage semi-supervised anomaly traffic detection pipeline:
- A GAN-style autoencoder is trained on normal samples.
- BiFlow normalizes the latent representations to a standard Gaussian base distribution via an 8-block coupling network.
- Perturbations in the normalized space yield pseudo anomalies, used to train a classifier achieving AUROC up to $0.8658$ on VPN/non-VPN detection (Dang et al., 13 Mar 2024), as sketched below.
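A hedged sketch of the third stage follows; the perturbation scale and the feature/label construction are illustrative assumptions, not the paper's recipe.

```python
# Sketch: perturbing normalized latents to create pseudo anomalies.
import torch

def make_pseudo_anomalies(z_normal: torch.Tensor, sigma: float = 1.0):
    # Normal traffic maps to N(0, I) under BiFlow; displacing samples
    # away from the base distribution yields pseudo-anomalous latents.
    return z_normal + sigma * torch.randn_like(z_normal)

z_normal = torch.randn(256, 32)              # normalized normal latents
z_anom = make_pseudo_anomalies(z_normal)
features = torch.cat([z_normal, z_anom])     # classifier training inputs
labels = torch.cat([torch.zeros(256), torch.ones(256)])  # 0 = normal
```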
6. Theoretical Insights and Stability Mechanisms
BiFlow's hidden-alignment strategy supervises the reverse transformer using all intermediate forward states, allowing for flexible representation at each block and eliminating repeated projections into data space. This has empirically reduced reconstruction losses and improved fidelity compared to naive or hidden-distillation strategies.
Stability is reinforced by norm control—clipping forward model outputs and RMS normalization—preventing exploding norms and balancing MSE scales. Adaptive-weighted losses mitigate gradient instabilities from large errors (Lu et al., 11 Dec 2025). Integrated perceptual losses (LPIPS-VGG, ConvNeXt-V2) serve as regularizers by ensuring generated samples remain on realistic data manifolds.
7. Significance and Extension
BiFlow redefines the normalizing flow paradigm by removing the requirement of analytic invertibility, substituting a learned transformer-based reverse mapping. This innovation enables dramatic improvements in sampling speed, architectural flexibility, and generation quality, facilitating broader adoption in both generative modeling and discriminative semi-supervised anomaly detection. The decoupling of the forward and reverse processes opens future work on more expressive, computationally efficient flows and diverse applications across domains (Lu et al., 11 Dec 2025, Dang et al., 13 Mar 2024).