Bidirectional Normalizing Flow (BiFlow)

Updated 13 December 2025
  • Bidirectional Normalizing Flow (BiFlow) is a generative modeling framework that decouples forward and reverse processes by learning an approximate inverse mapping.
  • It leverages advanced transformer architectures and coupling blocks to achieve efficient, single-pass sampling and reduced computational complexity.
  • BiFlow demonstrates state-of-the-art performance in large-scale image synthesis and semi-supervised anomaly detection with significant speedups over traditional methods.

A Bidirectional Normalizing Flow (BiFlow) is a generative modeling framework that extends classical normalizing flows by decoupling the forward and reverse processes. Unlike standard NFs, which require the reverse transformation to be the exact analytic inverse of the forward mapping, BiFlow learns an approximate reverse model, thereby enabling more flexible architecture designs and accelerating sampling. BiFlow has demonstrated state-of-the-art generation quality and efficiency on large-scale image synthesis tasks and has enabled new semi-supervised approaches to anomaly detection in network traffic (Lu et al., 11 Dec 2025, Dang et al., 13 Mar 2024).

1. Mathematical Foundations

BiFlow builds upon the theory of Normalizing Flows (NFs), which construct a bijection $f_\theta : x \in \mathbb{R}^D \rightarrow z \in \mathbb{R}^D$ via a composition of simple invertible functions: $f_\theta = f_{B-1}\circ\cdots\circ f_1\circ f_0$. The log-density under the model is evaluated through the change-of-variables formula:

$$\log p_\theta(x) = \log p_0(z) + \sum_{i=0}^{B-1} \log \left|\det\left(\partial f_i(x^i)/\partial x^i\right)\right|$$

where $x^0 = x$ and $x^{i+1} = f_i(x^i)$.
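As a concrete illustration, here is a minimal PyTorch sketch of this change-of-variables computation, using toy elementwise affine layers in place of BiFlow's actual Transformer/coupling blocks; `AffineLayer` and `log_prob` are illustrative names, not the papers' code.

```python
import torch

class AffineLayer(torch.nn.Module):
    """Invertible elementwise map f(x) = x * exp(s) + t (a toy stand-in block)."""
    def __init__(self, dim):
        super().__init__()
        self.s = torch.nn.Parameter(torch.zeros(dim))
        self.t = torch.nn.Parameter(torch.zeros(dim))

    def forward(self, x):
        # log|det J| of an elementwise scaling is simply sum(s), independent of x.
        return x * torch.exp(self.s) + self.t, self.s.sum()

def log_prob(x, layers):
    # log p_theta(x) = log p_0(z) + sum_i log|det(df_i / dx^i)|
    z, total_logdet = x, x.new_zeros(())
    for f in layers:                        # f_theta = f_{B-1} o ... o f_1 o f_0
        z, logdet = f(z)
        total_logdet = total_logdet + logdet
    base = torch.distributions.Normal(0.0, 1.0)
    return base.log_prob(z).sum(dim=-1) + total_logdet

layers = [AffineLayer(4) for _ in range(3)]
x = torch.randn(8, 4)
print(log_prob(x, layers).shape)            # torch.Size([8])
```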

BiFlow alters this paradigm by introducing an independently learned reverse mapping $G_\phi$ that approximates the inverse $F_\theta^{-1}$ but is not constrained to be perfectly invertible. The forward process $z = F_\theta(x)$ and the learned reverse $x' = G_\phi(z)$ enable maximum-likelihood training on the forward pass and supervised hidden-state alignment on the reverse pass, removing Jacobian constraints on $G_\phi$. This generalizes to domains beyond image synthesis, such as anomaly detection (Dang et al., 13 Mar 2024), where BiFlow constructs a bijection in latent space for normal traffic data:

$$c = f_\theta(z), \quad c \sim \mathcal{N}(0, I), \qquad z = g_\theta(c)$$

and log-density computations follow standard NF formulations.

2. Training Objectives

Forward (Data to Noise)

The forward NF $F_\theta$ is optimized by maximum likelihood over data samples $x \sim p_\text{data}$:

$$L_\text{forward}(\theta) = \mathbb{E}_{x}\left[ \log p_0(F_\theta(x)) + \sum_{i=0}^{B-1} \log\left|\det\left(\partial f_i(x^i)/\partial x^i\right)\right| \right]$$

In network anomaly detection scenarios, $f_\theta$ is trained only on normal latent representations with affine-coupling blocks (Dang et al., 13 Mar 2024).
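A minimal training-loop sketch of this objective, reusing the toy `log_prob` and `layers` from the previous sketch; the batch size, learning rate, and random-data stand-in are placeholders, not values from the papers.

```python
import torch

def forward_nll_loss(x_batch, flow_layers):
    # Maximizing E_x[log p_theta(x)] is equivalent to minimizing the
    # negative log-likelihood, which is what the optimizer expects.
    return -log_prob(x_batch, flow_layers).mean()

opt = torch.optim.Adam([p for f in layers for p in f.parameters()], lr=1e-3)
for _ in range(100):
    x = torch.randn(8, 4)                  # stand-in for x ~ p_data
    loss = forward_nll_loss(x, layers)
    opt.zero_grad()
    loss.backward()
    opt.step()
```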

Reverse (Noise to Data)

Upon freezing the forward model, BiFlow optimizes the reverse $G_\phi$ by aligning its internal hidden states $h^i$ with the forward trajectories $x^i$:

$$L_\text{align}(\phi) = \mathbb{E}_{x} \sum_{i=0}^{B} D(x^i, \phi_i(h^i))$$

$$L_\text{recon}(\phi) = \mathbb{E}_{x}\, D(x^0, x')$$

where $D(\cdot, \cdot)$ may combine MSE and perceptual distances (e.g., LPIPS-VGG, ConvNeXt-V2). The total reverse objective is $L_\text{reverse} = L_\text{align} + L_\text{recon}$ (Lu et al., 11 Dec 2025).
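The sketch below spells out this two-part reverse objective under simplifying assumptions: plain MSE for $D$, linear projection heads standing in for $\phi_i$, and random tensors in place of real forward/reverse trajectories.

```python
import torch

def reverse_loss(forward_states, hidden_states, proj_heads, x, x_recon, D):
    # L_align: match each reverse hidden state h^i (after projection phi_i)
    # to the corresponding state x^i from the frozen forward flow.
    l_align = sum(D(xi, phi(hi))
                  for xi, hi, phi in zip(forward_states, hidden_states, proj_heads))
    # L_recon: match the final reconstruction x' to the input x^0 = x.
    l_recon = D(x, x_recon)
    return l_align + l_recon

# Toy usage with MSE as D; all states, heads, and shapes are illustrative only.
D = torch.nn.functional.mse_loss
B, dim = 3, 4
xs = [torch.randn(8, dim) for _ in range(B + 1)]    # x^0..x^B from frozen F_theta
hs = [torch.randn(8, dim) for _ in range(B + 1)]    # h^0..h^B from G_phi
heads = [torch.nn.Linear(dim, dim) for _ in range(B + 1)]
print(reverse_loss(xs, hs, heads, xs[0], torch.randn(8, dim), D))
```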

Adaptive Weighting and Norm Control

Adaptive-weighted MSE terms reweight errors via $w_p = (D(x, y) + \epsilon)^{-p}$, smoothing learning dynamics. Intermediate-state outputs in the forward flow are clipped to $[-c, c]$, while reverse states are RMS-normalized prior to alignment (Lu et al., 11 Dec 2025).
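A sketch of these three mechanisms as standalone helpers; the exponent $p$, floor $\epsilon$, and clip bound $c$ are unspecified in the summary above, so the defaults here are assumptions.

```python
import torch

def adaptive_mse(pred, target, p=1.0, eps=1e-3):
    # Reweight per-sample errors by w_p = (D + eps)^(-p); the weight is
    # detached so it rescales gradients without being optimized itself.
    per_sample = ((pred - target) ** 2).mean(dim=-1)
    w = (per_sample.detach() + eps) ** (-p)
    return (w * per_sample).mean()

def clip_forward_state(x, c=5.0):
    # Forward intermediate states are clipped to [-c, c] to keep norms bounded.
    return x.clamp(-c, c)

def rms_normalize(h, eps=1e-6):
    # Reverse hidden states are RMS-normalized before alignment.
    rms = h.pow(2).mean(dim=-1, keepdim=True).sqrt()
    return h / (rms + eps)
```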

3. Architecture Design

Forward Model

Image synthesis deployments use improved TARFlow (iTARFlow) variants—autoregressive flows built from Transformer blocks. Each block alternates self-attention directions to realize bidirectional context; the Jacobian remains tractable due to autoregressive masking (Lu et al., 11 Dec 2025). In anomaly detection, BiFlow employs stacks of RealNVP-style affine-coupling blocks with triangular Jacobian structures (Dang et al., 13 Mar 2024).
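For the anomaly-detection variant, a RealNVP-style affine coupling block can be sketched as follows; the hidden width and the `tanh` bounding of the scale are common stabilizing choices, not details confirmed by the source.

```python
import torch

class AffineCoupling(torch.nn.Module):
    """Coupling block: split x into (x_a, x_b) and transform x_b conditioned on
    x_a. The Jacobian is triangular, so log|det| is just the sum of the scales."""
    def __init__(self, dim, hidden=64):
        super().__init__()
        self.half = dim // 2
        self.net = torch.nn.Sequential(
            torch.nn.Linear(self.half, hidden), torch.nn.ReLU(),
            torch.nn.Linear(hidden, 2 * (dim - self.half)))

    def forward(self, x):
        xa, xb = x[:, :self.half], x[:, self.half:]
        s, t = self.net(xa).chunk(2, dim=-1)
        s = torch.tanh(s)                   # bound the log-scale for stability
        yb = xb * torch.exp(s) + t
        return torch.cat([xa, yb], dim=-1), s.sum(dim=-1)

    def inverse(self, y):
        ya, yb = y[:, :self.half], y[:, self.half:]
        s, t = self.net(ya).chunk(2, dim=-1)
        s = torch.tanh(s)
        return torch.cat([ya, (yb - t) * torch.exp(-s)], dim=-1)

layer = AffineCoupling(4)
x = torch.randn(8, 4)
y, logdet = layer(x)
assert torch.allclose(layer.inverse(y), x, atol=1e-5)
```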

Reverse Model

The learned inverse $G_\phi$ in BiFlow is a feedforward Vision Transformer (ViT) of depth $B+1$, where each block applies non-causal multi-headed attention, RMSNorm, residual connections, and projection heads. The final block performs denoising for direct reconstruction, eliminating the score-based steps typical of autoregressive flows.

Classifier-free guidance is embedded at training time via the CFG trick:

$$G^{\text{cfg}, i}_\phi(h^i \mid c) = (1 + w_i)\, G_\phi^i(h^i \mid c) - w_i\, G_\phi^i(h^i \mid \text{null}),$$

enabling single-pass guided sampling (Lu et al., 11 Dec 2025).
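A sketch of how such guidance combines two conditional evaluations of a block; `guided_output` and the toy additive block are hypothetical names and stand-ins.

```python
import torch

def guided_output(block, h, cond, null_cond, w):
    # (1 + w_i) * G(h^i | c) - w_i * G(h^i | null): extrapolate the conditional
    # prediction away from the unconditional one with per-block weight w_i.
    return (1.0 + w) * block(h, cond) - w * block(h, null_cond)

# Toy usage: a block that simply adds a conditioning embedding.
block = lambda h, c: h + c
h = torch.randn(8, 4)
cond, null_cond = torch.randn(8, 4), torch.zeros(8, 4)
print(guided_output(block, h, cond, null_cond, w=1.5).shape)
```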

The core algorithmic steps are summarized below:

| Procedure | Description | Key Steps |
| --- | --- | --- |
| Training | Perturb $x$; compute forward states $(x^i, z)$ and reverse states $(h^i, x')$; align projections | Loss over all $D(x^i, \hat{y}^i) + D(x, x')$ |
| 1-NFE Sampling | Sample $\epsilon \sim \mathcal{N}(0, I)$, return $x = G_\phi(\epsilon)$ | Single forward pass |
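The 1-NFE sampling row reduces to a few lines; `sample_1nfe` is an illustrative wrapper, with `torch.nn.Identity` standing in for the trained reverse transformer $G_\phi$.

```python
import torch

@torch.no_grad()
def sample_1nfe(G_phi, num_samples, dim, device="cpu"):
    # The entire sampling procedure: one draw from the base distribution,
    # one parallel (non-causal) pass through the learned reverse model.
    eps = torch.randn(num_samples, dim, device=device)
    return G_phi(eps)

samples = sample_1nfe(torch.nn.Identity(), num_samples=16, dim=4)
print(samples.shape)                        # torch.Size([16, 4])
```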

4. Sampling Complexity and Efficiency

Classical TARFlow sampling involves $B \cdot T$ sequential autoregressive steps plus supplemental score-based denoising, incurring heavy computational demand. BiFlow achieves sampling in a single non-causal, parallel transformer pass (1-NFE). Empirical benchmarks show significant efficiency improvements:

  • BiFlow-B/2 samples and decodes in 0.29 ms + 1.3 ms on 8×TPU-v4.
  • iTARFlow-B/2 requires 65 ms + 1.3 ms, yielding a 224× sampling speedup for BiFlow.
  • Larger configurations reach up to 700× (TPU) or 1600× (CPU) acceleration over previous NF architectures (Lu et al., 11 Dec 2025).

In anomaly detection, BiFlow's inference cost totals 3.91M parameters and 0.02 GFLOPs, outperforming comparable flow- and GAN-based approaches in both model size and computational cost (Dang et al., 13 Mar 2024).

5. Empirical Performance and Applications

Image Synthesis

Key metrics on ImageNet 256×256 are shown below. BiFlow sets a new state of the art in NF-based synthesis and compares favorably with single-evaluation (1-NFE) diffusion/flow-matching models at substantially lower compute.

| Model | Params (M) | FID | IS |
| --- | --- | --- | --- |
| BiFlow-B/2 (1-NFE) | 133 | 2.39 | 303.0 |
| STARFlow-XL/1 | 1400 | 2.40 | |
| MeanFlow-XL/2 | 676 | 3.43 | |

Anomaly Detection

BiFlow forms a core module in a three-stage semi-supervised anomaly traffic detection pipeline:

  1. GAN-style autoencoder trains on normal samples.
  2. BiFlow normalizes latent representations to $\mathcal{N}(0, I)$ via an 8-block coupling network.
  3. Perturbations in normalized space yield pseudo anomalies, used to train a classifier achieving AUROC up to 0.8658 on VPN/non-VPN detection (Dang et al., 13 Mar 2024); see the sketch below.
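A sketch of step 3's pseudo-anomaly generation, assuming access to the trained flow's forward and inverse maps; the perturbation scale `sigma` is a hypothetical choice, and identity maps stand in for the trained 8-block coupling network.

```python
import torch

def pseudo_anomalies(forward_map, inverse_map, z_normal, sigma=0.5):
    # Map normal latents into the normalized space c = f_theta(z) ~ N(0, I),
    # perturb them off the normal manifold, then map back via g_theta.
    c = forward_map(z_normal)
    c_perturbed = c + sigma * torch.randn_like(c)
    return inverse_map(c_perturbed)

z = torch.randn(16, 32)
z_anom = pseudo_anomalies(torch.nn.Identity(), torch.nn.Identity(), z)
```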

6. Theoretical Insights and Stability Mechanisms

BiFlow's hidden-alignment strategy supervises the reverse transformer using all intermediate forward states, allowing for flexible representation at each block and eliminating repeated projections into data space. This has empirically reduced reconstruction losses and improved fidelity compared to naive or hidden-distillation strategies.

Stability is reinforced by norm control—clipping forward model outputs and RMS normalization—preventing exploding norms and balancing MSE scales. Adaptive-weighted losses mitigate gradient instabilities from large errors (Lu et al., 11 Dec 2025). Integrated perceptual losses (LPIPS-VGG, ConvNeXt-V2) serve as regularizers by ensuring generated samples remain on realistic data manifolds.

7. Significance and Extension

BiFlow redefines the normalizing flow paradigm by removing the requirement of analytic invertibility, substituting a learned transformer-based reverse mapping. This innovation enables dramatic improvements in sampling speed, architectural flexibility, and generation quality, facilitating broader adoption in both generative modeling and discriminative semi-supervised anomaly detection. Decoupling the forward and reverse processes opens future work on more expressive, computationally efficient flows and diverse applications across domains (Lu et al., 11 Dec 2025, Dang et al., 13 Mar 2024).
