Bidirectional Normalizing Flow (BiFlow)

Updated 13 December 2025
  • Bidirectional Normalizing Flow (BiFlow) is a generative modeling framework that decouples forward and reverse processes by learning an approximate inverse mapping.
  • It leverages advanced transformer architectures and coupling blocks to achieve efficient, single-pass sampling and reduced computational complexity.
  • BiFlow demonstrates state-of-the-art performance in large-scale image synthesis and semi-supervised anomaly detection with significant speedups over traditional methods.

A Bidirectional Normalizing Flow (BiFlow) is a generative modeling framework that extends classical normalizing flows by decoupling the forward and reverse processes. Unlike standard NFs, which require the reverse transformation to be the exact analytic inverse of the forward mapping, BiFlow learns an approximate reverse model, thereby enabling more flexible architecture designs and accelerating sampling. BiFlow has demonstrated state-of-the-art generation quality and efficiency on large-scale image synthesis tasks and has enabled new semi-supervised approaches to anomaly detection in network traffic (Lu et al., 11 Dec 2025, Dang et al., 13 Mar 2024).

1. Mathematical Foundations

BiFlow builds upon the theory of Normalizing Flows (NFs), which construct a bijection $f_\theta : x \in \mathbb{R}^D \rightarrow z \in \mathbb{R}^D$ via a composition of simple invertible functions: $f_\theta = f_{B-1}\circ\cdots\circ f_1\circ f_0$. The log-density under the model is evaluated through the change-of-variables formula:

$$\log p_\theta(x) = \log p_0(z) + \sum_{i=0}^{B-1} \log \left|\det\left(\partial f_i(x^i)/\partial x^i\right)\right|$$

where $x^0 = x$ and $x^{i+1} = f_i(x^i)$.
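As a concrete illustration, here is a minimal PyTorch sketch of this change-of-variables computation, using toy elementwise affine layers in place of BiFlow's actual Transformer/coupling blocks; `AffineLayer` and `log_prob` are illustrative names, not the papers' code.

```python
import torch

class AffineLayer(torch.nn.Module):
    """Invertible elementwise map f(x) = x * exp(s) + t (a toy stand-in block)."""
    def __init__(self, dim):
        super().__init__()
        self.s = torch.nn.Parameter(torch.zeros(dim))
        self.t = torch.nn.Parameter(torch.zeros(dim))

    def forward(self, x):
        # log|det J| of an elementwise scaling is simply sum(s), independent of x.
        return x * torch.exp(self.s) + self.t, self.s.sum()

def log_prob(x, layers):
    # log p_theta(x) = log p_0(z) + sum_i log|det(df_i / dx^i)|
    z, total_logdet = x, x.new_zeros(())
    for f in layers:                        # f_theta = f_{B-1} o ... o f_1 o f_0
        z, logdet = f(z)
        total_logdet = total_logdet + logdet
    base = torch.distributions.Normal(0.0, 1.0)
    return base.log_prob(z).sum(dim=-1) + total_logdet

layers = [AffineLayer(4) for _ in range(3)]
x = torch.randn(8, 4)
print(log_prob(x, layers).shape)            # torch.Size([8])
```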

BiFlow alters this paradigm by introducing an independently learned reverse mapping $G_\phi$ that approximates the inverse $F_\theta^{-1}$ but is not constrained to be perfectly invertible. The forward process $z = F_\theta(x)$ and the learned reverse $x' = G_\phi(z)$ enable maximum-likelihood training on the forward pass and supervised hidden-state alignment on the reverse pass, removing Jacobian constraints on $G_\phi$. This generalizes to domains beyond image synthesis, such as anomaly detection (Dang et al., 13 Mar 2024), where BiFlow constructs a bijection in latent space for normal traffic data:

$$c = f_\theta(z), \quad c \sim \mathcal{N}(0, I), \qquad z = g_\theta(c)$$

and log-density computations follow standard NF formulations.

2. Training Objectives

Forward (Data to Noise)

The forward NF $F_\theta$ is optimized by maximum likelihood over data samples $x \sim p_\text{data}$:

$$L_\text{forward}(\theta) = \mathbb{E}_{x}\left[ \log p_0(F_\theta(x)) + \sum_{i=0}^{B-1} \log\left|\det\left(\partial f_i(x^i)/\partial x^i\right)\right| \right]$$

In network anomaly detection scenarios, $f_\theta$ is trained only on normal latent representations with affine-coupling blocks (Dang et al., 13 Mar 2024).
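A minimal training-loop sketch of this objective, reusing the toy `log_prob` and `layers` from the previous sketch; the batch size, learning rate, and random-data stand-in are placeholders, not values from the papers.

```python
import torch

def forward_nll_loss(x_batch, flow_layers):
    # Maximizing E_x[log p_theta(x)] is equivalent to minimizing the
    # negative log-likelihood, which is what the optimizer expects.
    return -log_prob(x_batch, flow_layers).mean()

opt = torch.optim.Adam([p for f in layers for p in f.parameters()], lr=1e-3)
for _ in range(100):
    x = torch.randn(8, 4)                  # stand-in for x ~ p_data
    loss = forward_nll_loss(x, layers)
    opt.zero_grad()
    loss.backward()
    opt.step()
```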

Reverse (Noise to Data)

Upon freezing the forward model, BiFlow optimizes the reverse $G_\phi$ by aligning its internal hidden states $h^i$ with the forward trajectories $x^i$:

$$L_\text{align}(\phi) = \mathbb{E}_{x} \sum_{i=0}^{B} D(x^i, \phi_i(h^i))$$

$$L_\text{recon}(\phi) = \mathbb{E}_{x}\, D(x^0, x')$$

where $D(\cdot, \cdot)$ may combine MSE and perceptual distances (e.g., LPIPS-VGG, ConvNeXt-V2). The total reverse objective is $L_\text{reverse} = L_\text{align} + L_\text{recon}$ (Lu et al., 11 Dec 2025).
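The sketch below spells out this two-part reverse objective under simplifying assumptions: plain MSE for $D$, linear projection heads standing in for $\phi_i$, and random tensors in place of real forward/reverse trajectories.

```python
import torch

def reverse_loss(forward_states, hidden_states, proj_heads, x, x_recon, D):
    # L_align: match each reverse hidden state h^i (after projection phi_i)
    # to the corresponding state x^i from the frozen forward flow.
    l_align = sum(D(xi, phi(hi))
                  for xi, hi, phi in zip(forward_states, hidden_states, proj_heads))
    # L_recon: match the final reconstruction x' to the input x^0 = x.
    l_recon = D(x, x_recon)
    return l_align + l_recon

# Toy usage with MSE as D; all states, heads, and shapes are illustrative only.
D = torch.nn.functional.mse_loss
B, dim = 3, 4
xs = [torch.randn(8, dim) for _ in range(B + 1)]    # x^0..x^B from frozen F_theta
hs = [torch.randn(8, dim) for _ in range(B + 1)]    # h^0..h^B from G_phi
heads = [torch.nn.Linear(dim, dim) for _ in range(B + 1)]
print(reverse_loss(xs, hs, heads, xs[0], torch.randn(8, dim), D))
```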

Adaptive Weighting and Norm Control

Adaptive-weighted MSE terms reweight errors via $w_p = (D(x, y) + \epsilon)^{-p}$, smoothing learning dynamics. Intermediate-state outputs in the forward flow are clipped to $[-c, c]$, while reverse states are RMS-normalized prior to alignment (Lu et al., 11 Dec 2025).
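A sketch of these three mechanisms as standalone helpers; the exponent $p$, floor $\epsilon$, and clip bound $c$ are unspecified in the summary above, so the defaults here are assumptions.

```python
import torch

def adaptive_mse(pred, target, p=1.0, eps=1e-3):
    # Reweight per-sample errors by w_p = (D + eps)^(-p); the weight is
    # detached so it rescales gradients without being optimized itself.
    per_sample = ((pred - target) ** 2).mean(dim=-1)
    w = (per_sample.detach() + eps) ** (-p)
    return (w * per_sample).mean()

def clip_forward_state(x, c=5.0):
    # Forward intermediate states are clipped to [-c, c] to keep norms bounded.
    return x.clamp(-c, c)

def rms_normalize(h, eps=1e-6):
    # Reverse hidden states are RMS-normalized before alignment.
    rms = h.pow(2).mean(dim=-1, keepdim=True).sqrt()
    return h / (rms + eps)
```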

3. Architecture Design

Forward Model

Image synthesis deployments use improved TARFlow (iTARFlow) variants—autoregressive flows built from Transformer blocks. Each block alternates self-attention directions to realize bidirectional context; the Jacobian remains tractable due to autoregressive masking (Lu et al., 11 Dec 2025). In anomaly detection, BiFlow employs stacks of RealNVP-style affine-coupling blocks with triangular Jacobian structures (Dang et al., 13 Mar 2024).
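For the anomaly-detection variant, a RealNVP-style affine coupling block can be sketched as follows; the hidden width and the `tanh` bounding of the scale are common stabilizing choices, not details confirmed by the source.

```python
import torch

class AffineCoupling(torch.nn.Module):
    """Coupling block: split x into (x_a, x_b) and transform x_b conditioned on
    x_a. The Jacobian is triangular, so log|det| is just the sum of the scales."""
    def __init__(self, dim, hidden=64):
        super().__init__()
        self.half = dim // 2
        self.net = torch.nn.Sequential(
            torch.nn.Linear(self.half, hidden), torch.nn.ReLU(),
            torch.nn.Linear(hidden, 2 * (dim - self.half)))

    def forward(self, x):
        xa, xb = x[:, :self.half], x[:, self.half:]
        s, t = self.net(xa).chunk(2, dim=-1)
        s = torch.tanh(s)                   # bound the log-scale for stability
        yb = xb * torch.exp(s) + t
        return torch.cat([xa, yb], dim=-1), s.sum(dim=-1)

    def inverse(self, y):
        ya, yb = y[:, :self.half], y[:, self.half:]
        s, t = self.net(ya).chunk(2, dim=-1)
        s = torch.tanh(s)
        return torch.cat([ya, (yb - t) * torch.exp(-s)], dim=-1)

layer = AffineCoupling(4)
x = torch.randn(8, 4)
y, logdet = layer(x)
assert torch.allclose(layer.inverse(y), x, atol=1e-5)
```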

Reverse Model

The learned inverse $G_\phi$ in BiFlow is a feedforward Vision Transformer (ViT) of depth $B+1$, where each block applies non-causal multi-headed attention, RMSNorm, residual connections, and projection heads. The final block performs denoising for direct reconstruction, eliminating the score-based steps typical of autoregressive flows.

Classifier-free guidance is embedded at training time via the CFG trick:

$$G^{\text{cfg}, i}_\phi(h^i \mid c) = (1 + w_i)\, G_\phi^i(h^i \mid c) - w_i\, G_\phi^i(h^i \mid \text{null}),$$

enabling single-pass guided sampling (Lu et al., 11 Dec 2025).
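A sketch of how such guidance combines two conditional evaluations of a block; `guided_output` and the toy additive block are hypothetical names and stand-ins.

```python
import torch

def guided_output(block, h, cond, null_cond, w):
    # (1 + w_i) * G(h^i | c) - w_i * G(h^i | null): extrapolate the conditional
    # prediction away from the unconditional one with per-block weight w_i.
    return (1.0 + w) * block(h, cond) - w * block(h, null_cond)

# Toy usage: a block that simply adds a conditioning embedding.
block = lambda h, c: h + c
h = torch.randn(8, 4)
cond, null_cond = torch.randn(8, 4), torch.zeros(8, 4)
print(guided_output(block, h, cond, null_cond, w=1.5).shape)
```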

The core algorithmic steps are summarized below:

| Procedure | Description | Key Steps |
| --- | --- | --- |
| Training | Perturb $x$; compute forward states $(x^i, z)$ and reverse states $(h^i, x')$; align projections | Loss over all $D(x^i, \hat{y}^i) + D(x, x')$ |
| 1-NFE Sampling | Sample $\epsilon \sim \mathcal{N}(0, I)$, return $x = G_\phi(\epsilon)$ | Single forward pass |
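The 1-NFE sampling row reduces to a few lines; `sample_1nfe` is an illustrative wrapper, with `torch.nn.Identity` standing in for the trained reverse transformer $G_\phi$.

```python
import torch

@torch.no_grad()
def sample_1nfe(G_phi, num_samples, dim, device="cpu"):
    # The entire sampling procedure: one draw from the base distribution,
    # one parallel (non-causal) pass through the learned reverse model.
    eps = torch.randn(num_samples, dim, device=device)
    return G_phi(eps)

samples = sample_1nfe(torch.nn.Identity(), num_samples=16, dim=4)
print(samples.shape)                        # torch.Size([16, 4])
```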

4. Sampling Complexity and Efficiency

Classical TARFlow sampling involves $B \cdot T$ sequential autoregressive steps plus supplemental score-based denoising, incurring heavy computational demand. BiFlow achieves sampling in a single non-causal, parallel transformer pass (1-NFE). Empirical benchmarks show significant efficiency improvements:

  • BiFlow-B/2 samples and decodes in 0.29 ms + 1.3 ms on 8×TPU-v4.
  • iTARFlow-B/2 requires 65 ms + 1.3 ms, yielding a 224× sampling speedup for BiFlow.
  • Larger configurations reach up to 700× (TPU) or 1600× (CPU) acceleration over previous NF architectures (Lu et al., 11 Dec 2025).

In anomaly detection, BiFlow's inference cost totals 3.91M parameters and 0.02 GFLOPs, outperforming comparable flow- and GAN-based approaches in both model size and computational cost (Dang et al., 13 Mar 2024).

5. Empirical Performance and Applications

Image Synthesis

Key metrics on ImageNet 256×256 are shown below. BiFlow sets a new state of the art in NF-based synthesis and compares favorably with single-evaluation (1-NFE) diffusion/flow-matching models at substantially lower compute.

| Model | Params (M) | FID | IS |
| --- | --- | --- | --- |
| BiFlow-B/2 (1-NFE) | 133 | 2.39 | 303.0 |
| STARFlow-XL/1 | 1400 | 2.40 | |
| MeanFlow-XL/2 | 676 | 3.43 | |

Anomaly Detection

BiFlow forms a core module in a three-stage semi-supervised anomaly traffic detection pipeline:

  1. GAN-style autoencoder trains on normal samples.
  2. BiFlow normalizes latent representations to $\mathcal{N}(0, I)$ via an 8-block coupling network.
  3. Perturbations in normalized space yield pseudo anomalies, used to train a classifier achieving AUROC up to 0.8658 on VPN/non-VPN detection (Dang et al., 13 Mar 2024); see the sketch below.
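A sketch of step 3's pseudo-anomaly generation, assuming access to the trained flow's forward and inverse maps; the perturbation scale `sigma` is a hypothetical choice, and identity maps stand in for the trained 8-block coupling network.

```python
import torch

def pseudo_anomalies(forward_map, inverse_map, z_normal, sigma=0.5):
    # Map normal latents into the normalized space c = f_theta(z) ~ N(0, I),
    # perturb them off the normal manifold, then map back via g_theta.
    c = forward_map(z_normal)
    c_perturbed = c + sigma * torch.randn_like(c)
    return inverse_map(c_perturbed)

z = torch.randn(16, 32)
z_anom = pseudo_anomalies(torch.nn.Identity(), torch.nn.Identity(), z)
```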

6. Theoretical Insights and Stability Mechanisms

BiFlow's hidden-alignment strategy supervises the reverse transformer using all intermediate forward states, allowing for flexible representation at each block and eliminating repeated projections into data space. This has empirically reduced reconstruction losses and improved fidelity compared to naive or hidden-distillation strategies.

Stability is reinforced by norm control—clipping forward model outputs and RMS normalization—preventing exploding norms and balancing MSE scales. Adaptive-weighted losses mitigate gradient instabilities from large errors (Lu et al., 11 Dec 2025). Integrated perceptual losses (LPIPS-VGG, ConvNeXt-V2) serve as regularizers by ensuring generated samples remain on realistic data manifolds.

7. Significance and Extension

BiFlow redefines the normalizing flow paradigm by removing the requirement of analytic invertibility, substituting a learned transformer-based reverse mapping. This innovation enables dramatic improvements in sampling speed, architectural flexibility, and generation quality, facilitating broader adoption in both generative modeling and discriminative semi-supervised anomaly detection. Decoupling the forward and reverse processes opens future work on more expressive, computationally efficient flows and diverse applications across domains (Lu et al., 11 Dec 2025, Dang et al., 13 Mar 2024).
