
Integer Discrete Flows (IDFs)

Updated 25 November 2025
  • Integer Discrete Flows are bijective (hence exactly invertible) transformations defined on integer lattices that enable exact log-likelihood evaluation and lossless compression without quantization artifacts.
  • They use integer-valued coupling layers parameterized by neural networks, with techniques such as rounded additive shifts and modular arithmetic to model discrete data effectively.
  • Recent architectures, including IDF++, enhance expressivity and hardware efficiency through methods such as ReZero initialization, GroupNorm, and integer-only operations like INT8 implementations.

An Integer Discrete Flow (IDF) is a bijective transformation $f_\theta:\mathbb{Z}^D\rightarrow\mathbb{Z}^D$ parameterized to learn rich probability models over high-dimensional integer-valued random variables. Unlike normalizing flows for continuous data, IDFs work natively on the integer lattice, enabling exact log-likelihoods, invertible inference, and lossless compression without real-to-integer quantization artifacts. Architecturally, IDFs stack integer-valued coupling layers (often bipartite), where each layer performs an integer-only, learnable, invertible transformation such as an additive shift parameterized by a neural network. Their density evaluation, training, and compression procedures are tailored to the discrete structure, with practical advantages in both generative modeling and entropy coding. Modern IDF architectures incorporate improvements in expressivity, gradient estimation, and hardware efficiency, yielding state-of-the-art lossless compression on image datasets and competitive density-modeling performance.

1. Mathematical Definition and Change-of-Variables

At the core of the IDF framework is the composition of $L$ integer-valued bijective layers,

$$f_\theta = f^{(L)}_{\theta_L} \circ f^{(L-1)}_{\theta_{L-1}} \circ \cdots \circ f^{(1)}_{\theta_1},$$

with $x\in\mathbb{Z}^D$ mapped to $z=f_\theta(x)\in\mathbb{Z}^D$. The discrete change-of-variables formula omits the usual Jacobian determinant, as there is no notion of infinitesimal volume in discrete space:

$$p_X(x) = p_Z(z) = p_Z(f_\theta(x)), \qquad z = f_\theta(x),$$

where $p_Z$ is typically a simple factorized or mixture prior on $\mathbb{Z}^D$. The log-likelihood is thus

$$\log p_X(x) = \log p_Z(f_\theta(x)).$$

The invertibility of $f_\theta$ ensures an exact one-to-one mapping between data and latent codes, allowing both density evaluation and invertible sample generation (Hoogeboom et al., 2019, Berg et al., 2020, Tomczak, 2020).
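
To make the change of variables concrete, here is a minimal sketch under purely illustrative assumptions: a fixed integer shift stands in for $f_\theta$, and a factorized two-sided geometric prior stands in for $p_Z$; neither is the parameterization used in the cited papers.

```python
import numpy as np

# Illustrative only: a toy integer shift plays the role of f_theta, and a
# factorized two-sided geometric prior plays the role of p_Z. Because f is a
# bijection on Z^D, log p_X(x) = log p_Z(f(x)) with no Jacobian term.

SHIFT = np.array([1, 0, -2, 1])

def f(x):
    return x + SHIFT            # integer shift: bijective on Z^D

def f_inv(z):
    return z - SHIFT            # exact inverse

def log_prior(z, p=0.5):
    # factorized pmf on the integers: P(z_d) = (1 - p) / (1 + p) * p^{|z_d|}
    return np.sum(np.log1p(-p) - np.log1p(p) + np.abs(z) * np.log(p))

x = np.array([3, -1, 0, 2])
z = f(x)
log_px = log_prior(z)           # log p_X(x) = log p_Z(f_theta(x))
assert np.all(f_inv(z) == x)    # exact round trip
print(f"log p_X(x) = {log_px:.3f}")
```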

2. Core Layer Constructions: Coupling and Permutation

The canonical IDF building block is the integer discrete bipartite coupling layer. Splitting an input vector $x$ into $x = [x_a, x_b]$, the forward pass is

$$y_a = x_a, \qquad y_b = x_b + \lfloor t_\theta(x_a) \rceil,$$

where $t_\theta:\mathbb{Z}^{d_a}\rightarrow\mathbb{R}^{d_b}$ is typically a neural network and $\lfloor\cdot\rceil$ denotes rounding to the nearest integer. The inverse recovers $x$ exactly:

$$x_a = y_a, \qquad x_b = y_b - \lfloor t_\theta(y_a) \rceil.$$

Alternating coupling layers with channel permutations ensures that all components eventually interact. Crucially, all operations are closed on $\mathbb{Z}^D$ and preserve bijectivity.
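
A minimal sketch of the bipartite coupling layer follows, with a fixed affine map standing in for the neural network $t_\theta$ (all names and values are illustrative):

```python
import numpy as np

def t_theta(x_a):
    # stand-in for a neural network Z^{d_a} -> R^{d_b}; any real-valued map
    # works, since its output is rounded before the integer update
    return 0.7 * x_a[::-1] + 1.3

def coupling_forward(x):
    x_a, x_b = np.split(x, 2)
    y_b = x_b + np.rint(t_theta(x_a)).astype(np.int64)   # y_b = x_b + round(t(x_a))
    return np.concatenate([x_a, y_b])

def coupling_inverse(y):
    y_a, y_b = np.split(y, 2)
    x_b = y_b - np.rint(t_theta(y_a)).astype(np.int64)   # same shift, subtracted
    return np.concatenate([y_a, x_b])

x = np.array([4, -2, 7, 0], dtype=np.int64)
y = coupling_forward(x)
assert np.all(coupling_inverse(y) == x)   # bijective on Z^D by construction
```

Because $y_a = x_a$, the inverse can recompute exactly the same rounded shift, which is what makes the layer bijective no matter how expressive $t_\theta$ is.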

Recent IDF variants introduce more expressive multiway coupling, e.g., 4- and 8-part splits, where each output block $y_i$ is updated via a dedicated neural net, enabling richer dependency structures per layer at a constant parameter budget (Tomczak, 2020).

For sequence or categorical data on $\{0, \dots, K-1\}^D$, modular arithmetic coupling is used:

$$y_d = \left[\mu_d(y_{<d}) + \sigma_d(y_{<d})\, x_d\right] \bmod K,$$

with a modular multiplicative inverse for the backward pass, which requires each $\sigma_d$ to be coprime with $K$ (Tran et al., 2019).
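
A minimal single-dimension sketch with illustrative scalar values; `pow(sigma, -1, K)` (Python 3.8+) computes the modular multiplicative inverse needed for the backward pass:

```python
# Illustrative modular-affine coupling on {0, ..., K-1}.
# Invertibility needs gcd(sigma, K) = 1; choosing K prime guarantees this
# for every sigma in {1, ..., K-1}.

K = 7

def mod_forward(x, mu, sigma):
    return (mu + sigma * x) % K

def mod_inverse(y, mu, sigma):
    sigma_inv = pow(sigma, -1, K)          # modular inverse of sigma mod K
    return (sigma_inv * (y - mu)) % K

x, mu, sigma = 5, 3, 4
y = mod_forward(x, mu, sigma)              # (3 + 4 * 5) % 7 == 2
assert mod_inverse(y, mu, sigma) == x      # exact round trip on {0, ..., K-1}
```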

3. Training, Gradient Estimation, and Flexibility

IDFs admit exact maximum-likelihood training: the negative log-likelihood objective is minimized by backpropagation, using the discrete change-of-variables formula. The main technical challenge is the nondifferentiability of rounding:

  • Forward: $u = \lfloor t_\theta \rceil$;
  • Backward: straight-through estimator, $\partial L/\partial t_\theta := \partial L/\partial u$ (a minimal sketch follows this list).
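
A minimal PyTorch sketch of the straight-through rounding trick, in its standard formulation (the cited papers' training code may differ in details):

```python
import torch

def round_ste(t: torch.Tensor) -> torch.Tensor:
    # forward: exact rounding; backward: identity, i.e. dL/dt := dL/du
    return t + (torch.round(t) - t).detach()

t = torch.tensor([0.2, 1.7, -0.6], requires_grad=True)
u = round_ste(t)          # forward values: tensor([0., 2., -1.])
u.sum().backward()
print(t.grad)             # tensor([1., 1., 1.]) -- gradient passes straight through
```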

Empirical analysis demonstrates that, despite this bias, the straight-through gradient aligns well with finite-difference estimates on 8–16 bit image data; architecture depth and network conditioning are important for optimization stability (Berg et al., 2020).

A previously raised concern was reduced expressivity relative to continuous flows. This is refuted in the IDF++ analysis, which proves that embedding finite-support data in $\mathbb{Z}^D$ allows IDFs to exactly factorize (flatten) or transform any joint distribution via successive translation and permutation layers (Berg et al., 2020).

4. Lossless Compression and Entropy Coding

IDFs naturally support lossless compression by mapping data $x$ to a latent code $z = f_\theta(x)$ and encoding $z$ under $p_Z(z)$ (via rANS or arithmetic coding). Decoding reverses the process exactly using $f_\theta^{-1}$. The strictly integer nature of all transformations guarantees no information loss and yields bitrates competitive with or better than established codecs. The entire model is thus a learned, invertible statistical compressor with exact recovery (Hoogeboom et al., 2019, Berg et al., 2020, Wang et al., 2022).
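
The pipeline can be sketched as follows, with the entropy coder abstracted away (a real implementation feeds $z$ and $p_Z$ to rANS or an arithmetic coder); the toy shift flow and geometric prior are the same illustrative stand-ins as above:

```python
import numpy as np

SHIFT = np.array([1, 0, -2, 1])                 # toy bijection, illustrative

def log2_prior(z, p=0.5):
    # factorized geometric prior on the integers, measured in bits
    return np.sum(np.log2(1 - p) - np.log2(1 + p) + np.abs(z) * np.log2(p))

def encode(x):
    z = x + SHIFT                               # x -> z via the flow
    bits = int(np.ceil(-log2_prior(z)))         # ideal codelength under p_Z
    return z, bits                              # z would go to the entropy coder

def decode(z):
    return z - SHIFT                            # exact reconstruction via f^{-1}

x = np.array([2, 0, -1, 4])
z, bits = encode(x)
assert np.all(decode(z) == x)                   # lossless by construction
print(f"~{bits} bits total, {bits / x.size:.2f} bpd")
```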

Empirical results for IDF-based compressors on CIFAR-10 and ImageNet are summarized in Table 1 (bits per dimension; lower is better):

| Model | CIFAR-10 (bpd) | ImageNet32 (bpd) | ImageNet64 (bpd) |
|---|---|---|---|
| PNG | 5.87 | 6.39 | 5.71 |
| FLIF | 4.19 | 4.52 | 4.19 |
| Bit-Swap | 3.82 | 4.50 | – |
| IDF | 3.32 | 4.18 | 3.90 |
| IDF++ (8 f/l) | 3.26 | 4.12 | 3.81 |
| IODF (INT8 + pruned) | – | 3.979 | 3.695 |

5. Architectural Improvements and Hardware Efficiency

IDF++ introduces several architectural optimizations:

  • Inverted channel permutations after each coupling to preserve spatial coherence.
  • ReZero-style identity initialization: $y_b = x_b + \lfloor \alpha \cdot t_\theta(x_a) \rceil$ with a learnable $\alpha$ initialized at zero (see the sketch after this list).
  • GroupNorm and Swish activations inside dense blocks.
  • Zero initialization for logistic-mixture parameters in the conditional prior (Berg et al., 2020).
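
A minimal sketch of the ReZero-style coupling, with a toy linear layer standing in for the dense blocks used in IDF++ (training would combine this with the straight-through estimator above; gradients do not pass through plain rounding):

```python
import torch
import torch.nn as nn

class ReZeroCoupling(nn.Module):
    def __init__(self, d_a: int, d_b: int):
        super().__init__()
        self.t = nn.Linear(d_a, d_b)               # stand-in for t_theta
        self.alpha = nn.Parameter(torch.zeros(1))  # ReZero gate, starts at 0

    def forward(self, x_a: torch.Tensor, x_b: torch.Tensor):
        # y_b = x_b + round(alpha * t(x_a)); identity map at initialization
        shift = torch.round(self.alpha * self.t(x_a.float()))
        return x_a, x_b + shift.long()

layer = ReZeroCoupling(2, 2)
x_a, x_b = torch.tensor([[1, 4]]), torch.tensor([[3, -2]])
y_a, y_b = layer(x_a, x_b)
assert torch.equal(y_b, x_b)   # alpha = 0 => each coupling starts as identity
```

Starting every coupling at the identity keeps deep stacks well conditioned early in training, which is the motivation the IDF++ authors give for this initialization.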

Hardware-friendly implementations such as Integer-Only Discrete Flows (IODF) replace floating-point subnetworks with integer-only ResNet blocks and add learnable binary gates for aggressive channel pruning. All convolutions and ReLUs are performed in INT8, producing an order-of-magnitude reduction in inference latency with minimal rate loss (Wang et al., 2022).

6. Alternative Flow Structures and Theoretical Relations

Alternate IDF architectures include:

  • Discrete autoregressive and bipartite flows, with modular affine coupling for categorical data, supporting bidirectional context (DAF) and parallel non-autoregressive sampling (DBF) (Tran et al., 2019).
  • Multi-way integer couplings, unifying reversible logic gates, discrete NVP layers, and general neural network couplings under a broad class of invertible templates (Tomczak, 2020).
  • Measure-preserving and discrete (MAD) maps constructed from CDF-inverse actions on $\mathbb{Z}^d$, as in MAD Mix, yielding unbiased variational flows that are ergodic and parameter-free aside from mixture depth (Diluvi et al., 2023).

7. Applications, Extensions, and Empirical Observations

IDFs provide:

  • Exact likelihoods on integer and categorical data without dequantization;
  • Native and invertible image compression with state-of-the-art rates;
  • Competitive performance in character-level language modeling, synthetic joint distributions, Potts models, and permutation-based data;
  • Modular hybridization with continuous flows (e.g., interleaving integer and continuous coupling blocks), expanding applicability to mixed discrete/continuous domains (Tran et al., 2019, Tomczak, 2020, Argouarc'h et al., 2022).

Limitations include training challenges with large alphabet sizes ($K > 200$), gradient bias from the straight-through estimator in extremely low-bit regimes, and sequential dependencies in autoregressive flow architectures.
