
Integer Discrete Flows (IDFs)

Updated 25 November 2025
  • Integer Discrete Flows are bijective (hence exactly invertible) transformations defined on integer lattices that enable exact log-likelihood evaluation and lossless compression without quantization artifacts.
  • They use integer-valued coupling layers parameterized by neural networks, with techniques such as rounded additive shifts and modular arithmetic to model discrete data effectively.
  • Recent architectures, including IDF++, enhance expressivity and hardware efficiency through methods such as ReZero initialization, GroupNorm, and integer-only operations like INT8 implementations.

An Integer Discrete Flow (IDF) is a bijective transformation $f_\theta:\mathbb{Z}^D\rightarrow\mathbb{Z}^D$ parameterized to learn rich probability models over high-dimensional integer-valued random variables. Unlike normalizing flows for continuous data, IDFs work natively on the integer lattice, enabling exact log-likelihoods, invertible inference, and lossless compression without real-to-integer quantization artifacts. Architecturally, IDFs stack integer-valued coupling layers (often bipartite), where each layer performs an integer-only, learnable, invertible transformation such as an additive shift parameterized by a neural network. Their density evaluation, training, and compression procedures are tailored to the discrete structure, with practical advantages in both generative modeling and entropy coding. Modern IDF architectures incorporate improvements in expressivity, gradient estimation, and hardware efficiency, yielding state-of-the-art lossless compression on image datasets and competitive density-modeling performance.

1. Mathematical Definition and Change-of-Variables

At the core of the IDF framework is the composition of $L$ integer-valued bijective layers,

$$f_\theta = f^{(L)}_{\theta_L} \circ f^{(L-1)}_{\theta_{L-1}} \circ \cdots \circ f^{(1)}_{\theta_1},$$

with $x\in\mathbb{Z}^D$ mapped to $z=f_\theta(x)\in\mathbb{Z}^D$. The discrete change-of-variables formula omits the usual Jacobian determinant, as there is no notion of infinitesimal volume in discrete space:

$$p_X(x) = p_Z(z) = p_Z(f_\theta(x)), \qquad z = f_\theta(x),$$

where $p_Z$ is typically a simple factorized or mixture prior on $\mathbb{Z}^D$. The log-likelihood is thus

$$\log p_X(x) = \log p_Z(f_\theta(x)).$$

The invertibility of $f_\theta$ ensures an exact one-to-one mapping between data and latent codes, allowing both density evaluation and invertible sample generation (Hoogeboom et al., 2019, Berg et al., 2020, Tomczak, 2020).
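
To make the change of variables concrete, here is a minimal sketch under purely illustrative assumptions: a fixed integer shift stands in for $f_\theta$, and a factorized two-sided geometric prior stands in for $p_Z$; neither is the parameterization used in the cited papers.

```python
import numpy as np

# Illustrative only: a toy integer shift plays the role of f_theta, and a
# factorized two-sided geometric prior plays the role of p_Z. Because f is a
# bijection on Z^D, log p_X(x) = log p_Z(f(x)) with no Jacobian term.

SHIFT = np.array([1, 0, -2, 1])

def f(x):
    return x + SHIFT            # integer shift: bijective on Z^D

def f_inv(z):
    return z - SHIFT            # exact inverse

def log_prior(z, p=0.5):
    # factorized pmf on the integers: P(z_d) = (1 - p) / (1 + p) * p^{|z_d|}
    return np.sum(np.log1p(-p) - np.log1p(p) + np.abs(z) * np.log(p))

x = np.array([3, -1, 0, 2])
z = f(x)
log_px = log_prior(z)           # log p_X(x) = log p_Z(f_theta(x))
assert np.all(f_inv(z) == x)    # exact round trip
print(f"log p_X(x) = {log_px:.3f}")
```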

2. Core Layer Constructions: Coupling and Permutation

The canonical IDF building block is the integer discrete bipartite coupling layer. Splitting an input vector $x$ into $x = [x_a, x_b]$, the forward pass is

$$y_a = x_a, \qquad y_b = x_b + \lfloor t_\theta(x_a) \rceil,$$

where $t_\theta:\mathbb{Z}^{d_a}\rightarrow\mathbb{R}^{d_b}$ is typically a neural network and $\lfloor\cdot\rceil$ denotes rounding to the nearest integer. The inverse recovers $x$ exactly:

$$x_a = y_a, \qquad x_b = y_b - \lfloor t_\theta(y_a) \rceil.$$

Alternating coupling layers with channel permutations ensures that all components eventually interact. Crucially, all operations are closed on $\mathbb{Z}^D$ and preserve bijectivity.
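
A minimal sketch of the bipartite coupling layer follows, with a fixed affine map standing in for the neural network $t_\theta$ (all names and values are illustrative):

```python
import numpy as np

def t_theta(x_a):
    # stand-in for a neural network Z^{d_a} -> R^{d_b}; any real-valued map
    # works, since its output is rounded before the integer update
    return 0.7 * x_a[::-1] + 1.3

def coupling_forward(x):
    x_a, x_b = np.split(x, 2)
    y_b = x_b + np.rint(t_theta(x_a)).astype(np.int64)   # y_b = x_b + round(t(x_a))
    return np.concatenate([x_a, y_b])

def coupling_inverse(y):
    y_a, y_b = np.split(y, 2)
    x_b = y_b - np.rint(t_theta(y_a)).astype(np.int64)   # same shift, subtracted
    return np.concatenate([y_a, x_b])

x = np.array([4, -2, 7, 0], dtype=np.int64)
y = coupling_forward(x)
assert np.all(coupling_inverse(y) == x)   # bijective on Z^D by construction
```

Because $y_a = x_a$, the inverse can recompute exactly the same rounded shift, which is what makes the layer bijective no matter how expressive $t_\theta$ is.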

Recent IDF variants introduce more expressive multiway coupling, e.g., 4- and 8-part splits, where each output block $y_i$ is updated via a dedicated neural net, enabling richer dependency structures per layer at a constant parameter budget (Tomczak, 2020).

For sequence or categorical data on $\{0, \dots, K-1\}^D$, modular arithmetic coupling is used:

$$y_d = \left[\mu_d(y_{<d}) + \sigma_d(y_{<d})\, x_d\right] \bmod K,$$

with a modular multiplicative inverse for the backward pass, which requires each $\sigma_d$ to be coprime with $K$ (Tran et al., 2019).
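
A minimal single-dimension sketch with illustrative scalar values; `pow(sigma, -1, K)` (Python 3.8+) computes the modular multiplicative inverse needed for the backward pass:

```python
# Illustrative modular-affine coupling on {0, ..., K-1}.
# Invertibility needs gcd(sigma, K) = 1; choosing K prime guarantees this
# for every sigma in {1, ..., K-1}.

K = 7

def mod_forward(x, mu, sigma):
    return (mu + sigma * x) % K

def mod_inverse(y, mu, sigma):
    sigma_inv = pow(sigma, -1, K)          # modular inverse of sigma mod K
    return (sigma_inv * (y - mu)) % K

x, mu, sigma = 5, 3, 4
y = mod_forward(x, mu, sigma)              # (3 + 4 * 5) % 7 == 2
assert mod_inverse(y, mu, sigma) == x      # exact round trip on {0, ..., K-1}
```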

3. Training, Gradient Estimation, and Flexibility

IDFs admit exact maximum-likelihood training: the negative log-likelihood objective is minimized by backpropagation, using the discrete change-of-variables formula. The main technical challenge is the nondifferentiability of rounding:

  • Forward: $u = \lfloor t_\theta \rceil$;
  • Backward: straight-through estimator, $\partial L/\partial t_\theta := \partial L/\partial u$ (a minimal sketch follows this list).
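
A minimal PyTorch sketch of the straight-through rounding trick, in its standard formulation (the cited papers' training code may differ in details):

```python
import torch

def round_ste(t: torch.Tensor) -> torch.Tensor:
    # forward: exact rounding; backward: identity, i.e. dL/dt := dL/du
    return t + (torch.round(t) - t).detach()

t = torch.tensor([0.2, 1.7, -0.6], requires_grad=True)
u = round_ste(t)          # forward values: tensor([0., 2., -1.])
u.sum().backward()
print(t.grad)             # tensor([1., 1., 1.]) -- gradient passes straight through
```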

Empirical analysis demonstrates that, despite this bias, the straight-through gradient aligns well with finite-difference estimates on 8–16 bit image data; architecture depth and network conditioning are important for optimization stability (Berg et al., 2020).

A previously raised concern was reduced expressivity relative to continuous flows. This is refuted in the IDF++ analysis, which proves that embedding finite-support data in $\mathbb{Z}^D$ allows IDFs to exactly factorize (flatten) or transform any joint distribution via successive translation and permutation layers (Berg et al., 2020).

4. Lossless Compression and Entropy Coding

IDFs naturally support lossless compression by mapping data $x$ to a latent code $z = f_\theta(x)$ and encoding $z$ under $p_Z(z)$ (via rANS or arithmetic coding). Decoding reverses the process exactly using $f_\theta^{-1}$. The strictly integer nature of all transformations guarantees no information loss and yields bitrates competitive with or better than established codecs. The entire model is thus a learned, invertible statistical compressor with exact recovery (Hoogeboom et al., 2019, Berg et al., 2020, Wang et al., 2022).
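
The pipeline can be sketched as follows, with the entropy coder abstracted away (a real implementation feeds $z$ and $p_Z$ to rANS or an arithmetic coder); the toy shift flow and geometric prior are the same illustrative stand-ins as above:

```python
import numpy as np

SHIFT = np.array([1, 0, -2, 1])                 # toy bijection, illustrative

def log2_prior(z, p=0.5):
    # factorized geometric prior on the integers, measured in bits
    return np.sum(np.log2(1 - p) - np.log2(1 + p) + np.abs(z) * np.log2(p))

def encode(x):
    z = x + SHIFT                               # x -> z via the flow
    bits = int(np.ceil(-log2_prior(z)))         # ideal codelength under p_Z
    return z, bits                              # z would go to the entropy coder

def decode(z):
    return z - SHIFT                            # exact reconstruction via f^{-1}

x = np.array([2, 0, -1, 4])
z, bits = encode(x)
assert np.all(decode(z) == x)                   # lossless by construction
print(f"~{bits} bits total, {bits / x.size:.2f} bpd")
```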

Empirical results for IDF-based compressors on CIFAR-10 and ImageNet are summarized in Table 1 (bits per dimension; lower is better):

| Model | CIFAR-10 (bpd) | ImageNet32 (bpd) | ImageNet64 (bpd) |
|---|---|---|---|
| PNG | 5.87 | 6.39 | 5.71 |
| FLIF | 4.19 | 4.52 | 4.19 |
| Bit-Swap | 3.82 | 4.50 | – |
| IDF | 3.32 | 4.18 | 3.90 |
| IDF++ (8 f/l) | 3.26 | 4.12 | 3.81 |
| IODF (INT8 + pruned) | – | 3.979 | 3.695 |

5. Architectural Improvements and Hardware Efficiency

IDF++ introduces several architectural optimizations:

  • Inverted channel permutations after each coupling to preserve spatial coherence.
  • ReZero-style identity initialization: $y_b = x_b + \lfloor \alpha \cdot t_\theta(x_a) \rceil$ with a learnable $\alpha$ initialized at zero (see the sketch after this list).
  • GroupNorm and Swish activations inside dense blocks.
  • Zero initialization for logistic-mixture parameters in the conditional prior (Berg et al., 2020).
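
A minimal sketch of the ReZero-style coupling, with a toy linear layer standing in for the dense blocks used in IDF++ (training would combine this with the straight-through estimator above; gradients do not pass through plain rounding):

```python
import torch
import torch.nn as nn

class ReZeroCoupling(nn.Module):
    def __init__(self, d_a: int, d_b: int):
        super().__init__()
        self.t = nn.Linear(d_a, d_b)               # stand-in for t_theta
        self.alpha = nn.Parameter(torch.zeros(1))  # ReZero gate, starts at 0

    def forward(self, x_a: torch.Tensor, x_b: torch.Tensor):
        # y_b = x_b + round(alpha * t(x_a)); identity map at initialization
        shift = torch.round(self.alpha * self.t(x_a.float()))
        return x_a, x_b + shift.long()

layer = ReZeroCoupling(2, 2)
x_a, x_b = torch.tensor([[1, 4]]), torch.tensor([[3, -2]])
y_a, y_b = layer(x_a, x_b)
assert torch.equal(y_b, x_b)   # alpha = 0 => each coupling starts as identity
```

Starting every coupling at the identity keeps deep stacks well conditioned early in training, which is the motivation the IDF++ authors give for this initialization.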

Hardware-friendly implementations such as Integer-Only Discrete Flows (IODF) replace floating-point subnetworks with integer-only ResNet blocks and add learnable binary gates for aggressive channel pruning. All convolutions and ReLUs are performed in INT8, producing an order-of-magnitude reduction in inference latency with minimal rate loss (Wang et al., 2022).

6. Alternative Flow Structures and Theoretical Relations

Alternate IDF architectures include:

  • Discrete autoregressive and bipartite flows, with modular affine coupling for categorical data, supporting bidirectional context (DAF) and parallel non-autoregressive sampling (DBF) (Tran et al., 2019).
  • Multi-way integer couplings, unifying reversible logic gates, discrete NVP layers, and general neural network couplings under a broad class of invertible templates (Tomczak, 2020).
  • Measure-preserving and discrete (MAD) maps constructed from CDF-inverse actions on $\mathbb{Z}^d$, as in MAD Mix, yielding unbiased variational flows that are ergodic and parameter-free aside from mixture depth (Diluvi et al., 2023).

7. Applications, Extensions, and Empirical Observations

IDFs provide:

  • Exact likelihoods on integer and categorical data without dequantization;
  • Native and invertible image compression with state-of-the-art rates;
  • Competitive performance in character-level language modeling, synthetic joint distributions, Potts models, and permutation-based data;
  • Modular hybridization with continuous flows (e.g., interleaving integer and continuous coupling blocks), expanding applicability to mixed discrete/continuous domains (Tran et al., 2019, Tomczak, 2020, Argouarc'h et al., 2022).

Limitations include training challenges with large alphabet sizes ($K > 200$), gradient bias from the straight-through estimator in extremely low-bit regimes, and sequential dependencies in autoregressive flow architectures.
