- The paper's main contribution is reinterpreting ANS as a probabilistic latent variable model, connecting bits-back coding with optimal compression.
- It introduces a bulk/head split algorithm to manage state scaling, ensuring near-linear runtime and efficient bit packing.
- Empirical benchmarks show ANS achieving bit rates within 0.1% of the theoretical minimum with significant decoding speed advantages.
Comprehensive Analysis of "Understanding Entropy Coding With Asymmetric Numeral Systems (ANS): a Statistician's Perspective" (2201.01741)
Introduction and Motivation
The paper provides a technically rigorous, pedagogically oriented exposition of Asymmetric Numeral Systems (ANS) entropy coding, positioned to be accessible to researchers in statistics and machine learning. Recognizing that compression research often splits into procedural/algorithmic and declarative/modeling subfields, the work reframes ANS through the lens of probabilistic modeling, latent variable models, and the bits-back coding trick. This statistical perspective is leveraged to elucidate the core operating principles of ANS, to generalize its configuration, and to compare its rate and runtime empirically against other entropy coders within a unified open-source framework.
Theoretical Background
The initial sections formalize lossless compression as the problem of constructing a code $C$ for messages $x$ drawn from a distribution $P(x)$, minimizing the expected bit rate $\mathbb{E}_P[R_C(x)]$ under the constraint of unique decodability. Standard bounds follow from Shannon's source coding theorem:
- For any uniquely decodable code $C$: $\mathbb{E}_P[R_C(x)] \geq H_P[x]$.
- There exists a code $C$ with $\mathbb{E}_P[R_C(x)] < H_P[x] + 1$; for the Shannon code, the corresponding bound $R_C(x) < -\log_2 P(x) + 1$ holds message-wise.
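As a quick numeric check (an illustration, not taken from the paper), the Shannon code length $R_C(x) = \lceil -\log_2 P(x) \rceil$ can be verified against both bounds on a toy distribution:

```python
import math

P = {"a": 0.5, "b": 0.25, "c": 0.125, "d": 0.125}
entropy = -sum(p * math.log2(p) for p in P.values())        # H_P[x] = 1.75 bits
# Shannon code: R_C(x) = ceil(-log2 P(x)) < -log2 P(x) + 1, message-wise
R = {x: math.ceil(-math.log2(p)) for x, p in P.items()}
expected_rate = sum(P[x] * R[x] for x in P)
assert entropy <= expected_rate < entropy + 1
```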
The paper distinguishes between symbol codes (e.g., Huffman) that allocate a variable-length, integer number of bits to each symbol and stream codes (e.g., Arithmetic Coding, Range Coding, ANS) that amortize bit allocation over sequences, resulting in lower per-symbol overhead when entropy is low.
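The advantage of amortization is easy to quantify; the following illustrative calculation (not from the paper) shows why symbol codes break down at low entropy:

```python
import math

p = 0.99   # highly skewed binary source: P(x = 0) = 0.99
entropy = -(p * math.log2(p) + (1 - p) * math.log2(1 - p))  # ~0.081 bits/symbol
# A symbol code such as Huffman must spend at least 1 whole bit per symbol
# here (over 12x overhead); a stream code amortizes bits across the sequence
# and approaches the entropy.
print(f"{entropy:.3f} bits/symbol vs. 1 bit/symbol for any symbol code")
```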
Asymmetric Numeral Systems: Statistical Perspective
In contrast to the more familiar queue-based (first-in-first-out) stream codes such as Arithmetic and Range Coding, ANS operates as a stack (last-in-first-out), with implications for model-class alignment (favoring, e.g., bits-back or variational-inference-based compression), computational performance, and algorithmic simplicity.
The pedagogical core is the demonstration that ANS arises naturally when the coding process is viewed as inference and generation in a latent variable model. Arbitrary (non-uniform) symbol probability models $P_i$ are quantized to approximations $Q_i$ with integer counts $m_i(x_i)$ summing to a common normalization constant $n = 2^{\text{precision}}$. The latent variable representation partitions $\{0, \ldots, n-1\}$ into one subrange per symbol, connecting stream coding to variable-base representations in positional numeral systems.
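A minimal sketch of this quantization step, using a naive rounding scheme chosen here for illustration (the paper discusses the requirements on $Q_i$, not this exact procedure):

```python
import numpy as np

PRECISION = 12
n = 1 << PRECISION  # common normalization: counts sum to n = 2**PRECISION

def quantize(probs):
    """Quantize probabilities P_i to positive integer counts m_i summing to n."""
    counts = np.maximum(1, np.round(np.asarray(probs) * n)).astype(np.int64)
    counts[np.argmax(counts)] += n - counts.sum()  # absorb rounding drift
    assert counts.sum() == n and (counts > 0).all()
    return counts

counts = quantize([0.05, 0.2, 0.75])
cdf = np.concatenate(([0], np.cumsum(counts)))
# symbol i owns the subrange [cdf[i], cdf[i+1]) of {0, ..., n-1}
```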
A fundamental insight is the role of the bits-back trick: encoding a symbol first *decodes* some information out of the current coder state under the symbol's model, so that previously written bits are recycled rather than wasted. This coupling of the encode and decode operations lets ANS utilize nearly all available entropy, reaching bit rates within tenths or hundredths of a percent of the source entropy.
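The bits-back view is easiest to see in code. The following sketch of the core, not-yet-streaming ANS update reuses `counts`, `cdf`, `n`, and numpy from the quantization sketch above; note how `encode_symbol` first reads information out of the state (`z % m`, the bits-back step) before writing the symbol's subrange position:

```python
def encode_symbol(z, symbol, counts, cdf):
    m = counts[symbol]
    # bits-back step: z % m is "decoded" from the state, then re-encoded
    # as the position within the symbol's subrange of size m
    return (z // m) * n + cdf[symbol] + (z % m)

def decode_symbol(z, counts, cdf):
    slot = z % n  # latent variable: position within {0, ..., n-1}
    symbol = int(np.searchsorted(cdf, slot, side="right")) - 1
    z = counts[symbol] * (z // n) + slot - cdf[symbol]
    return symbol, z

z = 1  # unbounded Python int: no streaming yet
for s in [2, 0, 1, 2]:
    z = encode_symbol(z, s, counts, cdf)
decoded = []
for _ in range(4):
    s, z = decode_symbol(z, counts, cdf)
    decoded.append(s)
assert decoded == [2, 1, 0, 2] and z == 1  # stack: symbols come back reversed
```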
Efficient Streaming ANS Algorithm
The practical bottleneck in naïve ANS implementations—the internal state growing with message length—is addressed via a bulk/head split: a small fixed-size integer (the head) is manipulated for most coding operations, and only when the head would overflow or underflow does the system transfer bits to or from a growable buffer (the bulk). This yields near-linear runtime in message length while maintaining near-optimal compression rates. Python implementations throughout the paper serve as didactic references, but the principles transfer readily to high-performance compiled languages.
The invariants (head capacity, transfer thresholds) required to guarantee correctness (injective, uniquely decodable mapping) and efficient bit packing are carefully formalized and proven in the appendix.
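A self-contained sketch of the streaming construction follows, with illustrative constants; the paper's listings and formal invariants differ in detail. The head is kept within $[2^W, 2^{2W})$, and a single $W$-bit word moves between head and bulk exactly when the invariant would otherwise break:

```python
W = 16                       # word size in bits (illustrative choice)
PRECISION = 12               # symbol counts sum to n = 2**PRECISION
n = 1 << PRECISION
HEAD_MIN = 1 << W            # invariant: head stays in [2**W, 2**(2*W))
WORD_MASK = HEAD_MIN - 1

def push(head, bulk, symbol, counts, cdf):
    """Encode one symbol; returns the updated head (bulk is mutated)."""
    m = counts[symbol]
    # renormalize: if the ANS update would overflow 2W bits,
    # transfer the lowest W bits of the head into the bulk first
    if head >= (m << (2 * W - PRECISION)):
        bulk.append(head & WORD_MASK)
        head >>= W
    return (head // m) * n + cdf[symbol] + head % m

def pop(head, bulk, counts, cdf):
    """Decode one symbol; returns (symbol, updated head)."""
    slot = head % n
    symbol = next(s for s in range(len(counts))
                  if cdf[s] <= slot < cdf[s] + counts[s])
    head = counts[symbol] * (head // n) + slot - cdf[symbol]
    if head < HEAD_MIN and bulk:     # exact mirror of the encoder's transfer
        head = (head << W) | bulk.pop()
    return symbol, head

# round trip on a toy 3-symbol model whose counts sum to n
counts = [512, 1024, 2560]
cdf = [0, 512, 1536]
message = [2, 0, 1, 2, 2, 1] * 50

head, bulk = HEAD_MIN, []
for s in message:
    head = push(head, bulk, s, counts, cdf)
decoded = [None] * len(message)
for i in reversed(range(len(message))):   # ANS decodes in reverse (stack)
    decoded[i], head = pop(head, bulk, counts, cdf)
assert decoded == message and head == HEAD_MIN and not bulk
```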
Variations, Extensions, and Non-Standard Use Cases
The modular structure of ANS enables:
- Generalization to arbitrary streaming configurations (word size, head capacity), controlling memory alignment and fine-tuning performance trade-offs.
- Random-access decoding through checkpointing and replay of internal state ("seekable" coders); a sketch follows below.
- Isolation of model-parameter and data-entropy interactions to resolve discontinuities and non-local effects in bits-back and end-to-end differentiable coding pipelines. Here, a variant using independent stacks avoids ripple effects from entropy model changes.
These extensions solidify ANS's utility as a substrate for complex, potentially learned or adaptive, probabilistic compression systems.
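To make the seekable-coder idea concrete, here is a hypothetical checkpointing wrapper over the streaming (head, bulk) coder sketched earlier (the paper's construction may differ in detail): snapshot the full coder state during encoding, then restore a snapshot to decode a slice without replaying the entire stream:

```python
CHECKPOINT_EVERY = 100   # illustrative granularity

def encode_with_checkpoints(message, counts, cdf):
    head, bulk, checkpoints = HEAD_MIN, [], []
    for t, s in enumerate(message):
        if t % CHECKPOINT_EVERY == 0:
            checkpoints.append((len(bulk), head))  # snapshot of full coder state
        head = push(head, bulk, s, counts, cdf)
    return head, bulk, checkpoints

def decode_before(bulk, checkpoint, k, counts, cdf):
    """Decode the k symbols encoded just before `checkpoint`, in message order."""
    bulk_len, head = checkpoint
    local = list(bulk[:bulk_len])  # only this bulk prefix existed at snapshot time
    out = []
    for _ in range(k):
        s, head = pop(head, local, counts, cdf)
        out.append(s)
    return out[::-1]

head, bulk, cps = encode_with_checkpoints(message, counts, cdf)
# jump straight to the second snapshot and decode the 100 symbols before it:
assert decode_before(bulk, cps[1], CHECKPOINT_EVERY, counts, cdf) == message[:100]
```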
Empirical Benchmarks and Software Infrastructure
Empirical results are obtained with a new open-source library, constriction, offering both Python and Rust APIs and enabling direct comparison among ANS, Range Coding (RC), and Arithmetic Coding (AC). Comprehensive benchmarks on real-world probabilistic model parameters reveal:
- ANS and RC achieve bit rates within 0.1% of the theoretical minimum, even for highly skewed symbol distributions.
- Decoding with ANS is substantially faster than RC and much faster than AC. Encoding speed favors RC.
- AC lags in runtime and, on some hardware and in some entropy regimes, in bit efficiency, owing to its poor alignment with word-based architectures.
- The choice between ANS and RC should be guided by fit to model architecture (stack/queue semantics for latent/autoregressive models, respectively), not compression rate or runtime concerns.
These findings are summarized succinctly in tabular and graphical form, with nuanced analysis of overhead sources (approximation error, Benford's law, initial state). The software aims to standardize and accelerate research in statistical compression, lowering the barrier for empirically sound and production-ready deployments.
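For orientation, a minimal ANS round trip with constriction's Python bindings, adapted from the library's published documentation (exact method names and signatures may differ across versions):

```python
import numpy as np
import constriction

message = np.array([2, 0, 1, 2, 2, 1], dtype=np.int32)
model = constriction.stream.model.Categorical(
    np.array([0.1, 0.3, 0.6], dtype=np.float64))

encoder = constriction.stream.stack.AnsCoder()
encoder.encode_reverse(message, model)   # reverse order: ANS is a stack
compressed = encoder.get_compressed()

decoder = constriction.stream.stack.AnsCoder(compressed)
assert np.all(decoder.decode(model, 6) == message)
```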
Implications and Future Directions
From a theoretical perspective, the statistical framing clarifies the role of ANS as an algorithmic realization of probabilistic coding optimality, specifically facilitating bits-back and variational coding techniques that are foundational in modern neural compression. Practically, the modularity and efficiency of ANS suggest its increasing adoption in lossless/lossy neural compression frameworks, particularly those leveraging latent variable generative models. The outlined advances—random access, model continuity, and flexible configuration—anticipate further integration of coder internals into learned end-to-end optimization pipelines. The standardization provided by central libraries like constriction can be expected to solidify reproducible, cross-domain empirical benchmarking.
Conclusion
The paper systematically demystifies ANS entropy coding for the modeling community, grounding algorithmic mechanisms in latent variable modeling and bits-back coding. Through theoretical, algorithmic, and empirical analysis, it establishes ANS as a broadly applicable, efficient, and highly adaptable stream coder—well-suited as a foundational component for emerging machine learning-driven compression research and applications. The software artifacts and empirical guidance provided are likely to catalyze a closer integration between statistical modeling and systems-level compression research.