DeepCABAC: Adaptive Compression for DNNs
- DeepCABAC is a compression algorithm for deep neural networks that integrates adaptive CABAC coding with rate–distortion-aware quantization to balance model size and accuracy.
- It interleaves quantization and entropy coding by binarizing quantized weights into significance, sign, and absolute-greater-than flags, achieving, for example, 63.6× compression on VGG16.
- The method employs grid search over quantization parameters and context adaptation to optimize storage efficiency and inference precision across diverse DNN architectures.
DeepCABAC is a universal and adaptive compression algorithm for deep neural networks (DNNs) that integrates a rate–distortion–aware quantization scheme with Context-adaptive Binary Arithmetic Coding (CABAC) derived from the H.264/AVC and H.265/HEVC video coding standards. Unlike prior approaches that decouple quantization and coding, DeepCABAC interleaves the quantization and entropy coding steps, explicitly minimizing a joint objective balancing model bit-size against inference accuracy. This method achieves state-of-the-art compression factors, enabling, for example, 63.6-fold compression of the VGG16 ImageNet model (from 553MB to 8.7MB) with no loss in top-1 accuracy (Wiedemann et al., 2019, Wiedemann et al., 2019).
1. Algorithmic Foundations
DeepCABAC employs CABAC, a backward-adaptive binary arithmetic coder, to compress quantized neural network weights. CABAC processes input symbols in three stages: (a) binarization into a sequence of binary decisions (“bins”), (b) context modeling, in which an adaptive probability model per context is updated online after every coded bin, and (c) binary arithmetic coding, which approaches the conditional entropy under those models. In DeepCABAC, each real-valued network parameter is first quantized to an integer index, which is then binarized into the following sequentially coded bins: a significance flag (sigFlag), a sign flag (signFlag), a sequence of “absolute-greater-than” flags (AbsGr(n)Flags), and, if applicable, a remainder coded with an Exponential-Golomb routine. The decoder applies the same binarization and context adaptation, so its probability models stay synchronized with the encoder’s (Wiedemann et al., 2019).
The key property of CABAC leveraged here is the automatic adaptation of context models across streamed weights, which allows for highly efficient coding without explicit transmission of probability tables or side-channel metadata.
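The effect of backward adaptation can be illustrated with a minimal sketch. It is not the CABAC state machine: the class `AdaptiveBinModel`, its exponential-smoothing update, and the `rate` parameter are assumptions chosen for illustration. The sketch only shows why a model that the encoder and decoder update identically needs no transmitted probability table, and why a skewed bin stream costs far less than one bit per bin.

```python
import math


class AdaptiveBinModel:
    """Toy backward-adaptive probability estimate for one context.

    Assumption: exponential smoothing stands in for CABAC's table-driven
    state machine; `rate` is a hypothetical adaptation speed.
    """

    def __init__(self, p_one=0.5, rate=1 / 16):
        self.p_one = p_one
        self.rate = rate

    def cost_bits(self, bin_value):
        """Ideal code length of one bin under the current estimate."""
        p = self.p_one if bin_value == 1 else 1.0 - self.p_one
        return -math.log2(p)

    def update(self, bin_value):
        """Move the estimate toward the observed bin, as the decoder also would."""
        self.p_one += self.rate * (bin_value - self.p_one)
        # Keep the estimate strictly inside (0, 1) so the cost stays finite.
        self.p_one = min(max(self.p_one, 1e-6), 1.0 - 1e-6)


def ideal_code_length(bins, model):
    """Sum the ideal cost of each bin, updating the model after every bin."""
    total = 0.0
    for b in bins:
        total += model.cost_bits(b)
        model.update(b)
    return total


# A sparse significance-flag stream: mostly zeros, a few ones at the end.
sparse_bins = [0] * 95 + [1] * 5
adaptive_bits = ideal_code_length(sparse_bins, AdaptiveBinModel())
fixed_bits = float(len(sparse_bins))  # one bit per bin without any modeling
print(f"adaptive: {adaptive_bits:.1f} bits, fixed: {fixed_bits:.0f} bits")
```

Because the update depends only on bins that have already been coded, a decoder running the same `update` calls reproduces the same estimates at every step.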
2. Joint Quantization and Rate–Distortion Optimization
DeepCABAC departs from alternatives such as Lloyd-Max or uniform quantization. Instead, it uses a quantizer that explicitly seeks to minimize
$$D + \lambda R,$$
where $R$ is the expected bit-length under CABAC for the quantized weights and $D$ is a quadratic approximation to accuracy loss weighted by the Fisher Information Matrix (FIM) diagonal:
$$D = \sum_i F_i \, (w_i - q_{k_i})^2,$$
with $F_i$ for each weight $w_i$ estimated through variational Bayes and $q_{k_i}$ the quantization point assigned to $w_i$. The Lagrange parameter $\lambda$ allows control over the trade-off between compactness and precision.
Assignment proceeds by, for each floating-point parameter $w_i$, finding the quantization index $k$ that minimizes
$$F_i \, (w_i - q_k)^2 + \lambda L_{ik},$$
where $L_{ik}$ is the encoder-side bit-length for representing quantized value $q_k$ at position $i$, updated with exact CABAC context statistics as weights are scanned in order. This optimization does not rely on iterative re-centering (as in Lloyd’s algorithm) but on a single direct assignment pass that follows the same scan order as the encoder.
Quantization points are selected as multiples of a step-size $\Delta$:
$$q_k = k \, \Delta, \quad k \in \mathbb{Z},$$
with $\Delta$ set based on optimization over the parameter range; the selection of $\Delta$ and $\lambda$ is performed via coarse-to-fine grid search.
Two variants exist:
- DC-v1: Per-weight FIM diagonals $F_i = 1/\sigma_i^2$ via variational Bayes; the step-size $\Delta$ is tied to the minimal estimated $\sigma_i$.
- DC-v2: Nearest-neighbor presearch under $\Delta$ to find candidates, with $F_i = 1$.
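The per-weight assignment can be sketched as follows, assuming the cost takes the form of a Fisher-weighted squared error plus a Lagrange multiplier times a bit cost, with quantization points at integer multiples of the step-size. The function names and `toy_bit_cost` are hypothetical. In particular, `toy_bit_cost` is a static stand-in: in DeepCABAC the bit cost comes from the CABAC context state, which changes as weights are scanned.

```python
import math


def toy_bit_cost(k):
    """Hypothetical static bit cost: zero is cheapest, larger indices cost more.

    Assumption: this replaces the adaptive CABAC estimate for illustration only.
    """
    return 1.0 if k == 0 else 3.0 + 2.0 * math.log2(abs(k))


def rd_assign(weights, fisher, step, lam, max_index, bit_cost=toy_bit_cost):
    """Pick, for each weight, the index k minimizing F*(w - k*step)^2 + lam*bits(k)."""
    indices = []
    for w, f in zip(weights, fisher):
        best_k, best_cost = 0, math.inf
        for k in range(-max_index, max_index + 1):
            cost = f * (w - k * step) ** 2 + lam * bit_cost(k)
            if cost < best_cost:
                best_k, best_cost = k, cost
        indices.append(best_k)
    return indices


weights = [0.02, -0.31, 0.74]
uniform_fisher = [1.0, 1.0, 1.0]  # DC-v2 style: every weight equally important

# lam = 0 ignores the rate term and reduces to nearest-neighbor rounding.
print(rd_assign(weights, uniform_fisher, step=0.1, lam=0.0, max_index=10))
# A large lam makes bits expensive, so weights are pulled toward zero.
print(rd_assign(weights, uniform_fisher, step=0.1, lam=1.0, max_index=10))
```

Setting the multiplier to zero recovers plain uniform quantization, which makes the role of the rate term easy to see: any deviation from nearest-neighbor rounding is bought with a saving in estimated bits.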
3. Entropy Coding Pipeline and Implementation
The DeepCABAC binarization and coding pipeline handles each (potentially sparse) quantized model as follows:
- Binarization: For each quantized integer index $q$, encode successively:
  - sigFlag, indicating zero vs. nonzero.
  - If nonzero, the signFlag bit.
  - Up to $n$ AbsGr($k$)Flags for $k = 1, \dots, n$, each indicating whether $|q| > k$; coding stops at the first flag equal to zero.
  - For $|q| > n$, the remainder, using an Exponential-Golomb code (context-coded unary prefix, then a fixed-length binary suffix in bypass mode).
- Context modeling: Each bin is associated with its own context model, which adapts quickly to local statistics.
- Arithmetic coding: Bins are encoded with CABAC, with a redundancy relative to the entropy of less than 2 bits for the entire bin sequence.
- Decoder: Repeats the context updates, parses the bitstream, reconstructs each flag, and recovers the quantization index and, hence, the weight value.
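The binarization steps above can be sketched as a pair of functions that map an integer index to a list of labeled bins and back. This is an illustration under assumptions, not the reference implementation: the function names, the sign convention (1 for negative), and the use of an order-0 Exponential-Golomb code for the remainder are choices made here. Context selection and arithmetic coding are omitted.

```python
def exp_golomb(value, k=0):
    """Order-k Exp-Golomb bins: unary prefix, terminator, fixed-length suffix."""
    bins = []
    while value >= (1 << k):
        bins.append(1)  # prefix bin: the value lies beyond the current interval
        value -= 1 << k
        k += 1
    bins.append(0)  # prefix terminator
    for shift in range(k - 1, -1, -1):
        bins.append((value >> shift) & 1)  # suffix bits, most significant first
    return bins


def binarize(q, n_abs_flags):
    """Map an integer index to (label, bin) pairs in coding order."""
    bins = [("sigFlag", int(q != 0))]
    if q == 0:
        return bins
    bins.append(("signFlag", int(q < 0)))  # assumed convention: 1 means negative
    mag = abs(q)
    for n in range(1, n_abs_flags + 1):
        greater = int(mag > n)
        bins.append((f"AbsGr({n})Flag", greater))
        if not greater:
            return bins
    # All flags were 1, so mag > n_abs_flags: code the remainder.
    bins.extend(("EGbin", b) for b in exp_golomb(mag - n_abs_flags - 1))
    return bins


def debinarize(bins, n_abs_flags):
    """Inverse of binarize: rebuild the integer index from the bin values."""
    values = iter(b for _, b in bins)
    if next(values) == 0:
        return 0
    negative = next(values) == 1
    mag = 1
    for _ in range(n_abs_flags):
        if next(values) == 0:
            break
        mag += 1
    else:
        # No flag was 0: read the Exp-Golomb remainder (prefix, then suffix).
        k, rem = 0, 0
        while next(values) == 1:
            rem += 1 << k
            k += 1
        suffix = 0
        for _ in range(k):
            suffix = (suffix << 1) | next(values)
        mag = n_abs_flags + 1 + rem + suffix
    return -mag if negative else mag


print(binarize(0, 4))
print(binarize(-2, 4))
print(binarize(7, 4))
```

A zero index produces a single bin, which is why sparse tensors are cheap to code once the significance-flag context has adapted to the high zero probability.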
The implementation leverages high-throughput CABAC engines from video coding, enabling a throughput of several hundred MB/s.
4. Experimental Evaluation
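DeepCABAC was evaluated on both dense and sparsified models, including VGG16, ResNet-50, MobileNet-v1, Small-VGG16, LeNet-5, LeNet-300-100, and FCAE. Key DC-v2 results are listed below, with the accuracy change relative to the uncompressed model in parentheses; the drop is zero or negligible for VGG16 and LeNet-5 but reaches 2 to 4.5 percentage points for ResNet-50 and MobileNet-v1: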
| Model | Original size | Compression factor (compressed size) | Accuracy after quantization (change) |
|---|---|---|---|
| VGG16 | 553MB | 63.6× (8.7MB) | 69.43% (no loss) |
| ResNet-50 | 102MB | 16.8× (6.07MB) | 74.12% (−2.01%) |
| MobileNet-v1 | 17MB | 7.9× (2.15MB) | 66.18% (−4.51%) |
| LeNet-5 | 1.7MB | 138× (0.012MB) | 99.16% (−0.06%) |
Sparse models, obtained with iterative pruning or variational sparsification, saw up to 160× compression (VGG16) and around 50× on average. In all cases, DeepCABAC outperformed traditional pipelines consisting of quantization plus bzip2 or Huffman coding. On LeNet-5, DeepCABAC achieved 1.48 bits/weight at 99.23% accuracy, compared to 1.79 bits/weight under a weighted Lloyd quantizer for the same accuracy (Wiedemann et al., 2019, Wiedemann et al., 2019).
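The compression factors reported for the dense models follow directly from the sizes given there, as the short check below shows. The conversion to bits per weight assumes that the original models store each parameter as a 32-bit float, which is an assumption of this sketch rather than something stated in the text; the helper names are hypothetical.

```python
def compression_factor(original_mb, compressed_mb):
    """Ratio of original to compressed model size."""
    return original_mb / compressed_mb


def bits_per_weight(factor, original_bits=32):
    """Average bits per parameter, assuming `original_bits` per stored weight."""
    return original_bits / factor


# (original size in MB, compressed size in MB) for the dense models.
dense_results = {
    "VGG16": (553, 8.7),
    "ResNet-50": (102, 6.07),
    "MobileNet-v1": (17, 2.15),
}

for name, (original_mb, compressed_mb) in dense_results.items():
    factor = compression_factor(original_mb, compressed_mb)
    print(f"{name}: {factor:.1f}x, about {bits_per_weight(factor):.2f} bits/weight")
```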
5. Hyperparameters, Boundary Cases, and Efficiency
Key parameters influencing performance are the number $n$ of AbsGr(n) flags (e.g., in DC-v2) and the search grids over the step-size $\Delta$ and the Lagrange parameter $\lambda$. Very small weights automatically pool at zero and require only a single bin per occurrence (the sigFlag), which costs well under one bit after arithmetic coding when zeros dominate. CABAC’s context adaptation enables efficient coding across non-uniform sparsity and value distributions, and because the probability models are backward-adaptive, no probability tables need to be transmitted.
Quantization assignment is $O(NK)$ for $N$ weights and $K$ quantization points (practically, about $10K$ steps per parameter), but this is a one-shot, offline process, and compress-decompress runtimes approach those of video codecs.
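One way to organize a coarse-to-fine search over the step-size and the Lagrange parameter is sketched below. The schedule (a square grid that shrinks around the best point in each round) and all names, including the `score` callback, are assumptions made for illustration; the text does not specify the actual schedule. In practice `score` would quantize and encode the model and return, for example, the compressed size with a penalty when accuracy falls below a tolerance. A toy quadratic is used here so the example runs on its own.

```python
import itertools


def linspace(lo, hi, n):
    """n evenly spaced values from lo to hi inclusive (n >= 2)."""
    return [lo + (hi - lo) * i / (n - 1) for i in range(n)]


def coarse_to_fine(score, step_bounds, lam_bounds, points=5, rounds=3):
    """Minimize score(step, lam) on a grid that is refined around the best point."""
    best = None
    for _ in range(rounds):
        steps = linspace(step_bounds[0], step_bounds[1], points)
        lams = linspace(lam_bounds[0], lam_bounds[1], points)
        best = min(itertools.product(steps, lams), key=lambda p: score(*p))
        # Shrink each interval to one grid cell on either side of the best point.
        ds = (step_bounds[1] - step_bounds[0]) / (points - 1)
        dl = (lam_bounds[1] - lam_bounds[0]) / (points - 1)
        step_bounds = (max(step_bounds[0], best[0] - ds),
                       min(step_bounds[1], best[0] + ds))
        lam_bounds = (max(lam_bounds[0], best[1] - dl),
                      min(lam_bounds[1], best[1] + dl))
    return best


def toy_score(step, lam):
    """Hypothetical objective with its minimum at step = 0.03, lam = 0.2."""
    return (step - 0.03) ** 2 + (lam - 0.2) ** 2


best_step, best_lam = coarse_to_fine(toy_score, (0.001, 0.1), (0.0, 1.0))
print(best_step, best_lam)
```

With these defaults the search evaluates 75 parameter pairs (3 rounds of a 5 by 5 grid), far fewer than a single fine grid at the final resolution would need, which matters when each evaluation involves a full quantize, encode, and test pass.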
6. Implementation Notes, Limitations, and Outlook
DeepCABAC is open-source, with C++ and Python implementations available at https://github.com/fraunhoferhhi/DeepCABAC. It assumes use of an existing CABAC engine and tensor scanner (such as Eigen). No retraining or fine-tuning is performed post-quantization; in highly accuracy-sensitive models, this constraint can yield minor accuracy degradation that retraining might ameliorate.
The grid search over $\Delta$ and $\lambda$ can be computationally costly for large-scale models; coarse-to-fine scheduling mitigates this. DC-v1 requires a per-weight FIM diagonal pass (via variational Bayes), which is computationally non-trivial. Model activations and feature maps are not handled by DeepCABAC. Efficient compressed-domain inference remains an open avenue for future research.
In summary, DeepCABAC achieves state-of-the-art compression rates for DNN model parameters by tightly coupling adaptive entropy coding with rate–distortion–aware quantization. Its integration of a backward-adaptive binary arithmetic coder with a quantizer sensitive to information-theoretic and task loss metrics distinguishes it from prior approaches built on separated quantization and coding stages (Wiedemann et al., 2019, Wiedemann et al., 2019).