ZFP Compression in Scientific Computing
- ZFP compression is a block-based, lossy floating-point compression algorithm that employs high-order transforms and bit-plane quantization to balance compression ratios with fidelity.
- It supports fixed-rate, fixed-accuracy, and fixed-precision modes, enabling configurable error tolerances while maintaining downstream performance in demanding tasks such as medical imaging segmentation.
- Designed for high-throughput scientific workflows, ZFP achieves significant compression ratios with tight error bounds and scalable parallel processing.
ZFP compression is a block-based, lossy floating-point compression algorithm designed for efficient, high-throughput multidimensional data reduction with strict error control. Initially motivated by the demands of scientific data analysis, HPC, and medical imaging, ZFP applies high-order transforms and bit-plane quantization to provide tunable tradeoffs between compression ratio (CR) and fidelity, featuring three operational modes: fixed-rate, fixed-accuracy (error-tolerance), and fixed-precision. Recent research rigorously benchmarks ZFP's impact on downstream tasks, provides tight error bounds, exposes error bias and presents algorithmic modifications to neutralize it, and analyzes its role in inline computation and parallel workflows.
1. Core Algorithmic Structure
ZFP's architecture is defined by a sequential per-block pipeline of quantization, transform, and embedded coding. An input array (typically multi-dimensional, e.g., a 3D medical volume) is partitioned into non-overlapping, fixed-size blocks (e.g., 4×4×4 voxels for 3D data). Within each block, ZFP executes the following major pipeline steps:
- Block-Floating-Point Quantization: All values within a block are aligned to a common exponent, and mantissas are packed as signed integers of fixed precision, truncating excess bits as needed (Diffenderfer et al., 2018, Fox et al., 2024).
- Near-Orthogonal Block Transform: Each block is transformed via a decorrelating transform $L$, defined in 1D as
$$L = \frac{1}{16}\begin{bmatrix} 4 & 4 & 4 & 4 \\ 5 & 1 & -1 & -5 \\ -4 & 4 & 4 & -4 \\ -2 & 6 & -6 & 2 \end{bmatrix}$$
and extended to higher dimensions by Kronecker products (Diffenderfer et al., 2018, Fox et al., 2024).
- Coefficient Quantization and Bit-Plane Encoding: The transformed coefficients are encoded via embedded bit-plane coding: coefficients are represented in negabinary, their bits transposed into planes ordered by significance, and encoded at the bit-plane level.
- Mode-Driven Truncation: Stopping criteria control which bit-planes are emitted:
- Fixed-rate: emit exactly a user-specified number of bits per block (equivalently, bits per voxel).
- Fixed-accuracy: emit bit-planes until the Chebyshev ($\ell_\infty$) norm error is at most the prescribed tolerance.
- Fixed-precision: emit a fixed number of most-significant bit-planes. (Diffenderfer et al., 2018, Elbana et al., 4 Oct 2025)
- Inverse Decompression: The reverse steps restore the floating-point block; loss is incurred in the quantization and truncation steps and in reconstructing the transform coefficients.
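The quantization and bit-plane steps above can be sketched in a few lines. This is an illustrative toy, not the real codec (assumed simplifications: a 1-D block, 30 quantizer bits, and no transform or embedded coding); it only mirrors the block-floating-point alignment and negabinary bit-plane truncation described above.

```python
import math

QBITS = 30                      # quantizer bits after alignment (illustrative)
MASK = int("aa" * 16, 16)       # 0b1010... mask for negabinary conversion

def to_negabinary(i):
    """Signed integer -> unsigned negabinary code (standard mask trick)."""
    return (i + MASK) ^ MASK

def from_negabinary(u):
    return (u ^ MASK) - MASK

def compress_block(block, planes_kept):
    """Align values to the block's largest exponent, then drop low bit-planes."""
    emax = max((math.frexp(x)[1] for x in block if x), default=0)
    scale = 2.0 ** (QBITS - emax)
    drop = QBITS - planes_kept
    coded = [(to_negabinary(int(x * scale)) >> drop) << drop for x in block]
    return coded, emax

def decompress_block(coded, emax):
    scale = 2.0 ** (QBITS - emax)
    return [from_negabinary(u) / scale for u in coded]

block = [0.75, -1.25, 1.0, 0.5]
coded, emax = compress_block(block, planes_kept=16)
restored = decompress_block(coded, emax)
max_err = max(abs(a - b) for a, b in zip(block, restored))
# Dropping 14 negabinary planes perturbs each integer by less than 2**14,
# i.e. less than 2**(emax - 16) after rescaling.
assert max_err < 2 ** -14
```

The per-value error bound follows directly from the number of discarded planes, which is the mechanism behind the fixed-precision and fixed-accuracy stopping rules.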
The overall compression-decompression operator for a block is constructed compositionally from these steps (Diffenderfer et al., 2018, Fox et al., 2024): conversions between (signed-)binary and negabinary representations, the block transform, bit-plane truncation, and the common-exponent shift, each paired with its (approximate) inverse during decompression.
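The 1-D transform pair and its Kronecker extension can be checked directly; a minimal sketch in plain Python, assuming the forward/inverse matrices as tabulated in Diffenderfer et al. (2018):

```python
FWD = [[c / 16 for c in r] for r in
       [[4, 4, 4, 4], [5, 1, -1, -5], [-4, 4, 4, -4], [-2, 6, -6, 2]]]
INV = [[c / 4 for c in r] for r in
       [[4, 6, -4, -1], [4, 2, 4, 5], [4, -2, 4, -5], [4, -6, -4, 1]]]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def kron(a, b):
    """Kronecker product, used to lift the 1-D transform to d dimensions."""
    return [[a[i // len(b)][j // len(b[0])] * b[i % len(b)][j % len(b[0])]
             for j in range(len(a[0]) * len(b[0]))]
            for i in range(len(a) * len(b))]

# FWD and INV invert each other (the transform is near-orthogonal):
I4 = matmul(FWD, INV)
assert all(abs(I4[i][j] - (1.0 if i == j else 0.0)) < 1e-12
           for i in range(4) for j in range(4))

# The 2-D transform for 4x4 blocks is the 16x16 Kronecker product:
FWD2 = kron(FWD, FWD)
assert len(FWD2) == 16 and len(FWD2[0]) == 16
```

Since all matrix entries are dyadic rationals, the float products here are exact, which is one reason the transform composes cleanly with integer bit-plane coding.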
2. Compression Modes and Empirical Performance
ZFP’s modes define its operational flexibility:
- Fixed-Rate Mode: The user specifies a fixed number of bits per voxel; the realized rate closely tracks this target plus a small format overhead. Empirical evaluation on 3D cerebrovascular data gives:
| Bits/Voxel | Size (MB) | CR      | Mean Dice |
|------------|-----------|---------|-----------|
| 16         | 3865.01   | 0.995:1 | 0.87738   |
| 8          | 1932.51   | 1.99:1  | 0.87730   |
| 4          | 966.25    | 3.98:1  | 0.87709   |
| 2          | 483.13    | 7.96:1  | 0.87734   |
For fixed-rate compression, segmentation performance on compressed volumes is virtually indistinguishable from the uncompressed baseline at every tested rate down to 2 bits/voxel (Elbana et al., 4 Oct 2025).
- Error-Tolerance (Fixed-Accuracy) Mode: The user prescribes a maximum absolute error tolerance; larger tolerances yield higher compression. For cerebrovascular data:
| Tolerance | Size (MB) | CR      | Mean Dice |
|-----------|-----------|---------|-----------|
| 500       | 168.05    | 22.89:1 | 0.87656   |
| 1000      | 110.63    | 34.86:1 | 0.87368   |
| 1500      | 78.33     | 49.12:1 | 0.86060   |
At a tolerance of 500, a 22.89:1 compression ratio is achieved with only negligible degradation in segmentation metrics (Elbana et al., 4 Oct 2025).
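The fixed-rate table in this section is consistent with simple rate arithmetic for a 16-bit source volume: CR ≈ 16 / (bits-per-voxel × (1 + overhead)). The roughly 0.5% overhead fraction below is inferred from the table, not taken from the paper:

```python
def expected_cr(source_bits, rate_bits, overhead=0.005):
    """Compression ratio for a fixed-rate setting with fractional format overhead."""
    return source_bits / (rate_bits * (1 + overhead))

# Reproduces the fixed-rate table's CR column to three decimal places:
for bpv, table_cr in [(16, 0.995), (8, 1.99), (4, 3.98), (2, 7.96)]:
    assert abs(expected_cr(16, bpv) - table_cr) < 0.005
```

This is why the realized rate is described as the target "plus format overhead": the header and per-block bookkeeping cost a near-constant fraction of the payload.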
These trends generalize to large-scale computational science data. For 3D fluid dynamics, ZFP achieves high compression ratios at high PSNR, outperforming FPZIP and matching or exceeding wavelet-based and SZ compressors in sparsity-dominated fields (Hadjidoukas et al., 2019).
3. Error Analysis and Statistical Properties
The precision of ZFP is quantified by explicit error bounds across all operational modes:
- Forward Error: For a block $x$, the decompressed error in fixed-precision/fixed-rate mode satisfies a bound of the form (Diffenderfer et al., 2018, Fox et al., 2020)
$$\|x - \mathcal{D}(\mathcal{C}(x))\|_\infty \le K\,\|x\|_\infty,$$
where $K$ depends on block size, dimension, quantizer bits, and the number of retained bit-planes $\beta$, decaying geometrically (roughly like $2^{-\beta}$) as more bit-planes are kept.
- Error Bias: Compressing via negabinary bit-plane truncation introduces a nonzero expected error (bias), characterized analytically in (Fox et al., 2024).
The bit-plane truncation step dominates this bias, but "round-to-nearest" modifications (pre- or post-compression) eliminate the main contribution, achieving a substantial reduction in mean bias with negligible computational or storage penalty (Fox et al., 2024).
- Error Distributions: Stepwise quantization error is uniform within each discarded plane, while subsequent transform mixing yields piecewise cubic, near-Gaussian marginals. Rounding makes these distributions strictly zero-mean and identical across coefficients (Fox et al., 2024).
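The bias effect is easy to reproduce in miniature. The toy below uses plain binary truncation rather than zfp's negabinary pipeline (a simplification): floor-truncating low bits shifts the mean error away from zero, while round-to-nearest recenters it, mirroring the corrected scheme of Fox et al. (2024).

```python
import random

random.seed(0)
DROP = 8                                    # low bit-planes discarded
vals = [random.randint(-10**6, 10**6) for _ in range(20000)]

def mean_error(quantize):
    return sum(quantize(v) - v for v in vals) / len(vals)

trunc_bias = mean_error(lambda v: (v >> DROP) << DROP)               # floor-truncate
round_bias = mean_error(lambda v: ((v + (1 << (DROP - 1))) >> DROP) << DROP)

# Truncation biases every value downward by about half a discarded ulp
# (~ -127.5 here); rounding leaves a near-zero mean error.
assert trunc_bias < -100
assert abs(round_bias) < 5
```

The same half-ulp-per-value shift, propagated through the transform, is what shows up as the mean-level bias in uncorrected ZFP output.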
4. Downstream Stability and Impact on Scientific Workflows
Repeated compression-decompression (as in time-stepping, iterative schemes, or in situ/ex situ workflows) compounds errors, raising questions about stability:
- Inline Usage: In iterative solvers, if the base iterative mapping $f$ is Lipschitz with constant $L < 1$, then after $k$ steps the ZFP-induced error is bounded by a geometric sum of per-step compression errors (Fox et al., 2020), of the form
$$\|x_k - \tilde{x}_k\| \le K M \,\frac{1 - L^k}{1 - L},$$
where $K$ is the per-step ZFP error constant and $M$ bounds the iterate norm. Thus, compressed iterations converge to within $O(KM/(1-L))$ of the floating-point fixed point, and the required number of extra iterations is small.
- Empirical Verification: In numerical studies (e.g., 1D/2D/3D heat/advection equations, Jacobi Poisson solves), ZFP error always falls within the theoretical bounds and remains well below modeling or floating-point truncation errors for reasonable tolerances (Fox et al., 2020).
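The inline-usage bound can be exercised with a toy contraction, where "compression" is simulated by rounding iterates to a grid of spacing `TOL` (per-step error K = TOL/2, an assumed stand-in for ZFP's tolerance); the accumulated deviation stays inside the geometric-sum bound:

```python
L = 0.5                  # Lipschitz constant of the iteration map (contraction)
TOL = 1e-6               # grid spacing standing in for the ZFP error tolerance

def step(x):
    return L * x + 1.0   # fixed point at x* = 2.0

def lossy(x):
    return round(x / TOL) * TOL   # per-step perturbation at most TOL / 2

x, k = 0.0, 60
for _ in range(k):
    x = lossy(step(x))

# Geometric accumulation of per-step errors plus the contracted initial gap:
bound = (TOL / 2) * (1 - L**k) / (1 - L) + L**k * 2.0
assert abs(x - 2.0) <= bound + 1e-12
```

Because the map contracts, per-step compression errors do not compound without limit; they sum geometrically, which is exactly the structure of the bound above.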
5. Parallel Implementation and Scalability
ZFP’s blockwise and entropy coding design scales efficiently in domain-decomposed HPC and workflow frameworks:
- CubismZ Integration: ZFP is implemented as a block-based "first-stage" compressor within CubismZ; each MPI rank processes blocks in parallel, with OpenMP-threaded per-block compression and per-rank buffer aggregation. The approach achieves substantial in situ compression ratios for 3D cloud cavitation collapse data on BG/Q, with minimal I/O and time overhead (roughly 2% of simulation time) (Hadjidoukas et al., 2019).
- Throughput and Speed: Typical measured speeds on modern CPUs approach 126 MB/s compress and 507 MB/s decompress on a single core, with throughput scaling approximately linearly with core count.
- Comparative Advantage: ZFP matches SZ on smooth, high-PSNR data, but outperforms SZ and wavelets for highly sparse fields. Online predictors efficiently select between ZFP and SZ per field to optimize the rate-distortion tradeoff, yielding further throughput and efficiency gains in multi-field simulations (Tao et al., 2018).
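Because blocks are coded independently, per-block compression parallelizes trivially. A sketch with Python threads (the quantizer is a stand-in, not CubismZ's MPI/OpenMP pipeline):

```python
from concurrent.futures import ThreadPoolExecutor

def pack(block):
    """Stand-in per-block compressor: quantize to a 2**-16 grid."""
    return [round(x * 65536) for x in block]

data = [i / 7.0 for i in range(4096)]
blocks = [data[i:i + 4] for i in range(0, len(data), 4)]   # 4-wide 1-D blocks

serial = [pack(b) for b in blocks]
with ThreadPoolExecutor(max_workers=8) as pool:
    parallel = list(pool.map(pack, blocks))                # order-preserving

# Blockwise independence: identical output regardless of execution order.
assert parallel == serial
```

The same independence is what lets each MPI rank (and each OpenMP thread within a rank) compress its blocks without synchronization, aggregating buffers only at write time.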
6. Applications and Best-Practice Recommendations
ZFP is increasingly employed in scientific and clinical domains where large floating-point datasets and tight error control are critical:
- Medical Imaging: For 3D cerebrovascular segmentation tasks, error-tolerance (fixed-accuracy) mode at a tolerance of 500 achieves a 22.89:1 compression ratio with Dice scores essentially unchanged from the uncompressed baseline, suitable for cohort/distributed research (Elbana et al., 4 Oct 2025).
- Large-Scale Simulation: In CFD and similar settings, PSNR and CR can be tuned by adjusting the error bound: tighter bounds (higher PSNR) for visualization-quality reconstructions, looser bounds (higher CR) for archival storage (Hadjidoukas et al., 2019).
Best-practice parameters depend on error tolerance for downstream analysis. Fixed-rate mode is advisable for rigid storage/network budgets; error-tolerance mode is optimal for maximal compression under a quality constraint. Advanced users should consider zero-bias rounding for analytic applications or when long-range correlation properties are sensitive to error bias (Fox et al., 2024).
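These recommendations can be codified in a small helper. The function and key names below are hypothetical illustrations of the decision logic, not the API of any particular zfp binding:

```python
def choose_zfp_settings(rate_budget=None, max_abs_error=None, bias_sensitive=False):
    """Map workflow constraints to a ZFP mode, per the best practices above."""
    if rate_budget is not None:
        # Rigid storage/network budget: fixed-rate guarantees the output size.
        cfg = {"mode": "fixed-rate", "bits_per_value": rate_budget}
    elif max_abs_error is not None:
        # Quality constraint: error-tolerance mode maximizes compression.
        cfg = {"mode": "fixed-accuracy", "tolerance": max_abs_error}
    else:
        cfg = {"mode": "fixed-precision"}
    # Zero-bias rounding for analytic / correlation-sensitive applications.
    cfg["round_to_nearest"] = bias_sensitive
    return cfg

assert choose_zfp_settings(rate_budget=4)["mode"] == "fixed-rate"
assert choose_zfp_settings(max_abs_error=500)["mode"] == "fixed-accuracy"
assert choose_zfp_settings(bias_sensitive=True)["round_to_nearest"] is True
```

The priority order (rate budget first, then error tolerance) reflects that a hard size constraint is non-negotiable, whereas a quality constraint leaves the size free to shrink.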
7. Limitations, Considerations, and Ongoing Developments
Known limitations and current areas of ongoing investigation include:
- Generality: Most studies evaluate ZFP on canonical anatomical regions, fields, or architectures; transferability across anatomies, to detection tasks, or to other data types remains unproven (Elbana et al., 4 Oct 2025).
- Cross-Codec Comparison: Direct head-to-head benchmarking with learned or wavelet compressors is sparse; the relative efficiency of ZFP outside rate–distortion metrics (e.g., in learned representation tasks) remains an open question (Elbana et al., 4 Oct 2025).
- Error Bias: Uncorrected ZFP may introduce error bias at the mean level, but this is now well-understood and can be effectively mitigated using rounding strategies with negligible penalty (Fox et al., 2024).
- Performance Overhead: For practical deployment, the one-time compression/decompression cost must be weighed against storage/IO savings. ZFP’s block-parallel design offers high throughput, and bindings such as pyzfp facilitate integration in heterogeneous (CPU/GPU) environments (Elbana et al., 4 Oct 2025).
In summary, ZFP’s block-transform and embedded quantization strategy enables extremely high compression ratios—with tunable control over absolute or relative error and minimal fidelity loss—rendering it a foundational tool for scalable, training-free lossy compression in scientific and medical data domains (Elbana et al., 4 Oct 2025, Diffenderfer et al., 2018, Fox et al., 2024, Hadjidoukas et al., 2019, Tao et al., 2018, Fox et al., 2020).