Recursive Compression: Hierarchical Data Reduction

Updated 24 June 2026

Recursive compression is a hierarchical technique that recursively partitions and approximates data, revealing multi-scale redundancies.
It employs methods like interpolative decomposition, QR factorization, and randomized sampling to enhance storage efficiency and computational speed.
Applications span numerical linear algebra, image/video encoding, graph processing, and deep learning, offering scalable and adaptive performance.

Recursive compression is a foundational paradigm in data representation and computational mathematics, in which a complex structure such as a dataset, matrix, or signal is hierarchically partitioned or modeled, and each component is recursively approximated, encoded, or factored in a way that exposes and exploits redundancy across scales or substructures. Recursive compression strategies underpin algorithms for efficient data storage, accelerated numerical linear algebra, model checking, integer set and graph encoding, image and video transmission, and even conceptual models of intelligence through their connection to algorithmic probability and Kolmogorov complexity.

1. Core Principles and Formal Definitions

Recursive compression refers broadly to schemes that decompose a target object (vector, matrix, set, or sequence) hierarchically, with each component compressed or modeled using the same or similar principles, often yielding asymptotically optimal or near-optimal rates. This can manifest as recursive partitioning (as in image or set compression), recursive factorization (as in hierarchical matrices), recursive coding (in bits-back and ANS frameworks), or recursive model abstraction (algorithmic probability).

The recursive aspect encodes global information by iteratively summarizing or transforming local substructures, enabling efficient algorithms and leveraging data redundancy at multiple scales. In the strongest sense (algorithmic information theory), compression is equivalent to constructing a minimal, executable generative model for the data, recursively applied to all model components (Hernández-Espinosa et al., 20 Mar 2025, Franz, 2015).

2. Recursive Compression in Numerical Linear Algebra

Hierarchical matrix algorithms for discretized PDEs and integral equations represent a major area of application. Central instances include:

Strong Recursive Skeletonization (RS-S, RS-WS): Hierarchically partitions the degrees of freedom (DOFs) of an $N \times N$ dense matrix (arising from, e.g., integral-equation discretization). On each level of an octree or quadtree, strong skeletonization compresses only the far-field interactions via interpolative decomposition (ID). This is done recursively level by level, producing a factorization into block unit-triangular matrices, dramatically reducing rank growth and storage (Minden et al., 2016). The RS-WS variant alternates weak and strong skeletonization, halving storage while retaining near-linear time and memory.
- Factorization has $O(N)$ time and memory cost under suitable rank assumptions.
- Empirical results show these methods scale linearly up to tens of millions of DOFs, with factorization times and storage competitive with or superior to previous methods (e.g., HIF), and that the factorizations are highly effective preconditioners for iterative solvers.
Randomized Strong Recursive Skeletonization (RSRS): Extends to black-box $\mathcal{H}^2$-matrices using only matrix-vector products. Randomized Gaussian test matrices are used to recursively sample, compress, and block-eliminate submatrices, producing an invertible LU-like factorization and requiring a number of mat-vecs independent of $N$ (Yesypenko et al., 2023).
QR-Recursive Compression in Electromagnetic Scattering: For matrix-valued volume integral equations modeling large metasurfaces, recursion is accomplished via block-partitioned QR on far blocks at each level of a spatial hierarchy. Only near blocks are kept dense. This scheme yields large gains (mat-vec acceleration of up to $\sim 100\times$ and memory compression of up to $100\times$) and, when paired with a block-diagonal preconditioner, reduces GMRES iteration counts by two orders of magnitude (Mottola et al., 11 Mar 2026).
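
The shared pattern in these methods (keep near interactions dense, replace far interactions with low-rank factors, recurse) can be illustrated with a much simpler scheme. The sketch below is not RS-S, RSRS, or the QR scheme: it assumes a 1D point set, a smooth kernel `1 / (1 + |x - y|)` chosen only for illustration, a plain binary split of the index range, and greedy cross approximation with full pivoting standing in for ID or QR. All function names are hypothetical.

```python
def kernel(x, y):
    # Smooth, non-singular kernel; an assumption made only for this illustration.
    return 1.0 / (1.0 + abs(x - y))

def cross_approx(block, tol):
    """Greedy cross approximation with full pivoting: block ~= sum of outer(u, v)."""
    resid = [row[:] for row in block]
    rows, cols = len(resid), len(resid[0])
    scale = max(abs(x) for row in resid for x in row)
    terms = []
    for _ in range(min(rows, cols)):
        # Pick the largest remaining residual entry as the pivot.
        i, j = max(((r, c) for r in range(rows) for c in range(cols)),
                   key=lambda rc: abs(resid[rc[0]][rc[1]]))
        pivot = resid[i][j]
        if abs(pivot) <= tol * scale:
            break
        u = [resid[r][j] / pivot for r in range(rows)]
        v = resid[i][:]
        for r in range(rows):
            for c in range(cols):
                resid[r][c] -= u[r] * v[c]
        terms.append((u, v))
    return terms

def compress(pts, lo, hi, leaf, tol):
    """Recursively compress the kernel matrix restricted to pts[lo:hi]."""
    if hi - lo <= leaf:
        return {"dense": [[kernel(pts[i], pts[j]) for j in range(lo, hi)]
                          for i in range(lo, hi)]}
    mid = (lo + hi) // 2
    upper = [[kernel(pts[i], pts[j]) for j in range(mid, hi)] for i in range(lo, mid)]
    lower = [[kernel(pts[i], pts[j]) for j in range(lo, mid)] for i in range(mid, hi)]
    return {"split": mid - lo,
            "left": compress(pts, lo, mid, leaf, tol),
            "right": compress(pts, mid, hi, leaf, tol),
            "upper": cross_approx(upper, tol),
            "lower": cross_approx(lower, tol)}

def low_rank_apply(terms, x, n_out):
    out = [0.0] * n_out
    for u, v in terms:
        coeff = sum(vc * xc for vc, xc in zip(v, x))
        for r in range(n_out):
            out[r] += u[r] * coeff
    return out

def matvec(node, x):
    """Apply the compressed matrix: dense leaves plus low-rank off-diagonal blocks."""
    if "dense" in node:
        return [sum(a * b for a, b in zip(row, x)) for row in node["dense"]]
    s = node["split"]
    x_left, x_right = x[:s], x[s:]
    top = matvec(node["left"], x_left)
    bottom = matvec(node["right"], x_right)
    top_far = low_rank_apply(node["upper"], x_right, s)
    bottom_far = low_rank_apply(node["lower"], x_left, len(x) - s)
    return ([a + b for a, b in zip(top, top_far)]
            + [a + b for a, b in zip(bottom, bottom_far)])

def stored_floats(node):
    if "dense" in node:
        return sum(len(row) for row in node["dense"])
    far = sum(len(u) + len(v) for u, v in node["upper"] + node["lower"])
    return far + stored_floats(node["left"]) + stored_floats(node["right"])

n = 128
pts = [i / n for i in range(n)]
tree = compress(pts, 0, n, leaf=8, tol=1e-6)
x = [((7 * i) % 11) / 11.0 for i in range(n)]
approx = matvec(tree, x)
exact = [sum(kernel(pts[i], pts[j]) * x[j] for j in range(n)) for i in range(n)]
max_err = max(abs(a - e) for a, e in zip(approx, exact))
```

Comparing `stored_floats(tree)` with `n * n` shows the storage saving, and `max_err` shows the accuracy cost at the chosen tolerance. Compressing every off-diagonal block, as done here, corresponds to weak admissibility; the strong variants described above compress only well-separated blocks.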

3. Recursive Compression in Data Structures and Algorithms

Recursive State Compression via Balanced Tree Hashing: In model checking, fixed-length state vectors are recursively split and encoded as balanced binary tree nodes, each representing sub-vectors with maximal sharing across states. Each unique subtree is referenced once, yielding theoretical compressed sizes close to 2/k units per state (with k the vector length), and practical compression ratios of 5–10× over classical process-table (“COLLAPSE”) compression. The tree-based compression achieves near-zero overhead at multicore scale due to minimal synchronization requirements (Laarman et al., 2011).
Recursive Subset Size Encoding (RSSS) for Set Compression: For unordered sets from a fixed universe, RSSS recursively builds a binary decomposition tree, encoding at each node the subset size in the left child. With proper statistical modeling, this approach asymptotically matches the ideal code length of $-\log_2 P(S)$ (with $P(S)$ from empirical statistics or model fits). RSSS outperforms gap and range-narrowing codes when element probabilities are highly nonuniform, and adapts efficiently as statistics are updated or enumeration is permuted (Larsson, 2014).
PivotCompress—Recursive Sorting and Entropy Coding: Recursively records the decisions made by quicksort (left/right partition bitvectors) and then compresses the sorted sequence and its symbol counts recursively. This achieves universal coding within $O(\log N)$ bits of the entropy for any stationary source, and can beat the naive sample-entropy bound on highly nonuniform finite sequences via binomial-coefficient enumeration of permutation decisions (Stiffelman, 2014).
Recursive Graph Bisection for Graph/Index Compression: Graph/inverted-index vertices are recursively bisected to minimize the sum of $\log$-gap-encoded adjacency lists using bi/multivariate layout objectives. Each bisection improves locality, which directly reduces code-length via delta encoding of neighbor gaps. The bisection recursion is inherently parallelizable and has yielded 20–50% bit-rate improvements on Web and social graphs with billions of nodes (Dhulipala et al., 2016).
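
To make the subtree-sharing idea behind the tree-based state compression concrete, here is a minimal single-threaded sketch. It is not the implementation from Laarman et al.: the class name `TreeTable`, its `put`/`get` methods, and the use of a Python dictionary in place of a concurrent hash table are all assumptions made for illustration.

```python
class TreeTable:
    """Stores fixed-length vectors as shared binary trees (hash-consing sketch)."""

    def __init__(self):
        self.index = {}  # (length, left_ref, right_ref) -> node id
        self.nodes = []  # node id -> (length, left_ref, right_ref)

    def put(self, vec):
        """Insert a vector and return a reference to its root."""
        if len(vec) == 1:
            return vec[0]  # single slots are stored by value
        half = len(vec) // 2
        # Recursively intern both halves; identical sub-vectors get identical refs.
        key = (len(vec), self.put(vec[:half]), self.put(vec[half:]))
        if key not in self.index:
            self.index[key] = len(self.nodes)
            self.nodes.append(key)
        return self.index[key]

    def get(self, ref, length):
        """Rebuild the vector of the given length from a root reference."""
        if length == 1:
            return [ref]
        _, left, right = self.nodes[ref]
        half = length // 2
        return self.get(left, half) + self.get(right, length - half)


table = TreeTable()
states = [(1, 2, 3, 4), (1, 2, 3, 5), (1, 2, 7, 8)]
roots = [table.put(s) for s in states]
```

The three states share the sub-vector `(1, 2)`, so the table holds 7 nodes instead of the 9 that three independent trees would need. The saving grows when successive states differ in only a few slots, which is the typical situation in model checking.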

4. Recursive Compression in Structured Data and Deep Learning

Adaptive Recursive Partitioning for Multidimensional Images (CARP): Uses a recursive dyadic partition tree on image pixels, Bayesian inference for optimal partition/pruning decisions, and a subsequent 1D transform on the permuted vector. Recursion is driven by local stationarity in the signal and governed by a single global noise-variance parameter controlling the compression/fidelity trade-off. CARP outperforms standard codecs such as JPEG, JPEG2000, HEVC, and neural baselines by substantial margins in MS-SSIM and PSNR across images and videos. The same principles extend to higher dimensions and irregular domains (Liu et al., 2019).
Deep Recursive Octree Compression for 3D Voxels (RocNet): Hierarchically partitions 3D volume data into an octree, applying a leaf/node encoder recursively to homogeneous and mixed blocks. The final root features are mapped to a small latent vector. Recursion controls how parameter count and computation scale with resolution, provides sparsity-aware skipping of empty/full blocks, and yields high compression ratios (e.g., 4096:1) with improved or competitive accuracy and resource use versus prior CNN-based architectures (Liu et al., 2020).
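
The partition-and-skip step common to these methods can be sketched without any learned or Bayesian components. The example below is a plain octree over a set of occupied voxel coordinates, not RocNet's encoder or CARP's partition model. The function names are hypothetical and the grid size is assumed to be a power of two.

```python
def child_origins(x0, y0, z0, half):
    # Fixed child ordering shared by the builder and the decoder.
    return [(x0 + dx, y0 + dy, z0 + dz)
            for dx in (0, half) for dy in (0, half) for dz in (0, half)]

def build_octree(occupied, x0, y0, z0, size):
    """Return "empty", "full", or a list of 8 child nodes for a mixed block."""
    inside = {(x, y, z) for (x, y, z) in occupied
              if x0 <= x < x0 + size and y0 <= y < y0 + size and z0 <= z < z0 + size}
    if not inside:
        return "empty"               # homogeneous block: stop recursing
    if len(inside) == size ** 3:
        return "full"                # homogeneous block: stop recursing
    half = size // 2
    return [build_octree(inside, cx, cy, cz, half)
            for (cx, cy, cz) in child_origins(x0, y0, z0, half)]

def decode_octree(node, x0, y0, z0, size):
    """Recover the set of occupied voxels from the tree."""
    if node == "empty":
        return set()
    if node == "full":
        return {(x, y, z)
                for x in range(x0, x0 + size)
                for y in range(y0, y0 + size)
                for z in range(z0, z0 + size)}
    half = size // 2
    voxels = set()
    for child, (cx, cy, cz) in zip(node, child_origins(x0, y0, z0, half)):
        voxels |= decode_octree(child, cx, cy, cz, half)
    return voxels

def count_nodes(node):
    if isinstance(node, str):
        return 1
    return 1 + sum(count_nodes(child) for child in node)


# A solid 4x4x4 cube in the corner of an 8x8x8 grid.
cube = {(x, y, z) for x in range(4) for y in range(4) for z in range(4)}
tree = build_octree(cube, 0, 0, 0, 8)
```

For this input the 512-voxel grid is described by 9 nodes: one mixed root, one full child, and seven empty children. A recursive encoder in the RocNet style would attach learned features to such nodes rather than the three-way labels used here.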

5. Recursive Coding and Hierarchical Probabilistic Models

Recursive Bits-Back Coding (Bit-Swap): Hierarchical latent variable models induce a recursive coding scheme: bits for each latent variable and the data are coded in interleaved decode/encode steps, at each recursion layer transmitting only the required overhead for that layer. Bit-Swap strictly reduces initial overhead compared to naive bits-back coding in deep hierarchical models, amortizing to the negative ELBO per datum, and outperforming standard and previous bits-back codecs on natural images (Kingma et al., 2019).
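
For orientation, the accounting for a single latent layer can be written out. This is the standard bits-back identity rather than anything specific to Bit-Swap, and the notation is assumed here: prior $p(z)$, likelihood $p(x \mid z)$, approximate posterior $q(z \mid x)$. The sender decodes $z$ from bits already on the stream using $q(z \mid x)$, then encodes $x$ with $p(x \mid z)$ and $z$ with $p(z)$, so the net cost for one datum is

$$
L(x, z) = -\log_2 p(x \mid z) - \log_2 p(z) + \log_2 q(z \mid x),
\qquad
\mathbb{E}_{q(z \mid x)}\big[L(x, z)\big] = -\mathrm{ELBO}(x) \quad \text{(in bits)}.
$$

The $+\log_2 q(z \mid x)$ term is the "bits back". It can only be recovered if enough bits are already on the stream before $z$ is decoded, which is the initial overhead that the interleaved, layer-by-layer ordering is meant to reduce.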

6. Recursive Compression and Algorithmic Probability

The notion of recursive compression finds its fullest expression in algorithmic information theory, where the Kolmogorov complexity $K(x)$ of a string $x$ is the length of its shortest program under a universal Turing machine, and the algorithmic probability $m(x)$ is the cumulative weight of all programs that generate $x$, with $K(x) = -\log_2 m(x) + O(1)$. Recursive compression here entails finding not just a program for $x$, but recursively compressing the model/parameters themselves, yielding the minimal, most abstract causal generator for the data. This principle underlies the SuperARC test for artificial general intelligence, which operationalizes intelligence evaluation as the ability to recursively synthesize and compress mechanisms explaining observed data, tightly linking compression ability to predictive and deductive power (Hernández-Espinosa et al., 20 Mar 2025, Franz, 2015).

Key theoretical results include:

The equivalence of compression and predictive capability: models that yield lower codelengths also yield superior predictions, and vice versa, as formalized by the coding theorems and martingale characterizations of randomness.
Real-world systems such as current LLMs often default to shallow statistical compression, unable to recursively model deeper causal/algorithmic patterns, as empirical binary-sequence prediction experiments demonstrate.
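
The gap between statistical and algorithmic compression can be seen with a crude, computable proxy. Kolmogorov complexity itself is uncomputable, so the sketch below uses the `zlib` compressed length as a loose upper-bound stand-in. This is an assumption of the example, not the measure used in the cited work. The linear congruential generator and its constants are likewise chosen only for illustration.

```python
import zlib

def lcg_bytes(count, seed=1):
    """Bytes from a tiny linear congruential generator (a very short program)."""
    state = seed
    out = bytearray()
    for _ in range(count):
        state = (1103515245 * state + 12345) % 2**31
        out.append((state >> 16) & 0xFF)  # keep some of the higher-order bits
    return bytes(out)

def compressed_len(data):
    # Statistical compressor used as a rough upper bound on description length.
    return len(zlib.compress(data, 9))


periodic = b"01" * 500        # 1000 bytes with obvious statistical redundancy
generated = lcg_bytes(1000)   # 1000 bytes produced by a few lines of code

periodic_len = compressed_len(periodic)
generated_len = compressed_len(generated)
```

Both inputs have short generating programs, but `zlib` shrinks only the periodic one. The generator output stays close to its raw size because its regularity is algorithmic rather than statistical, which is the kind of structure a shallow compressor cannot reach.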

7. Commonalities, Architectures, and Performance

Methodologically, recursive compression architectures are characterized by:

Hierarchical partitioning (e.g., trees, octrees, spatial bisection) to localize redundancy.
Per-level or per-node compression/encoding (e.g., ID, QR, probabilistic pruning, statistical modeling).
Recursion depth and structure determined by natural data or domain geometry (e.g., depth $O(\log N)$ for spatial data).
Performance: Linear or near-linear scaling in both time and storage in favorable settings (e.g., elliptic PDEs, sparse sets, images with strong locality), with empirically observed speedups and compression gains of 5–100$\times$ compared with shallow, non-recursive or flat baselines across a range of domains (Minden et al., 2016, Yesypenko et al., 2023, Mottola et al., 11 Mar 2026, Laarman et al., 2011, Kingma et al., 2019, Liu et al., 2019, Liu et al., 2020).
Robustness: Many schemes exhibit modularity, adaptivity, and cross-domain extensibility—recursive coding can be tuned for data-dependent trade-offs, extended to higher dimensions, or tailored using statistical or model-based priors.

In summary, recursive compression unifies diverse algorithmic strategies for hierarchy-exposing, redundancy-exploiting data reduction and calculation, and connects deeply to foundational limits in computability and prediction (Minden et al., 2016, Laarman et al., 2011, Liu et al., 2019, Kingma et al., 2019, Larsson, 2014, Liu et al., 2020, Hernández-Espinosa et al., 20 Mar 2025, Stiffelman, 2014, Dhulipala et al., 2016, Franz, 2015, Yesypenko et al., 2023, Mottola et al., 11 Mar 2026). As datasets and computational problems grow ever larger and more structured, recursive strategies form the backbone of scalable, principled, and theoretically grounded compression and abstraction.