Block Decomposition Method (BDM)
- BDM is a method that quantifies the algorithmic complexity of multidimensional objects by breaking them into smaller blocks whose complexity is estimated with the Coding Theorem Method (CTM).
- It aggregates local complexity estimates with a logarithmic correction for repeated patterns, bridging universal algorithmic and statistical entropy measures.
- BDM is applicable to diverse data types such as strings, matrices, graphs, and cellular automata, enhancing discrimination of deep algorithmic regularities.
The Block Decomposition Method (BDM) is a quantitative technique for estimating the algorithmic complexity of large, multidimensional objects by leveraging local approximations derived from the Coding Theorem Method (CTM), itself grounded in Solomonoff-Levin algorithmic probability and Kolmogorov complexity. BDM is designed to interpolate between true universal algorithmic complexity and Shannon entropy, providing estimates that are sensitive to algorithmic regularities beyond statistical redundancy. By decomposing data into smaller components for which algorithmic complexity can be estimated or tabulated, and correcting for repetitions in a principled way, BDM constitutes an extensible framework applicable to objects such as long strings, matrices, graphs, and higher-dimensional tensors (Zenil et al., 2016, Zenil et al., 2012).
1. Theoretical Foundations: Algorithmic Probability and CTM
BDM is constructed on the basis of algorithmic probability and the application of the Coding Theorem. For a prefix-free universal Turing machine $U$, the algorithmic probability of a string $s$ is

$$m(s) = \sum_{p \,:\, U(p) = s} 2^{-|p|},$$

where $|p|$ is the length of program $p$. The Coding Theorem asserts

$$K(s) = -\log_2 m(s) + O(1),$$

providing a bridge between the frequency with which a pattern can be generated and its Kolmogorov complexity $K(s)$.

The Coding Theorem Method (CTM) estimates $K(s)$ by exhaustively enumerating small Turing machines, observing the distribution of their outputs, and applying the coding theorem to these empirical frequencies. For practical reasons, CTM values can be computed only for very short strings or small objects (Gauvrit et al., 2014, Zenil et al., 2012).
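To make the estimator concrete, the following minimal Python sketch applies the coding theorem to a hypothetical output-frequency table of the kind CTM produces; `output_counts` is an illustrative stand-in, not real enumeration data.

```python
import math

# Hypothetical output-frequency table: how many small Turing machines
# halted with each string as output (illustrative numbers only).
output_counts = {"0000": 5000, "0101": 1200, "0110": 300, "0111": 80}
total = sum(output_counts.values())

def ctm(s: str) -> float:
    """CTM estimate of K(s) via the coding theorem: -log2 of the
    empirical algorithmic probability m(s)."""
    return -math.log2(output_counts[s] / total)

for s in output_counts:
    # Strings produced by more machines get lower complexity estimates.
    print(s, round(ctm(s), 2))
```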
2. Construction and Formal Definition of BDM
BDM adapts CTM to larger and higher-dimensional objects by assembling local CTM-based complexity estimates. The general formula for a string $s$ decomposed into substrings $s_i$ of length $l$ is

$$\mathrm{BDM}(s) = \sum_{i} \left[ \mathrm{CTM}(s_i) + \log_2 n_i \right],$$

where the $s_i$ are the distinct (possibly overlapping) blocks and $n_i$ denotes the multiplicity of substring $s_i$. For a $d$-dimensional object $X$, decomposed into base blocks $r_i$ with adjacency multiset $A = \{(r_i, n_i)\}$,

$$\mathrm{BDM}(X) = \sum_{(r_i, n_i) \in A} \left[ \mathrm{CTM}(r_i) + \log_2 n_i \right].$$

The $\log_2 n_i$ term efficiently encodes repetitions, reflecting that a repetition adds only logarithmic cost relative to the block's first description (Zenil et al., 2016, Zenil et al., 2012).
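A minimal sketch of this formula for binary strings, using a non-overlapping partition and a small hypothetical `ctm_table` in place of the real precomputed CTM values:

```python
import math
from collections import Counter

# Stand-in CTM lookup table for 4-bit blocks (illustrative values only).
ctm_table = {"0000": 10.2, "0101": 14.0, "1010": 14.0, "1111": 10.2, "0110": 16.3}

def bdm(s: str, block: int = 4) -> float:
    """BDM(s) = sum over distinct blocks r_i of CTM(r_i) + log2(n_i)."""
    blocks = [s[i:i + block] for i in range(0, len(s) - block + 1, block)]
    counts = Counter(blocks)  # n_i: multiplicity of each distinct block
    return sum(ctm_table[r] + math.log2(n) for r, n in counts.items())

# A periodic string pays for "0101" once plus log2(3) for its repetitions:
print(bdm("010101010101"))  # CTM("0101") + log2(3) ≈ 15.58
print(bdm("000001011111"))  # three distinct blocks, no repetition discount
```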
BDM thus yields an upper bound on $K(s)$. If CTM is accurate on each block, and the decomposition does not destroy global regularity, $\mathrm{BDM}(s) \to K(s)$ as block size grows. When CTM is unavailable, BDM degenerates to blockwise Shannon entropy, making the method minimax-optimal between algorithmic and statistical compression.
3. Boundary Conditions, Variants, and Error Bounds
Multiple partitioning and boundary strategies permit flexibility for various data types (illustrated in the sketch after this list):
- Trim (non-overlapping): Ignores leftover segments, underestimating complexity.
- Full overlapping: Sliding windows generate all possible blocks, typically overestimating.
- Cyclic (toroidal): Treats the object as periodic, avoiding remainders.
- Add-col: Pads with low-complexity blocks to fit block size, subtracting the padding from the final complexity.
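The sketch below illustrates the trim, full-overlap, and cyclic strategies for one-dimensional strings; each function returns the blocks that would enter the BDM sum.

```python
def trim(s: str, l: int) -> list[str]:
    """Non-overlapping blocks; a leftover segment shorter than l is dropped,
    which tends to underestimate complexity."""
    return [s[i:i + l] for i in range(0, len(s) - l + 1, l)]

def full_overlap(s: str, l: int) -> list[str]:
    """Sliding window over every position, typically overestimating."""
    return [s[i:i + l] for i in range(len(s) - l + 1)]

def cyclic(s: str, l: int) -> list[str]:
    """Toroidal boundary: the string wraps around, so no remainder is lost
    (blocks may straddle the seam when l does not divide len(s))."""
    return [(s + s)[i:i + l] for i in range(0, len(s), l)]

s = "0110100110"
print(trim(s, 4))          # ['0110', '1001']; the trailing '10' is ignored
print(full_overlap(s, 4))  # all 7 overlapping 4-blocks
print(cyclic(s, 4))        # ['0110', '1001', '1001']; wraps past the end
```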
Error bounds for BDM take the form

$$\bigl| K(X) - \mathrm{BDM}(X) \bigr| \le O\!\left(\log_2 |A|\right) + \sum_i \varepsilon_i,$$

where $|A|$ counts the possible re-assemblies of the block decomposition and $\sum_i \varepsilon_i$ is the cumulative local CTM error. In the limiting case of fine-grained decomposition and accurate block complexities, BDM converges to $K(X)$. If regularity cannot be captured at the local block level, the CTM terms become uninformative and BDM approaches the $l$-block Shannon entropy $H_l(X)$:

$$\mathrm{BDM}(X) \longrightarrow H_l(X) \quad \text{(up to normalization by the number of blocks).}$$
4. Implementation and Computational Aspects
CTM precomputes all possible outputs of small Turing machines, typically up to length 11–12 for binary strings or up to $4 \times 4$ blocks for binary arrays (Zenil et al., 2016, Zenil et al., 2012). BDM leverages these tables for subblocks. The primary trade-offs are:
- Speed: Non-overlapping partitioning achieves $O(n)$ (strings) or $O(n^2)$ (arrays) complexity; full overlap or recursive tallies scale polynomially.
- Memory: Storage of CTM lookup tables for all blocks is substantial but manageable for strings or arrays up to moderate sizes.
- Software: Implementations are available in Wolfram Language, R (acss), Python, C++, Haskell, Perl, Matlab, and via online calculators.
A normalized form (NBDM) linearly rescales BDM scores so that $0$ corresponds to repetition of the least complex block and $1$ to the maximally complex arrangement of blocks, enabling comparison across object sizes (Zenil et al., 2016).
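A sketch of that rescaling, assuming (as described above) that the minimum corresponds to one least-complex block repeated and the maximum to distinct blocks of the highest available CTM values; the published normalization may differ in detail.

```python
import math

def nbdm(bdm_value: float, n_blocks: int, ctm_table: dict) -> float:
    """Linearly rescale a BDM score to [0, 1] (hypothetical NBDM sketch)."""
    # 0: the least complex block repeated n_blocks times (one log2 term).
    lo = min(ctm_table.values()) + math.log2(n_blocks)
    # 1: n_blocks distinct blocks drawn from the highest CTM values (each n_i = 1).
    hi = sum(sorted(ctm_table.values(), reverse=True)[:n_blocks])
    return (bdm_value - lo) / (hi - lo)

ctm_table = {"0000": 10.2, "0101": 14.0, "1010": 14.0, "1111": 10.2, "0110": 16.3}
print(nbdm(15.58, 3, ctm_table))  # the periodic string above scores close to 0
```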
5. Empirical Performance and Illustrative Examples
BDM excels where classical entropy and compression-based methods fail to discriminate algorithmic regularities:
- Mathematical constants and automatic sequences: The digits of $\pi$ exhibit maximal entropy, but CTM (and by extension BDM) recognizes algorithmic regularity due to the availability of short generating programs. Thue-Morse sequences likewise manifest high entropy yet receive significantly lower BDM complexity, as CTM identifies their generative structure (Zenil et al., 2016); see the sketch after this list.
- Random-looking strings: Many strings of length 12 with maximal entropy are assigned low CTM values because short generating programs are found for them. BDM extends this discrimination to longer strings by incorporating local CTM results and charging full cost only for the first occurrence of each distinct block.
- Graphs: For isomorphic, dual, or cospectral graphs, BDM correctly reflects the underlying algorithmic equivalence, yielding complexity correlation coefficients far exceeding block entropy's performance (Zenil et al., 2016).
- Compression comparison: For large random-looking strings, BDM distinguishes strings of varying algorithmic regularity, while entropy and compression assign near-maximal complexity to all of them.
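The Thue-Morse case can be made concrete. In the sketch below, even a flat stand-in CTM table (so that only the $\log_2 n_i$ repetition terms differ between strings) separates Thue-Morse from a pseudo-random string of identical symbol entropy; real CTM values would widen the gap further.

```python
import math
import random
from collections import Counter

def thue_morse(n: int) -> str:
    """First n symbols: parity of the number of 1-bits in each index."""
    return "".join(str(bin(i).count("1") % 2) for i in range(n))

def symbol_entropy(s: str) -> float:
    counts = Counter(s)
    return -sum(c / len(s) * math.log2(c / len(s)) for c in counts.values())

def bdm_flat(s: str, l: int = 4, flat_ctm: float = 10.0) -> float:
    """BDM with every block assigned the same stand-in CTM value,
    isolating the log2(n_i) repetition discount."""
    counts = Counter(s[i:i + l] for i in range(0, len(s) - l + 1, l))
    return sum(flat_ctm + math.log2(n) for n in counts.values())

random.seed(0)
tm = thue_morse(64)   # aligned 4-blocks are only '0110' and '1001'
rnd = "".join(random.choice("01") for _ in range(64))
print(symbol_entropy(tm), symbol_entropy(rnd))  # both ≈ 1.0 bit/symbol
print(bdm_flat(tm), bdm_flat(rnd))              # Thue-Morse scores far lower
```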
6. Applications to Multidimensional and Complex Structures
The BDM framework generalizes seamlessly to arrays and graphs (a two-dimensional sketch follows this list):
- Two-dimensional arrays/tensors: CTM-based block tables extend to $d$-dimensional patches, making BDM directly applicable to images, spatial signals, and adjacency matrices (Zenil et al., 2012, Zenil et al., 2016).
- Graphs and networks: Applying BDM to adjacency matrices enables algorithmic-complexity-based classification and analysis of graphs, respecting algorithmic symmetries missed by entropy-rate or compression-based heuristics.
- Cellular automata: BDM, especially with variable block sizes, reproduces and refines complexity-based classification of space-time diagrams of cellular automaton rules, with high correlation to intuition and compression measures (Zenil et al., 2012).
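For the two-dimensional case, the sketch below trim-partitions a binary matrix (e.g., a graph's adjacency matrix) into $4 \times 4$ tiles and applies the BDM sum; `ctm2d` and the `default` value are hypothetical stand-ins for the real precomputed 2D CTM tables.

```python
import math
from collections import Counter
import numpy as np

def bdm_2d(m: np.ndarray, block: int = 4, ctm2d: dict | None = None,
           default: float = 20.0) -> float:
    """2D BDM sketch: sum CTM(tile) + log2(multiplicity) over distinct tiles."""
    ctm2d = ctm2d or {}  # stand-in for a real table of all 4x4 binary tiles
    rows = (m.shape[0] // block) * block  # trim partition: drop ragged edges
    cols = (m.shape[1] // block) * block
    tiles = Counter(
        m[i:i + block, j:j + block].tobytes()
        for i in range(0, rows, block)
        for j in range(0, cols, block)
    )
    return sum(ctm2d.get(t, default) + math.log2(n) for t, n in tiles.items())

# A complete graph's all-ones adjacency matrix repeats a single tile, so its
# repetitions collapse into one log2 term; a random matrix pays per tile.
complete = np.ones((8, 8), dtype=np.uint8)
noisy = np.random.default_rng(0).integers(0, 2, size=(8, 8), dtype=np.uint8)
print(bdm_2d(complete), bdm_2d(noisy))
```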
7. Significance, Limitations, and Future Directions
BDM bridges algorithmic information theory and practical complexity estimation for arbitrary finite objects. Its principal strengths include invariance under relabelings, the detection of deep algorithmic regularities, and the circumvention of the overestimation problems common to compression for short or structured data. BDM's limitations are tied to the accuracy and coverage of CTM block tables (currently limited to roughly 12 bits per block), computational tractability, and weaker discrimination power as block size increases beyond available CTM data, a regime in which it converges toward entropy. There is scope for improvements such as expanding CTM tabulations, optimizing block decompositions, exploring dynamic-programming partitioning, and refining the statistical correction terms (Zenil et al., 2016, Zenil et al., 2012).
In summary, BDM operationalizes algorithmic complexity estimation for general, multidimensional, and structured data contexts by systematically patching together local CTM-based complexities and penalizing for redundancies, offering a principled method that interpolates between true algorithmic and entropic complexity (Zenil et al., 2016, Zenil et al., 2012).