Prefix-Free Kolmogorov Complexity
- Prefix-free Kolmogorov complexity is defined as the length of the shortest self-delimiting program that produces a string, ensuring unique decodability.
- It uniquely connects algorithmic complexity with probability theory via the Kraft inequality and underpins key concepts in randomness and information theory.
- Its applications include characterizing effective dimensions, informing randomness tests, and optimizing coding theorems in computability theory.
Prefix-free Kolmogorov complexity, typically denoted $K(x)$, refines the classical notion of algorithmic complexity by requiring that valid descriptions (programs) be codewords in a prefix-free set. This self-delimitation constraint, originally introduced to ensure unique decodability, yields a complexity measure with deeper connections to probability, information theory, and algorithmic randomness than the plain (standard) Kolmogorov complexity $C(x)$. The prefix-free variant underpins symmetry-of-information theorems, randomness characterizations, and the foundation of algorithmic probability, as well as fine structural distinctions within computability theory and randomness hierarchies.
1. Formal Definition and Prefix-free Codes
Let $U$ be a fixed universal prefix-free Turing machine. The prefix-free Kolmogorov complexity of a finite binary string $x$, denoted $K(x)$, is defined as the length of the shortest program $p$ such that $U(p) = x$ and the set $\{p : U(p)\downarrow\}$ is prefix-free:

$$K(x) = \min\{|p| : U(p) = x\}.$$
The prefix-free property ensures that no program $p$ in the domain of $U$ is a proper prefix of another. By the invariance theorem, $K(x)$ is well-defined up to an additive constant independent of $x$ for optimal universal prefix-free machines (Shen, 2015).
Conditional prefix-free complexity is analogously defined by $K(x \mid y) = \min\{|p| : U(p, y) = x\}$, with $U$ again universal and prefix-free in $p$. The prefix-free constraint ensures a direct correspondence between prefix complexity and probability weights via the Kraft inequality:

$$\sum_x 2^{-K(x)} \le 1.$$
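The Kraft inequality is easy to verify concretely for any finite prefix-free set. A minimal Python sketch (function names are illustrative, not from any library):

```python
def is_prefix_free(codewords) -> bool:
    """True iff no codeword is a proper prefix of another."""
    return not any(p != q and q.startswith(p)
                   for p in codewords for q in codewords)

def kraft_sum(codewords) -> float:
    """Sum of 2^{-|p|} over the codewords; at most 1 for any prefix-free set."""
    return sum(2.0 ** -len(p) for p in codewords)

code = ["0", "10", "110", "111"]   # a complete prefix-free code
assert is_prefix_free(code)
print(kraft_sum(code))             # 1.0: equality holds for a complete code
```

For an incomplete prefix-free code the sum falls strictly below 1, which is exactly the slack that lets $2^{-K(x)}$ behave as a (semi)probability weight.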
2. Relation to Plain Kolmogorov Complexity
While both $C(x)$ (plain Kolmogorov complexity) and $K(x)$ are upper semicomputable and satisfy invariance properties, their structure and quantitative behavior differ fundamentally:
- For all $x$, $C(x) \le K(x) + O(1)$ and $K(x) \le C(x) + 2\log C(x) + O(1)$, since a plain program can be made self-delimiting by prepending a self-delimiting encoding of its length (Bauwens, 2013).
- Solovay’s relations tightly characterize the tradeoff: $K(x) = C(x) + C(C(x)) + O(C(C(C(x))))$ and $C(x) = K(x) - K(K(x)) + O(K(K(K(x))))$, where $C(C(x))$ denotes the complexity of the complexity of $x$, and so forth (Bauwens, 2013).
- Gács’ theorem demonstrates that the “complexity-of-complexity” $C(C(x) \mid x)$ can reach $\log |x| - O(\log\log |x|)$ for certain $x$, implying that the map $x \mapsto C(x)$ is not computable and that the gap between $K$ and $C$ can be controlled only to within an iterated-logarithm term (Bauwens et al., 2012).
Infinitely often, there exist strings $x$ of length $n$ with maximal plain complexity ($C(x) \ge n - O(1)$) yet $K(x) \le n + K(n) - \log n$, meaning $K$ can systematically fall short of its maximal value $n + K(n)$ by logarithmic factors (Bauwens, 2013; Bauwens et al., 2012).
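The length-prepending trick behind the bound $K(x) \le C(x) + 2\log C(x) + O(1)$ can be made concrete. A minimal Python sketch, assuming the doubled-bit header (one of several standard self-delimiting encodings):

```python
def self_delimit(p: str) -> str:
    """Prepend a self-delimiting encoding of |p|: each bit of bin(|p|)
    doubled, then the terminator '01'. Overhead is ~2 log |p| bits."""
    length_bits = bin(len(p))[2:]
    header = "".join(b + b for b in length_bits) + "01"
    return header + p

def parse(s: str):
    """Inverse: read doubled-bit pairs until '01', decode the length n,
    then read exactly n payload bits. Returns (payload, remainder)."""
    i, length_bits = 0, ""
    while s[i:i + 2] != "01":      # pairs are '00' or '11', never '01'
        length_bits += s[i]
        i += 2
    i += 2                          # skip the '01' terminator
    n = int(length_bits, 2)
    return s[i:i + n], s[i + n:]

p = "1101001"
w = self_delimit(p)
assert parse(w + "111000")[0] == p  # decodable even with trailing data
```

Because `parse` determines where each codeword ends from the codeword itself, the image of `self_delimit` is a prefix-free set, which is exactly what the conversion from plain to prefix-free descriptions requires.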
3. Information-theoretic Properties and Probability
The prefix-free constraint facilitates a one-to-one correspondence with universal lower semicomputable semimeasures, foundational to algorithmic probability:
- Solomonoff’s universal a priori probability $\mathbf{m}(x)$ is a lower semicomputable semimeasure with $\sum_x \mathbf{m}(x) \le 1$.
- The coding theorem gives $K(x) = -\log \mathbf{m}(x) + O(1)$, and for the conditional case (Vitanyi, 2012), $K(x \mid y) = -\log \mathbf{m}(x \mid y) + O(1)$, where $\mathbf{m}(x \mid y)$ is defined by effective enumeration and weighting of all lower semicomputable conditional semimeasures, not by naive joint construction and marginalization, which fails to capture the strong coding inequalities (Vitanyi, 2012).
This direct identification with probability is unique to prefix-free complexity and is not mirrored for the plain variant.
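The coding theorem can be illustrated on a toy scale. The sketch below uses a finite map as a stand-in "machine" with a prefix-free domain (illustrative only; a real universal machine has an infinite c.e. domain, and the theorem holds up to an additive constant):

```python
import math

# A toy prefix-free "machine": a finite map from programs to outputs.
M = {"0": "a", "10": "a", "110": "b", "111": "c"}

def algorithmic_prob(x: str) -> float:
    """Q(x) = sum of 2^{-|p|} over programs p with M(p) = x."""
    return sum(2.0 ** -len(p) for p, out in M.items() if out == x)

def K_toy(x: str) -> int:
    """Length of the shortest program for x on this toy machine."""
    return min(len(p) for p, out in M.items() if out == x)

for x in "abc":
    # In the spirit of the coding theorem, K(x) tracks -log2 Q(x).
    print(x, K_toy(x), -math.log2(algorithmic_prob(x)))
```

When a string has a single description (here `"b"` and `"c"`), the two quantities coincide exactly; when probability mass accumulates from several programs (here `"a"`), $-\log_2 Q(x)$ dips below the shortest-program length, which is the direction of slack the $O(1)$ term absorbs.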
4. Optimal Domains, Structural Separation, and Deficiency
Not all domains of plain decompressors (decoders) contain the domain of any optimal prefix-free decompressor. There exist optimal plain decompressors such that no subset of their domain serves as the domain of any universal prefix-free decompressor (Andreev et al., 2010). This structural separation strictly exceeds the mere value gap, demonstrating the intrinsic difference in the way self-delimitation constrains description spaces.
Randomness deficiency can be quantified in both settings:
- For $x$ of length $n$, the plain deficiency is $n - C(x)$ and the prefix deficiency is $n + K(n) - K(x)$.
- There exist $x$ whose plain deficiency is $O(1)$ while their prefix deficiency is unbounded, and vice versa, cementing the non-equivalence of the classes of strings with $O(1)$ deficiency defined via $C$ and via $K$ (Bauwens, 2013; Bauwens et al., 2012).
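Neither deficiency is computable, but computable lower bounds exist: any real compressor upper-bounds complexity, hence under-estimates deficiency. A heuristic Python sketch using `zlib` as the stand-in compressor (an assumption of this illustration, not part of the theory):

```python
import os
import zlib

def plain_deficiency_lower_bound(x: bytes) -> int:
    """Heuristic lower bound on the plain deficiency n - C(x): a computable
    compressor's output length upper-bounds C(x) up to an additive constant,
    so n - 8*|zlib(x)| under-estimates the true deficiency."""
    n = 8 * len(x)
    return n - 8 * len(zlib.compress(x, 9))

print(plain_deficiency_lower_bound(b"a" * 1000))       # large: highly regular
print(plain_deficiency_lower_bound(os.urandom(1000)))  # near zero or negative
```

The gap between such computable bounds and the true deficiency is precisely what makes deficiency a useful but non-effective randomness gauge.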
5. Prefix Complexity and Algorithmic Randomness
Prefix-free complexity is the canonical quantitative characterization of algorithmic randomness:
- A sequence $X$ is Martin-Löf random if and only if there exists a constant $c$ such that $K(X \upharpoonright n) \ge n - c$ for all $n$ (Shen, 2015).
- K-triviality, the property $K(X \upharpoonright n) \le K(n) + O(1)$ for all $n$, demarcates the least random sequences, and is sharply separated from the C-triviality class.
Solovay functions, computable upper bounds $f$ such that $K(n) \le f(n) + O(1)$ for all $n$ and $f(n) \le K(n) + O(1)$ for infinitely many $n$, mediate the relationship between randomness, K-triviality, and the convergence properties of $\sum_n 2^{-f(n)}$ (0902.1041).
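A simple computable upper bound on $K(n)$ with the convergence property can be written down directly. A sketch, assuming the doubled-bit self-delimiting encoding of $n$ (Solovay functions are the bounds that are additionally tight infinitely often; this $f$ is merely an upper bound):

```python
# f(n) = 2|bin(n)| + 2 upper-bounds K(n) up to O(1): n can be described
# by doubling each bit of bin(n) and terminating with '01'. Crucially,
# sum_n 2^{-f(n)} converges, the Kraft-style property central to the
# Solovay-function characterizations.
def f(n: int) -> int:
    return 2 * len(bin(n)[2:]) + 2

partial = sum(2.0 ** -f(n) for n in range(1, 100_000))
print(partial)   # partial sums stay below 1/8: the series converges
```

The convergence is immediate: the $2^{k-1}$ integers with $k$ bits contribute $2^{k-1} \cdot 2^{-(2k+2)} = 2^{-k-3}$ each level, a geometric series summing to $1/8$.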
6. Effective Dimension, Compression, and Applications
Prefix-free Kolmogorov complexity directly quantifies effective dimension, the adaptation of Hausdorff dimension to algorithmic information: for a sequence $X$,

$$\dim(X) = \liminf_{n \to \infty} \frac{K(X \upharpoonright n)}{n}$$

(Shen, 2015).
Recent advances leverage layered Kraft-Chaitin constructions to show that any infinite stream $X$ can be uniformly coded into a Martin-Löf random stream $Y$ such that $X \upharpoonright n$ is recoverable from the first $n + g(n)$ bits of $Y$, for any computable redundancy function $g$ with $\sum_n 2^{-g(n)} < \infty$, paralleling and strengthening classical source coding theorems in a fully algorithmic setting (Barmpalias et al., 2017). This overhead is optimal in general.
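The Kraft-Chaitin theorem underlying such layered constructions can itself be sketched: given requested codeword lengths whose Kraft sum is at most 1, assign prefix-free codewords online. A minimal Python sketch (the tightest-fit bookkeeping below is one standard implementation choice):

```python
def kraft_chaitin(requests):
    """Online Kraft-Chaitin assignment: given lengths l1, l2, ... with
    sum 2^{-li} <= 1, return pairwise prefix-free codewords w_i with
    |w_i| = l_i. Maintains a prefix-free set of 'free' strings covering
    the unused space, at most one per length (like a binary expansion
    of the remaining measure)."""
    free = {0: ""}            # free[k] = an unused string of length k
    codes = []
    for l in requests:
        # Tightest fit: take the longest free string of length <= l.
        k = max((j for j in free if j <= l), default=None)
        if k is None:
            raise ValueError("Kraft sum exceeded")
        w = free.pop(k)
        # Extend w with 0s to length l; each sibling branch becomes free.
        while len(w) < l:
            free[len(w) + 1] = w + "1"
            w = w + "0"
        codes.append(w)
    return codes

codes = kraft_chaitin([2, 2, 3, 3, 3])   # Kraft sum 7/8 <= 1
print(codes)   # pairwise prefix-free codewords of the requested lengths
```

The tightest-fit rule keeps the free lengths pairwise distinct, so the free set is exactly the binary expansion of the remaining measure; a request of length $l$ can then always be served while $2^{-l}$ fits in that remainder.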
7. Advanced Topics: Relativized Complexity and Randomness Hierarchies
Relativization to oracles and characterization via limsup formulae yield hierarchies of prefix-free complexity degrees. For example, for every $x$,

$$K^{\emptyset'}(x) = \limsup_{n \to \infty} K(x \mid n) + O(1),$$

a characterization whose proof tracks the set of minimal (shortest) descriptions, and this can be bootstrapped into finite definitions of all $n$-randomness classes via prefix-free complexity (Downey et al., 2022).
This reveals that not only is $K$ the unique “measure” aligning complexity, probability, and randomness notionally and quantitatively, but its technical apparatus is also indispensable in higher-order randomness and effective descriptive set theory. Iterated applications of prefix-free complexity, in concert with minimal descriptions, realize fine gradations within the arithmetical hierarchy (e.g., $n$-randomness and semi-low sets).
References:
(Bauwens, 2013; Bauwens et al., 2012; Shen, 2015; Andreev et al., 2010; arXiv:0902.1041; Barmpalias et al., 2017; Vitanyi, 2012; Downey et al., 2022)