Calculating Kolmogorov Complexity from the Output Frequency Distributions of Small Turing Machines (1211.1302v2)

Published 6 Nov 2012 in cs.IT, cs.CC, math.IT, and nlin.PS

Abstract: Drawing on various notions from theoretical computer science, we present a novel numerical approach, motivated by the notion of algorithmic probability, to the problem of approximating the Kolmogorov-Chaitin complexity of short strings. The method is an alternative to the traditional lossless compression algorithms, which it may complement, the two being serviceable for different string lengths. We provide a thorough analysis for all $\sum_{n=1}^{11} 2^n$ binary strings of length $n<12$ and for most strings of length $12\leq n \leq16$ by running all $\sim 2.5 \times 10^{13}$ Turing machines with 5 states and 2 symbols ($8\times 22^9$ with reduction techniques) using the most standard formalism of Turing machines, used in for example the Busy Beaver problem. We address the question of stability and error estimation, the sensitivity of the continued application of the method for wider coverage and better accuracy, and provide statistical evidence suggesting robustness. As with compression algorithms, this work promises to deliver a range of applications, and to provide insight into the question of complexity calculation of finite (and short) strings.

Summary

The paper presents a novel method using output frequency distributions from 5-state Turing machines to approximate Kolmogorov complexity for short strings.
It relies on algorithmic probability and uses Busy Beaver values to set informed runtime limits, sidestepping the halting problem in practice rather than solving it.
The approach shows potential for interdisciplinary applications, including psychometrics and economic time series, by linking theoretical and empirical complexity.

Analyzing Kolmogorov Complexity through Small Turing Machines' Output Frequency Distribution

The paper under review examines how to estimate Kolmogorov complexity, particularly for short strings, from the output frequency distributions of small Turing machines. The method is distinct from traditional lossless compression algorithms and addresses a central challenge in algorithmic information theory (AIT): approximating the uncomputable measure of string complexity originally defined by Kolmogorov and Chaitin.

Methodological Framework

The authors approximate Kolmogorov complexity from the outputs of Turing machines with five states and two symbols. Because Kolmogorov complexity is uncomputable, they approach it through algorithmic probability, specifically Levin's semi-measure, which is closely tied to Solomonoff's theory of universal induction. Central to the method is the use of known Busy Beaver values, which supply an informed runtime limit for these small machines and so work around the halting problem to a practical extent.
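
The link between output frequency and complexity can be written out as follows. This is a sketch of the standard relations rather than a quotation from the paper: $U$ denotes a prefix-free universal machine, $(n,2)$ the set of Turing machines with $n$ states and 2 symbols, and the subscript in $K_{D(n)}$ is notation chosen here for illustration.

```latex
\begin{align}
  % Algorithmic probability (Levin's semi-measure): total weight of programs p that output s
  m(s) &= \sum_{p \,:\, U(p) = s} 2^{-|p|} \\
  % Coding theorem: frequent outputs have low complexity
  K(s) &= -\log_2 m(s) + O(1) \\
  % Empirical output frequency over all halting machines in (n,2)
  D(n)(s) &= \frac{\bigl|\{\, T \in (n,2) : T \text{ halts with output } s \,\}\bigr|}
                  {\bigl|\{\, T \in (n,2) : T \text{ halts} \,\}\bigr|} \\
  % Complexity estimate obtained by using D(n) in place of m
  K_{D(n)}(s) &= -\log_2 D(n)(s)
\end{align}
```

A string produced by many small machines therefore receives a low complexity estimate, while a string produced by few machines receives a high one.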

The evaluation computes the output frequency distribution of all 5-state Turing machines, each run under a theoretically informed step limit. Several techniques avoid executing machines that are known not to halt or that halt trivially: exploiting symmetries, detecting cycles, and detecting escapes. This lets the authors address the statistical stability and error of the complexity estimates, and it makes it practical to cover the computable portion of the algorithmic probability distribution from which Kolmogorov complexity is approximated.
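
The enumeration described above can be illustrated at toy scale. The following Python sketch runs every 2-state, 2-symbol machine (there are $(4n+2)^{2n} = 10^4$ of them for $n=2$, against $22^{10} \approx 2.66 \times 10^{13}$ for $n=5$) and tallies the outputs of those that halt. Several details are assumptions made for illustration and may differ from the paper: the tape starts blank with all 0s, a halting instruction writes a symbol without moving, the output is the contiguous region of tape visited by the head, and none of the paper's reduction techniques are implemented. All function names are hypothetical.

```python
from collections import Counter
from itertools import product
from math import log2


def transitions(n_states):
    """All instructions (next_state, write, move); next_state 0 means halt."""
    moving = [(s, w, m)
              for s in range(1, n_states + 1)
              for w in (0, 1)
              for m in (-1, 1)]
    halting = [(0, w, 0) for w in (0, 1)]  # assumption: no head move on halt
    return moving + halting  # 4n + 2 instructions per table entry


def run(table, max_steps):
    """Run one machine on an all-0 tape; return its output or None if cut off."""
    tape = {}
    pos, state = 0, 1
    lo = hi = 0  # bounds of the region visited by the head
    for _ in range(max_steps):
        next_state, write, move = table[(state, tape.get(pos, 0))]
        tape[pos] = write
        if next_state == 0:
            return "".join(str(tape.get(i, 0)) for i in range(lo, hi + 1))
        pos += move
        lo, hi = min(lo, pos), max(hi, pos)
        state = next_state
    return None  # treated as non-halting within the step limit


def output_distribution(n_states, max_steps):
    """Relative output frequencies over all halting machines with n_states."""
    keys = [(s, b) for s in range(1, n_states + 1) for b in (0, 1)]
    counts = Counter()
    for choice in product(transitions(n_states), repeat=len(keys)):
        out = run(dict(zip(keys, choice)), max_steps)
        if out is not None:
            counts[out] += 1
    total = sum(counts.values())
    return {s: c / total for s, c in counts.items()}


def complexity_estimates(dist):
    """Turn output frequencies into complexity estimates via -log2."""
    return {s: -log2(p) for s, p in dist.items()}


if __name__ == "__main__":
    # 6 is the known maximum number of steps of a halting 2-state machine.
    d2 = output_distribution(2, max_steps=6)
    k2 = complexity_estimates(d2)
    for s in sorted(k2, key=k2.get)[:6]:
        print(s, round(d2[s], 4), round(k2[s], 2))
```

Because the step limit equals the Busy Beaver step count for two states, raising it does not change the resulting distribution. For larger machines the same loop is correct in principle but infeasible without the reductions the authors describe.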

Key Findings

The paper shows that algorithmic probability can be used to estimate the Kolmogorov complexity of short strings. The 5-state Turing machines produce a large set of binary strings with reliably estimated complexity values: 99,608 distinct strings, with lengths ranging from 1 to 49 bits. Within this range, the output frequency distribution supports complexity approximations for strings of length up to 15.

Moreover, the output distributions of smaller and larger machines are strongly correlated, which supports the robustness of the approach. The agreement between the D(4) and D(5) distributions is notable because D(5) gives a finer-grained classification of complexity: it separates, and in some cases re-orders, strings that the smaller distribution ranked almost indistinguishably.
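
One common way to quantify how well two such distributions agree on the ordering of strings is a rank correlation over the strings they share. The sketch below computes Spearman's coefficient in plain Python. Whether this matches the exact statistic used in the paper is an assumption, the function names are hypothetical, and the two small dictionaries are made-up toy values, not data from D(4) or D(5).

```python
def average_ranks(values):
    """Rank values from 1 (smallest) upward, giving tied values their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; undefined (division by zero) if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


def spearman(dist_a, dist_b):
    """Spearman rank correlation over the strings present in both distributions."""
    shared = sorted(set(dist_a) & set(dist_b))
    ranks_a = average_ranks([dist_a[s] for s in shared])
    ranks_b = average_ranks([dist_b[s] for s in shared])
    return pearson(ranks_a, ranks_b)


if __name__ == "__main__":
    # Toy frequencies, invented for illustration only.
    small = {"0": 0.40, "1": 0.30, "00": 0.20, "01": 0.10}
    large = {"0": 0.35, "1": 0.33, "00": 0.20, "01": 0.10, "000": 0.02}
    print(spearman(small, large))
```

Only strings that appear in both distributions enter the comparison, so strings that the larger machine space produces for the first time do not affect the coefficient.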

Theoretical and Practical Implications

The implications of this research extend to several fields, including psychometrics, graph theory, and the analysis of cellular automata. The method has potential applications beyond theoretical computer science, as demonstrated by its use in economic time series analysis and psychometric evaluations. Importantly, the approach connects the theoretical notion of algorithmic complexity with practical applications, broadening the empirical basis for algorithmic randomness.

Future Developments

While the findings are robust for five-state machines, extending the method to machines with more states is constrained by computational cost. Future work may benefit from greater computational resources or from alternative models of universal computation that widen the range of strings whose complexity can be estimated. Applying the same framework to other computational models could also provide cross-validation and clarify the theoretical limits of Kolmogorov complexity approximations.

In summary, the paper lays solid groundwork for applying algorithmic probability to the complexity of short strings through exhaustive computational simulation, offering a complement to the traditional compression-based method. The methodology advances the understanding of fundamental issues in AIT and opens pathways for further interdisciplinary applications.
