Bidirectional Information Distance
- Bidirectional Information Distance is a measure based on Kolmogorov complexity that quantifies the minimal length of a single program that transforms each of two objects into the other.
- It leverages a duality between conditional complexities, ensuring symmetric conversions and precise handling of plain and prefix variants.
- The concept underpins practical applications in clustering, bioinformatics, and web mining, with extensions including normalized measures like NID and approximations via compressors.
Bidirectional information distance is a central concept in algorithmic information theory, quantifying the minimal information required to transform one object into another and vice versa using the framework of Kolmogorov complexity. It underpins universal metrics for similarity, pattern recognition, learning, and data mining, and admits precise mathematical characterizations relating to shortest programs, metric properties, and extensions to multisets. The focus is primarily on the plain and prefix variants, their exactness, normalization schemes, and applications—alongside notable subtleties regarding prefix distances and the regime where constant-precision equivalence holds.
1. Formal Definitions and Fundamental Properties
Let $U$ be a fixed reference optimal universal Turing machine, and consider two finite binary strings $x, y \in \{0,1\}^*$. The bidirectional information distance is defined as
$$E(x,y) = \min\{\,|p| : U(p,x) = y \text{ and } U(p,y) = x\,\},$$
the minimal length of a binary program $p$ such that $U(p,x) = y$ and $U(p,y) = x$.
This notion extends Kolmogorov complexity to pairs, with $C(x)$ denoting the plain Kolmogorov complexity of $x$ (the length of the shortest program that outputs $x$ with no auxiliary input), and $C(y|x)$ the conditional complexity (the length of the shortest program that maps $x$ to $y$).
A crucial result is the tight (up to an additive constant) duality between $E(x,y)$ and the conditional complexities:
$$E(x,y) = \max\{C(x|y),\, C(y|x)\} + O(1).$$
This symmetry reflects the fact that the minimal program must encode conversions in both directions efficiently. The result generalizes to multisets: for a multiset $X$ of cardinality $n$, with $E(X)$ the minimal length of a program that computes $X$ from any of its elements,
$$E(X) \le \max_{x \in X} C(X|x) + O(\log n),$$
and, for appropriately chosen $X$, this bound is tight up to the $O(\log n)$ term (Vitanyi, 2014).
2. Exactness, Upper and Lower Bounds, and Prefix Variants
The lower bound $E(x,y) \ge \max\{C(x|y),\, C(y|x)\}$ follows directly: any program witnessing $E(x,y)$ must be able to reconstruct either input from the other, so it cannot be shorter than either conditional complexity.
To achieve the upper bound, construct two shortest programs $p$ (of length $C(y|x)$, mapping $x$ to $y$) and $q$ (of length $C(x|y)$, mapping $y$ to $x$). From these one builds a single program $r$ of length $\max\{C(x|y),\, C(y|x)\} + O(1)$ that incorporates a direction flag and padding, so that $U(r,x) = y$ and $U(r,y) = x$ (Vitanyi, 2014).
For prefix complexity variants (where programs are self-delimiting), several formulations exist, depending on whether one uses prefix-free or prefix-stable machines, and whether one considers the bipartite case (two machines) or the non-bipartite case. For the prefix distance $E_K(x,y)$, the characterization holds up to a logarithmic additive term:
$$E_K(x,y) = \max\{K(x|y),\, K(y|x)\} + O(\log \max\{K(x|y),\, K(y|x)\}),$$
where $K(\cdot\,|\,\cdot)$ denotes prefix-free conditional complexity. Bauwens (Bauwens, 2020, Bauwens et al., 2018) proved that exact equivalence fails for prefix distances in the regime of small complexities, with a gap that can grow logarithmically large; but as soon as the conditional complexities are super-logarithmic (at least $c \log(|x| + |y|)$ for some absolute constant $c$), the $O(1)$-precision holds for all prefix versions (Bauwens, 2020).
3. Metric Properties and Universality
Bidirectional information distance is symmetric, $E(x,y) = E(y,x)$, immediate from the max-formula and the invariance of the universal machine up to $O(1)$ bits.
Identity holds: $E(x,x) = O(1)$. The triangle inequality is satisfied up to a logarithmic or constant additive term:
$$E(x,z) \le E(x,y) + E(y,z) + O(\log(E(x,y) + E(y,z))).$$
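The triangle inequality can be checked in one line from the max characterization together with the standard chaining inequality for conditional complexity (a sketch, suppressing the precise logarithmic slack):

```latex
\begin{align*}
C(x|z) &\le C(x|y) + C(y|z) + O(\log) \le E(x,y) + E(y,z) + O(\log),\\
C(z|x) &\le C(z|y) + C(y|x) + O(\log) \le E(y,z) + E(x,y) + O(\log),
\end{align*}
```

so $E(x,z) = \max\{C(x|z),\, C(z|x)\} + O(1) \le E(x,y) + E(y,z) + O(\log)$.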
With further technical refinements and in the super-logarithmic regime, the constant slack is attainable (Bauwens, 2020).
Vitányi, Li, and collaborators proved a universality property: for any admissible (upper-semicomputable, symmetric, satisfying the normalization/density condition) distance $D$, there is a constant $c_D$ such that
$$E(x,y) \le D(x,y) + c_D \quad \text{for all } x, y,$$
where $E$ is the bidirectional information distance. Thus $E$ minorizes all effective similarities (Bennett et al., 2010).
4. Normalization and Practical Approximations
Normalization enables meaningful comparison across pairs of strings of different complexities. The Normalized Information Distance (NID) is defined as
$$\mathrm{NID}(x,y) = \frac{\max\{K(x|y),\, K(y|x)\}}{\max\{K(x),\, K(y)\}}.$$
This scales the distance to the interval $[0,1]$ (up to negligible terms) and preserves approximate metricity.
Since the Kolmogorov complexities $K(\cdot)$ and $K(\cdot\,|\,\cdot)$ are noncomputable, practical approximations rely on real-world compressors $Z$, yielding the Normalized Compression Distance:
$$\mathrm{NCD}(x,y) = \frac{Z(xy) - \min\{Z(x),\, Z(y)\}}{\max\{Z(x),\, Z(y)\}},$$
where $Z(s)$ denotes the compressed length of $s$.
With suitable compressors (satisfying "normality" conditions), NCD empirically mirrors NID in applications including clustering, bioinformatics, and linguistic analysis (Vitanyi, 2012).
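A minimal sketch of NCD using zlib as the compressor (the choice of zlib and the example strings are illustrative; any "normal" compressor can stand in for $Z$):

```python
import os
import zlib

def Z(s: bytes) -> int:
    """Compressed length in bytes: a crude, computable stand-in for K(s)."""
    return len(zlib.compress(s, 9))

def ncd(x: bytes, y: bytes) -> float:
    """NCD(x, y) = (Z(xy) - min(Z(x), Z(y))) / max(Z(x), Z(y))."""
    zx, zy = Z(x), Z(y)
    return (Z(x + y) - min(zx, zy)) / max(zx, zy)

# Related strings compress well together, so their NCD is small;
# unrelated (random) data drives the NCD toward 1.
s1 = b"the quick brown fox jumps over the lazy dog " * 20
s2 = b"the quick brown fox jumps over the lazy cat " * 20
s3 = os.urandom(len(s1))
print(ncd(s1, s2), ncd(s1, s3))
```

Because real compressors are imperfect, NCD(x, x) is slightly above 0 and values can marginally exceed 1; only relative comparisons are meaningful in practice.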
A web-based analog, the Normalized Web Distance (NWD), replaces compressor outputs with code-lengths based on search engine hit counts, enabling semantic similarity estimation.
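The NWD can be computed directly from hit counts via the standard Cilibrasi-Vitányi formula; the counts below are hypothetical, not real search-engine data:

```python
import math

def nwd(fx: int, fy: int, fxy: int, N: int) -> float:
    """Normalized Web Distance from search-engine page counts:
    fx, fy: hit counts for each term alone; fxy: hits for pages
    containing both terms; N: (an estimate of) the total number
    of indexed pages."""
    lx, ly, lxy, lN = math.log(fx), math.log(fy), math.log(fxy), math.log(N)
    return (max(lx, ly) - lxy) / (lN - min(lx, ly))

# Hypothetical counts (illustrative only): frequent co-occurrence
# gives a small distance; rare co-occurrence gives a large one.
print(nwd(fx=8_000_000, fy=6_000_000, fxy=4_000_000, N=50_000_000_000))
print(nwd(fx=8_000_000, fy=6_000_000, fxy=200, N=50_000_000_000))
```

Note that NWD inherits the instability of hit counts, so it is a heuristic semantic proxy rather than a metric in the strict sense.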
5. Extensions to Sets and Multiples
Bidirectional information distance generalizes to multisets $X = \{x_1, \ldots, x_n\}$ with $n$ elements. Define
$$E_{\max}(X) = \min\{\,|p| : U(p, x) = X \text{ for all } x \in X\,\}.$$
For sufficiently large conditional prefix complexities, the exact formula holds (up to an $O(1)$ term):
$$E_{\max}(X) = \max_{x \in X} K(X|x),$$
provided $\max_{x \in X} K(X|x)$ is super-logarithmic. This is established via combinatorial arguments on hypergraphs and extensions of Cantor-space games (Bauwens, 2020).
For lists $(x_1, \ldots, x_n)$, define
$$E_{\max}(x_1, \ldots, x_n) = \min\{\,|p| : U(p, x_i) = (x_1, \ldots, x_n) \text{ for all } i\,\},$$
and, up to logarithmic additive terms,
$$E_{\max}(x_1, \ldots, x_n) = \max_i K(x_1, \ldots, x_n \mid x_i),$$
thus generalizing the pairwise distance (Vitanyi, 2012).
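A heuristic, compressor-based analog of the multiset quantity $\max_{x \in X} K(X|x)$ can be sketched by approximating $K(X|x)$ as $Z(\mathrm{concat}(X)) - Z(x)$, in the spirit of symmetry of information; the helper name `emax_proxy` and the toy data are illustrative, and this is not the exact definition:

```python
import os
import zlib

def Z(s: bytes) -> int:
    """Compressed length in bytes, a computable stand-in for K(s)."""
    return len(zlib.compress(s, 9))

def emax_proxy(X: list) -> int:
    """Heuristic proxy for E_max(X) = max_x K(X|x): approximate
    K(X|x) by Z(concat(X)) - Z(x), then take the max over elements."""
    whole = Z(b"".join(X))
    return max(whole - Z(x) for x in X)

# A multiset of related strings needs little information beyond any
# one of its elements; unrelated random strings need much more.
related = [b"gattacagattaca" * 40, b"gattacagattacc" * 40, b"gattacagatttca" * 40]
unrelated = [b"gattacagattaca" * 40, os.urandom(560), os.urandom(560)]
print(emax_proxy(related), emax_proxy(unrelated))
```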
6. Applications and Operational Significance
Bidirectional information distance underlies universal similarity metrics and objective pattern recognition. Example applications include:
- Hierarchical clustering: NID and NCD matrices support quartet-based or neighbor-joining algorithms, as in phylogenetic tree reconstruction from DNA sequences (Vitanyi, 2012).
- Bioinformatics: Compression-based distances applied to genetic data yield phylogenies congruent with established biological knowledge.
- Learning and data mining: NCD distances enable unsupervised classification where shared algorithmic regularity is the only guiding principle.
- Web mining: NWD harnesses page-count statistics for semantic comparison of terms.
- Question-answering systems: Bidirectional program overlap scoring has yielded substantial empirical performance gains.
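To make the clustering application concrete, the following is a minimal sketch: pure-Python single-linkage agglomerative clustering over an NCD matrix, with zlib as the compressor. All names and the toy "sequence families" are illustrative, and real pipelines use quartet-based or neighbor-joining methods instead of this naive $O(n^3)$ loop:

```python
import zlib

def Z(s: bytes) -> int:
    return len(zlib.compress(s, 9))

def ncd(x: bytes, y: bytes) -> float:
    zx, zy = Z(x), Z(y)
    return (Z(x + y) - min(zx, zy)) / max(zx, zy)

def cluster(labels, seqs):
    """Naive single-linkage agglomerative clustering over the NCD
    matrix; returns a nested-tuple dendrogram.  Small n only."""
    d = {(i, j): ncd(seqs[i], seqs[j])
         for i in range(len(seqs)) for j in range(i)}
    dist = lambda a, b: min(d[(max(i, j), min(i, j))] for i in a for j in b)
    clusters = {frozenset([i]): labels[i] for i in range(len(seqs))}
    while len(clusters) > 1:
        # Merge the two clusters at minimal single-linkage NCD.
        a, b = min(((a, b) for a in clusters for b in clusters if a != b),
                   key=lambda ab: dist(*ab))
        clusters[a | b] = (clusters.pop(a), clusters.pop(b))
    return next(iter(clusters.values()))

# Two toy 'families' of sequences: the tree should pair like with like.
data = {
    "A1": b"abcdefgh" * 120, "A2": b"abcdefgh" * 115,
    "B1": b"01234567" * 120, "B2": b"01234567" * 115,
}
tree = cluster(list(data), list(data.values()))
print(tree)
```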
Bidirectionality ensures the metric captures the minimal effort sufficient for conversion in both directions (equivalently, the harder of the two one-way conversions), accurately reflecting worst-case similarity and facilitating robust clustering.
7. Open Problems and Developments
Key developments include:
- Characterizing the limits of precise equivalence between shortest program length and max conditional complexity: O(1)-precision is unattainable for prefix variants in the low-complexity regime, but holds in the super-logarithmic regime (Bauwens, 2020, Bauwens et al., 2018).
- Extending metricity to multisets: Sharp additive constants are known, but the trade-offs for intricate distributions and nonuniform cardinalities demand further investigation.
- Non-approximability of NID: NID is not semicomputable, nor within computable distance of any computable function (Vitanyi, 2012).
- Thermodynamic interpretations: Connections between algorithmic irreversibility and physical work are formalized, linking information distance to the entropy-change in reversible computation (Bennett et al., 2010).
Theoretical refinements and operational methodologies are needed for further scaling and implementing information distance-based techniques in large-scale, high-dimensional data domains. Practical computation defaults to compression-based proxies, but understanding the gap between these proxies and the algorithmic ideal remains a vital area for research.