
Bidirectional Information Distance

Updated 13 November 2025
  • Bidirectional Information Distance is a measure based on Kolmogorov complexity that quantifies the minimal program length required to transform each of two objects into the other.
  • It leverages a duality between conditional complexities, ensuring symmetric conversions and precise handling of plain and prefix variants.
  • The concept underpins practical applications in clustering, bioinformatics, and web mining, with extensions including normalized measures like NID and approximations via compressors.

Bidirectional information distance is a central concept in algorithmic information theory, quantifying the minimal information required to transform one object into another and vice versa using the framework of Kolmogorov complexity. It underpins universal metrics for similarity, pattern recognition, learning, and data mining, and admits precise mathematical characterizations relating to shortest programs, metric properties, and extensions to multisets. The focus is primarily on the plain and prefix variants, their exactness, normalization schemes, and applications—alongside notable subtleties regarding prefix distances and the regime where constant-precision equivalence holds.

1. Formal Definitions and Fundamental Properties

Let $U$ be a fixed reference optimal universal Turing machine, and consider two finite binary strings $x, y \in \{0,1\}^*$. The bidirectional information distance $D(x, y)$ is defined as the minimal length of a binary program $p$ such that $U(p, x) = y$ and $U(p, y) = x$.

$$D(x, y) = \min \{\, |p| : U(p, x) = y \ \text{and}\ U(p, y) = x \,\}$$

This notion extends Kolmogorov complexity to pairs, with $C(x)$ denoting the plain Kolmogorov complexity (length of the shortest program that outputs $x$ with no auxiliary input) and $C(x \mid y)$ the conditional complexity (length of the shortest program that maps $y$ to $x$). For instance, if $y$ is the bitwise complement of $x$, then $C(x \mid y) = O(1)$, since a constant-size program suffices to flip every bit.

A crucial result is the tight (up to an additive constant) duality between $D(x, y)$ and the conditional complexities:

$$D(x, y) = \max \{\, C(x \mid y),\, C(y \mid x) \,\} + O(1)$$

This symmetry reflects the fact that a single minimal program must encode conversions in both directions efficiently. The result generalizes to multisets: for a multiset $X$ of cardinality $n \geq 2$, with $k = \max\{ C(X \mid (x, n)) : x \in X \}$,

$$\mathrm{ID}(X) \leq k + \log n + O(1)$$

and, for appropriately chosen $X$, this bound is tight up to the $O(1)$ term (Vitanyi, 2014).
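
As a consistency check (a sketch, up to additive constants), specialize the multiset bound to $n = 2$ with $X = \{x, y\}$: given $x$ and the cardinality, reconstructing $X$ amounts to reconstructing $y$, so $C(X \mid (x, 2)) = C(y \mid x) + O(1)$, and the bound reads

$$\mathrm{ID}(\{x, y\}) \leq \max\{C(x \mid y), C(y \mid x)\} + 1 + O(1),$$

recovering the pairwise duality above.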

2. Exactness, Upper and Lower Bounds, and Prefix Variants

The lower bound $D(x, y) \geq \max\{ C(x \mid y), C(y \mid x) \}$ follows directly: any program witnessing $D(x, y)$ must be able to reconstruct either input from the other, so it cannot be shorter than either conditional complexity.

To achieve the upper bound, construct two shortest programs $p_1$ (of length $C(x \mid y)$, mapping $y$ to $x$) and $p_2$ (of length $C(y \mid x)$, mapping $x$ to $y$). The program $p'$ of length $k + O(1)$ (with $k = \max\{ C(x \mid y), C(y \mid x) \}$) incorporates a direction flag and padding, so that $U(p', x) = y$ and $U(p', y) = x$ (Vitanyi, 2014).

For prefix complexity variants (where programs are self-delimiting), several formulations exist, depending on whether one uses prefix-free or prefix-stable machines, and whether the bipartite (two-machine) or non-bipartite case is considered. The characterization holds up to $O(\log n)$ for the prefix distance:

$$E_{\mathrm{prefix}}(x, y) = \max\{\, K(x \mid y),\, K(y \mid x) \,\} + O(\log n)$$

where $K(\cdot \mid \cdot)$ denotes prefix-free conditional complexity and $n$ bounds the lengths of $x$ and $y$. Bauwens (Bauwens, 2020, Bauwens et al., 2018) proved that exact $O(1)$ equivalence fails for prefix distances in the regime of small complexities, with the gap growing as large as $\log\log n - O(\log\log\log n)$; but as soon as $\max\{K(x \mid y), K(y \mid x)\} \geq c_0 \log n$ for some absolute constant $c_0 > 0$, the $O(1)$-precision holds for all prefix versions (Bauwens, 2020).

3. Metric Properties and Universality

Bidirectional information distance is symmetric, $D(x, y) = D(y, x) + O(1)$, which is immediate from the max-formula and the invariance of the universal machine up to $O(1)$ bits.

Identity holds: $D(x, x) = O(1)$. The triangle inequality is satisfied up to a logarithmic or constant term:

$$D(x, z) \leq D(x, y) + D(y, z) + O(\log \max\{ D(x, y), D(y, z) \})$$

With further technical refinements, and in the super-logarithmic regime, constant slack is attainable (Bauwens, 2020).
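
A sketch of where the logarithmic slack comes from (standard chaining of conditional descriptions; the log term pays for self-delimiting the concatenation of two programs): to obtain $x$ from $z$, first run a shortest program converting $z$ to $y$, then a shortest program converting $y$ to $x$, so that

$$C(x \mid z) \leq C(x \mid y) + C(y \mid z) + O(\log(C(x \mid y) + C(y \mid z)))$$

and symmetrically for $C(z \mid x)$. Taking maxima on both sides and applying the duality $D(\cdot,\cdot) = \max\{C(\cdot \mid \cdot), C(\cdot \mid \cdot)\} + O(1)$ yields the stated triangle inequality.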

Vitányi, Li, and collaborators proved a universality property: for any admissible (upper-semicomputable, symmetric, normalized) distance $D'(x, y)$, there is a constant $c$ such that

$$D'(x, y) \geq E(x, y) - c$$

where $E$ is the bidirectional information distance. Thus $E$ minorizes every admissible distance: whenever two objects are close under any effective similarity measure, they are also close under $E$ (Bennett et al., 2010).

4. Normalization and Practical Approximations

Normalization enables meaningful comparison across pairs of strings of different complexities. The Normalized Information Distance (NID) is defined as

$$\mathrm{NID}(x, y) = \frac{D(x, y)}{\max\{ C(x), C(y) \}} = \frac{\max\{ C(x \mid y), C(y \mid x) \}}{\max\{ C(x), C(y) \}}$$

This scales the distance to $[0, 1]$ (up to negligible terms) and preserves approximate metricity.
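
A toy numeric illustration with hypothetical complexity values (actual Kolmogorov complexities are uncomputable): suppose $C(x) = 1000$, $C(y) = 800$, $C(x \mid y) = 300$, and $C(y \mid x) = 500$ bits. Then

$$\mathrm{NID}(x, y) = \frac{\max\{300, 500\}}{\max\{1000, 800\}} = \frac{500}{1000} = 0.5,$$

i.e., converting in the harder direction requires about half as much information as describing the more complex string outright.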

Since the Kolmogorov complexities $C(\cdot)$ and $K(\cdot)$ are noncomputable, practical approximations rely on a real-world compressor: writing $Z(x)$ for the compressed length of $x$ and $xy$ for the concatenation of $x$ and $y$,

$$\mathrm{NCD}(x, y) = \frac{Z(xy) - \min\{ Z(x), Z(y) \}}{\max\{ Z(x), Z(y) \}}$$

With suitable compressors (satisfying "normality" conditions), NCD empirically mirrors NID and thus D(x,y)D(x, y) in applications including clustering, bioinformatics, and linguistic analysis (Vitanyi, 2012).
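
A minimal runnable sketch of NCD, using zlib as the compressor $Z$ (an assumption; any approximately "normal" compressor such as bzip2 or a PPM variant can be swapped in, and small-window stream compressors degrade on long inputs):

```python
import os
import zlib

def ncd(x: bytes, y: bytes) -> float:
    """Normalized Compression Distance, with zlib standing in for Z."""
    zx = len(zlib.compress(x, 9))
    zy = len(zlib.compress(y, 9))
    zxy = len(zlib.compress(x + y, 9))
    return (zxy - min(zx, zy)) / max(zx, zy)

a = b"the quick brown fox jumps over the lazy dog " * 20
b = b"the quick brown fox leaps over the lazy dog " * 20
c = os.urandom(900)  # incompressible noise, unrelated to a

print(f"NCD(a, b) = {ncd(a, b):.3f}")  # similar strings -> small distance
print(f"NCD(a, c) = {ncd(a, c):.3f}")  # unrelated data  -> close to 1
```

Because real compressors satisfy normality only approximately, computed values can slightly exceed $1$; within one corpus it is the ranking of distances that matters for downstream clustering.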

A web-based analog, the Normalized Web Distance (NWD), replaces compressor outputs with code-lengths based on search engine hit counts, enabling semantic similarity estimation.
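
A minimal sketch of the hit-count computation, assuming the standard NWD formula with term frequencies $f(x)$, $f(y)$, joint frequency $f(x, y)$, and (estimated) index size $N$; all numbers below are hypothetical placeholders, and in practice the counts would come from a search API:

```python
from math import log

def nwd(fx: float, fy: float, fxy: float, N: float) -> float:
    """Normalized Web Distance from search hit counts:
    fx, fy -- pages containing term x (resp. y) alone
    fxy    -- pages containing both terms
    N      -- (estimated) total number of indexed pages
    """
    lx, ly, lxy = log(fx), log(fy), log(fxy)
    return (max(lx, ly) - lxy) / (log(N) - min(lx, ly))

# Hypothetical counts for two closely related terms.
print(f"{nwd(fx=1.2e8, fy=9.0e7, fxy=4.0e7, N=5.0e10):.3f}")
```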

5. Extensions to Sets and Multiples

Bidirectional information distance generalizes to multisets $S$ with $|S| = s$ elements. Define

$$\mathcal{D}(S) = \min\left\{\, |p| : \forall w \in S,\; U(p, w) = S \,\right\}$$

For sufficiently large conditional prefix complexities, the exact formula holds up to an $O(\log s)$ term:

$$\mathcal{D}(S) = \max_{w \in S} K(S \mid w) + O(\log s)$$

provided $\max_{w \in S} K(S \mid w) \gg s \log(ns)$. This is established via combinatorial arguments on hypergraphs and extensions of Cantor-space games (Bauwens, 2020).

For lists $X = (x_1, \ldots, x_m)$, define

$$E_{\max}(X) = \min\{\, |p| : \forall i, j,\; U(x_i, p, i) = x_j \,\}$$

and up to logarithmic terms,

$$E_{\max}(X) = \max_{x \in X} K(X \mid x)$$

thus generalizing the pairwise distance (Vitanyi, 2012).

6. Applications and Operational Significance

Bidirectional information distance underlies universal similarity metrics and objective pattern recognition. Example applications include:

  • Hierarchical clustering: NID and NCD matrices support quartet-based or neighbor-joining algorithms, as in phylogenetic tree reconstruction from DNA sequences (Vitanyi, 2012); a minimal clustering sketch follows this list.
  • Bioinformatics: Compression-based distances applied to genetic data yield phylogenies congruent with established biological knowledge.
  • Learning and data mining: NCD distances enable unsupervised classification where shared algorithmic regularity is the only guiding principle.
  • Web mining: NWD harnesses page-count statistics for semantic comparison of terms.
  • Question-answering systems: Bidirectional program overlap scoring has yielded substantial empirical performance gains.
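
A minimal sketch of compression-based hierarchical clustering, pairing the zlib NCD from the earlier sketch with SciPy's agglomerative linkage (the quartet-based methods of the original literature are not in SciPy; average linkage is a common stand-in, and the sequences below are hypothetical toys):

```python
import zlib
from itertools import combinations
from scipy.cluster.hierarchy import linkage

def ncd(x: bytes, y: bytes) -> float:
    """Normalized Compression Distance with zlib as the compressor."""
    zx, zy = len(zlib.compress(x, 9)), len(zlib.compress(y, 9))
    return (len(zlib.compress(x + y, 9)) - min(zx, zy)) / max(zx, zy)

# Hypothetical toy sequences standing in for DNA fragments or documents.
items = {
    "seq_a": b"ACGTACGTACGT" * 40,
    "seq_b": b"ACGTACGAACGT" * 40,  # near-copy of seq_a
    "seq_c": b"TTGGCCAATTGG" * 40,  # different repeat structure
}
names = list(items)

# Condensed pairwise distance vector, in the index order SciPy expects.
condensed = [ncd(items[a], items[b]) for a, b in combinations(names, 2)]

# Average-linkage agglomerative clustering on the NCD matrix.
print(linkage(condensed, method="average"))  # seq_a, seq_b should merge first
```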

Bidirectionality ensures the metric captures the least effort sufficient to convert in both directions, i.e., the harder of the two conversions, thus reflecting worst-case similarity and supporting robust clustering.

7. Open Problems and Developments

Key developments include:

  • Characterizing the limits of precise equivalence between shortest program length and the max of conditional complexities: $O(1)$-precision is unattainable for prefix variants in the low-complexity regime, but holds once the conditional complexities exceed the $c_0 \log n$ threshold (Bauwens, 2020, Bauwens et al., 2018).
  • Extending metricity to multisets: Sharp additive constants are known, but the trade-offs for intricate distributions and nonuniform cardinalities demand further investigation.
  • Non-approximability of NID: NID is not semicomputable, nor within computable distance of any computable function (Vitanyi, 2012).
  • Thermodynamic interpretations: Connections between algorithmic irreversibility and physical work are formalized, linking information distance to the entropy-change in reversible computation (Bennett et al., 2010).

Theoretical refinement and better operational methodology are needed to scale information-distance techniques to large, high-dimensional data domains. Practical computation defaults to compression-based proxies, but understanding the gap between these proxies and the algorithmic ideal remains a vital area for research.
