Internal logic or syntax in biological “text” (DNA sequences)

Determine whether DNA sequences, interpreted as a biological text formed by the four-letter nucleotide alphabet {A, C, G, T}, follow an internal logic or syntactic structure.

Background

The paper frames DNA as a generative system analogous to language, in which the nucleotides A, C, G, and T act as letters that compose biological words, phrases, and sentences. Motivated by connections between biological and linguistic generativity, the authors ask whether DNA exhibits an internal logic or syntax akin to that of formal languages.

To treat DNA sequences mathematically, the authors use Gödel numbering to encode each DNA strand as a unique integer and analyze the statistical distribution of the logarithms of these integers under randomness assumptions. They compare theoretical predictions with simulated and real-world sequences to probe for deviations that would indicate non-random dynamics and information processing. Within this broader context, the explicit open question is whether biological texts (DNA) have an underlying syntactic structure.
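
As a rough illustration of the encoding idea, the following Python sketch applies the classical prime-power form of Gödel numbering to a short DNA string. It is a minimal sketch under stated assumptions, not the authors' implementation: the symbol codes in CODES (A=1, C=2, G=3, T=4) and the helper names first_primes, godel_number, and log_godel_number are hypothetical choices made here for illustration, and the paper's actual assignment and conventions may differ.

```python
import math

# Hypothetical symbol codes (an assumption, not taken from the paper).
CODES = {"A": 1, "C": 2, "G": 3, "T": 4}


def first_primes(n):
    """Return the first n primes by simple trial division."""
    primes = []
    candidate = 2
    while len(primes) < n:
        # candidate is prime if no smaller prime up to sqrt(candidate) divides it
        if all(candidate % p for p in primes if p * p <= candidate):
            primes.append(candidate)
        candidate += 1
    return primes


def godel_number(seq):
    """Encode seq as the product of p_i ** code(s_i), p_i being the i-th prime."""
    g = 1
    for p, base in zip(first_primes(len(seq)), seq):
        g *= p ** CODES[base]
    return g


def log_godel_number(seq):
    """Natural log of the Gödel number, computed as a sum to avoid huge integers."""
    return sum(
        CODES[base] * math.log(p)
        for p, base in zip(first_primes(len(seq)), seq)
    )


if __name__ == "__main__":
    seq = "ACGT"
    print(godel_number(seq))       # 2**1 * 3**2 * 5**3 * 7**4
    print(log_godel_number(seq))
```

Because the encoding is a product of prime powers, unique factorization lets the original sequence be recovered from the integer. The logarithm turns that product into a sum of per-position terms, which is what makes its distribution tractable to study under a randomness assumption, and it also keeps the values manageable, since the integer itself grows very quickly with sequence length.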

References

It is an open and a highly interesting question if the biological "text" follows an internal logic, or a syntax.

DNA coding and Gödel numbering (1909.13574 - Nicolaidis et al., 2019) in Introduction