Formal linguistic properties of DNA sequences

Determine whether genomic DNA sequences possess linguistic properties in the rigorous sense of formal linguistics, rather than merely by analogy to natural language, and identify which of these properties can be rigorously demonstrated in DNA sequences.

Background

The paper explores parallels between DNA and human language. It is motivated by longstanding analogies, such as describing DNA as the "language of life", and by empirical observations, such as Zipf's law appearing in noncoding DNA. The authors map DNA sequences into a Chinese linguistic feature space and provide evidence of redundancy in fixed-length DNA segments, which they take to suggest an overlap with core linguistic features.
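The two statistical notions mentioned above, Zipf-like rank-frequency behaviour and redundancy in fixed-length segments, can be illustrated with a minimal Python sketch. It assumes, for illustration only, that overlapping k-mers play the role of "words" and that redundancy is measured in Shannon's sense, R = 1 - H/H_max with H_max = log2(4^k) = 2k bits for k-mers over {A, C, G, T}. This is not necessarily how Yang et al. segment DNA or quantify redundancy, and their mapping into a Chinese linguistic feature space is not reproduced. All function names are hypothetical.

```python
from __future__ import annotations

import math
from collections import Counter

ALPHABET = set("ACGT")


def kmer_counts(seq: str, k: int) -> Counter:
    """Count overlapping k-mers, skipping any window that contains a non-ACGT symbol (e.g. N)."""
    seq = seq.upper()
    counts: Counter = Counter()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if set(kmer) <= ALPHABET:
            counts[kmer] += 1
    return counts


def rank_frequency(counts: Counter) -> list[tuple[int, int]]:
    """Return (rank, frequency) pairs, most frequent k-mer first."""
    freqs = sorted(counts.values(), reverse=True)
    return list(enumerate(freqs, start=1))


def zipf_exponent(pairs: list[tuple[int, float]]) -> float:
    """Estimate alpha in frequency ~ rank**(-alpha) by least squares on log-log axes."""
    if len(pairs) < 2:
        raise ValueError("need at least two ranks to fit a slope")
    xs = [math.log(r) for r, _ in pairs]
    ys = [math.log(f) for _, f in pairs]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return -sxy / sxx


def redundancy(counts: Counter, k: int) -> float:
    """Shannon redundancy R = 1 - H / H_max, where H_max = log2(4**k) = 2k bits."""
    total = sum(counts.values())
    h = -sum((c / total) * math.log2(c / total) for c in counts.values())
    return 1.0 - h / (2 * k)


if __name__ == "__main__":
    seq = "ACGT" * 50  # toy, highly repetitive sequence (hypothetical input)
    counts = kmer_counts(seq, 2)
    print(rank_frequency(counts))
    # Only 4 of the 16 possible dinucleotides occur, so R is close to 0.5.
    print(round(redundancy(counts, 2), 3))
```

On real genomic data, such statistics would normally be compared against shuffled or randomly generated control sequences, because skewed base composition alone can yield nonzero redundancy and a roughly Zipf-shaped curve without any linguistic structure.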

Despite these results, the broader question of whether DNA possesses formal linguistic properties in a rigorous sense remains unsettled. The authors explicitly flag this as unresolved at the outset, distinguishing their empirical findings on redundancy from a complete formal linguistic characterization.

References

DNA has long been described as the “language of life”, but whether it possesses formal linguistic properties remains unresolved.

Yang et al., "DNA and Human Language: Epigenetic Memory and Redundancy in Linear Sequence", arXiv:2503.23494, 30 Mar 2025 (quoted from the Abstract).