
DNA and Human Language: Epigenetic Memory and Redundancy in Linear Sequence (2503.23494v2)

Published 30 Mar 2025 in q-bio.OT

Abstract: DNA is often described as the 'language of life', but whether it possesses formal linguistic properties remains unresolved. Here, we present the first empirical evidence that DNA sequences exhibit core linguistic features, specifically functional and informational redundancy, through comprehensive analysis of genomic and epigenetic datasets. By mapping DNA sequences into a linguistic feature space, we demonstrate that fixed-length (41 bp) DNA segments encode information analogously to human language, with redundancy contributing to signal stability in aqueous intracellular environments. Moreover, we provide the first evidence of one-dimensional epigenetic memory, showing that linear DNA sequences can maintain epigenetic marks such as 6mA methylation, in contrast to models that attribute epigenetic memory transmission to 3D chromatin organization[1]. Our tailored linguistic mapping strategy also addresses persistent challenges in genomic data processing, significantly improving data cleaning and feature extraction. Together, these findings establish a conceptual paradigm that bridges molecular information encoding and linguistic theory, laying the foundation for next-generation large language models (LLMs) specifically tailored to DNA and marking a shift at the interface of molecular biology, information theory, and AI.
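The abstract does not say how "information redundancy" is quantified. The sketch below is only an illustration, assuming the classical Shannon definition R = 1 - H / H_max applied to k-mer frequencies inside non-overlapping 41 bp windows. It is not the authors' linguistic mapping, and the function names (`segments`, `kmer_entropy`, `redundancy`) are hypothetical.

```python
from collections import Counter
import math

# Segment length quoted in the abstract; the windowing scheme (non-overlapping,
# short tail dropped) is an assumption made for this illustration.
SEGMENT_LEN = 41


def segments(seq: str, length: int = SEGMENT_LEN) -> list[str]:
    """Split a DNA string into non-overlapping fixed-length segments, dropping any short tail."""
    seq = seq.upper()
    return [seq[i:i + length] for i in range(0, len(seq) - length + 1, length)]


def kmer_entropy(segment: str, k: int = 1) -> float:
    """Shannon entropy (in bits) of the k-mer frequency distribution within one segment."""
    kmers = [segment[i:i + k] for i in range(len(segment) - k + 1)]
    counts = Counter(kmers)
    total = len(kmers)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def redundancy(segment: str, k: int = 1) -> float:
    """Shannon redundancy R = 1 - H / H_max, with H_max = log2(4**k) for a 4-letter alphabet."""
    h_max = math.log2(4 ** k)
    return 1.0 - kmer_entropy(segment, k) / h_max


if __name__ == "__main__":
    # A homopolymer window is maximally redundant; a near-uniform window is barely redundant.
    for seg in segments("A" * 41 + "ACGT" * 10 + "A"):
        print(seg, round(redundancy(seg), 4))
```

Under this assumed measure, a homopolymer window scores 1.0 and a window with near-equal base counts scores close to 0. Higher k values capture redundancy in short motifs rather than in single-base composition.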


Authors (2)
