Feasibility of watermarking DNA language models across the central dogma

Determine whether a watermarking scheme can be designed for DNA language models such that the watermark is detectable across the central dogma, that is, in both the generated DNA sequences and the proteins translated from those sequences.

Background

Watermarking methods for LLMs have progressed rapidly, and initial efforts have extended watermarking to protein generative models. Nevertheless, the biological and computational constraints unique to DNA (e.g., a four-letter alphabet, mutation processes, and functional preservation requirements) make the design of watermarking schemes for DNA LLMs particularly challenging.

The central dogma adds a further requirement: the watermark must remain traceable not only in DNA outputs but also in the proteins translated from them, so that provenance can be established end to end along the flow of biological information. Prior to this work, there had been no established approach demonstrating that such a watermark could be designed and detected in both DNA and protein domains.
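To make the cross-domain requirement concrete, the sketch below translates two different DNA sequences with the standard genetic code and shows that they yield the same protein. It is only an illustration of why a signal carried purely by synonymous codon choice would not survive translation; it is not the watermarking scheme from the cited paper. The function name `translate` and the two example sequences are hypothetical, chosen for this example.

```python
from itertools import product

# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str) -> str:
    """Translate a coding DNA sequence (reading frame 0) into a protein.

    Translation stops at the first stop codon. Codons containing
    characters outside A/C/G/T are rendered as 'X'.
    """
    dna = dna.upper()
    protein = []
    for i in range(0, len(dna) - 2, 3):
        aa = CODON_TABLE.get(dna[i:i + 3], "X")
        if aa == "*":  # stop codon ends translation
            break
        protein.append(aa)
    return "".join(protein)


# Two hypothetical sequences that differ only in synonymous codons
# (GCT/GCC for Ala, CTG/TTA for Leu, AAA/AAG for Lys).
dna_a = "ATGGCTCTGAAATAA"
dna_b = "ATGGCCTTAAAGTAA"

protein_a = translate(dna_a)
protein_b = translate(dna_b)
print(dna_a != dna_b, protein_a == protein_b)
```

Because the mapping from codons to amino acids is many-to-one, any information encoded only in the choice among synonymous codons is lost at the protein level. A watermark meant to be detectable in both domains would therefore need to influence the amino acid sequence as well as the nucleotide sequence.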

References

Recently, Zhang et al. and Chen et al. applied watermarks to protein generative models. However, it is unknown whether a watermark scheme can be designed for DNA LLMs and the central dogma.

Securing the Language of Life: Inheritable Watermarks from DNA Language Models to Proteins (2509.18207 - Zhang et al., 20 Sep 2025) in Section 2.1 (Watermark for Language Models)