
Protecting the Future of Information: LOCO Coding With Error Detection for DNA Data Storage

Published 14 Nov 2023 in cs.IT, eess.SP, and math.IT (arXiv:2311.08325v2)

Abstract: DNA strands serve as a storage medium for $4$-ary data over the alphabet $\{A,T,G,C\}$. DNA data storage promises formidable information density, long-term durability, and ease of replicability. However, information in this intriguing storage technology may be corrupted. Experiments have revealed that DNA sequences with long homopolymers and/or low $GC$-content are notably more prone to errors during storage. This paper investigates the use of the recently introduced method for designing lexicographically-ordered constrained (LOCO) codes in DNA data storage. It introduces DNA LOCO (D-LOCO) codes over the alphabet $\{A,T,G,C\}$ with limited runs of identical symbols. These codes come with an encoding-decoding rule we derive, which yields affordable encoding-decoding algorithms. In terms of storage overhead, the proposed algorithms outperform those in the existing literature, and they are readily reconfigurable. D-LOCO codes are intrinsically balanced, which allows us to achieve balancing over the entire DNA strand with minimal rate penalty. Moreover, we propose four schemes to bridge consecutive codewords, three of which guarantee single-substitution error detection per codeword. We examine the probability of undetected errors. We also show that D-LOCO codes are capacity-achieving and that they offer remarkably high rates at moderate lengths.
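To make the idea of a lexicographically indexed, run-length-limited code more concrete, here is a minimal sketch of generic enumerative (lexicographic) indexing for strands over {A,C,G,T} in which no symbol repeats more than max_run times in a row. It is not the D-LOCO encoding-decoding rule derived in the paper. It does not enforce GC balance (it only measures GC-content), and it does not implement the bridging or error-detection schemes. The alphabetical symbol order "ACGT" and all names (completions, count_strands, encode, decode, bits_per_nucleotide, gc_content, longest_run) are hypothetical choices made for illustration.

```python
from functools import lru_cache
from typing import Iterator, Optional, Tuple

ALPHABET = "ACGT"  # assumption: symbols ranked alphabetically for indexing


@lru_cache(maxsize=None)
def completions(run: int, remaining: int, max_run: int) -> int:
    """Count valid ways to append `remaining` symbols when the last
    symbol written currently has a run of length `run`."""
    if remaining == 0:
        return 1
    # Switching to any of the 3 other symbols restarts the run at 1.
    total = 3 * completions(1, remaining - 1, max_run)
    # Repeating the last symbol is allowed only while run < max_run.
    if run < max_run:
        total += completions(run + 1, remaining - 1, max_run)
    return total


def count_strands(length: int, max_run: int) -> int:
    """Number of strands of `length` >= 1 with no run longer than max_run."""
    return len(ALPHABET) * completions(1, length - 1, max_run)


def _branches(prev: Optional[str], run: int, remaining: int,
              max_run: int) -> Iterator[Tuple[str, int, int]]:
    """Yield (symbol, new_run, completion_count) for every allowed next
    symbol, in lexicographic order."""
    for s in ALPHABET:
        if s != prev:
            new_run = 1
        elif run < max_run:
            new_run = run + 1
        else:
            continue  # repeating would exceed the run-length limit
        yield s, new_run, completions(new_run, remaining, max_run)


def encode(index: int, length: int, max_run: int) -> str:
    """Map an integer index to the index-th valid strand (lexicographic)."""
    if not 0 <= index < count_strands(length, max_run):
        raise ValueError("index out of range")
    strand, prev, run = [], None, 0
    for pos in range(length):
        remaining = length - pos - 1
        for s, new_run, cnt in _branches(prev, run, remaining, max_run):
            if index < cnt:  # the target lies in this symbol's block
                strand.append(s)
                prev, run = s, new_run
                break
            index -= cnt  # skip all strands starting with this symbol
    return "".join(strand)


def decode(strand: str, max_run: int) -> int:
    """Inverse of encode: the lexicographic rank of a valid strand."""
    index, prev, run = 0, None, 0
    for pos, sym in enumerate(strand):
        remaining = len(strand) - pos - 1
        for s, new_run, cnt in _branches(prev, run, remaining, max_run):
            if s == sym:
                prev, run = s, new_run
                break
            index += cnt  # count strands with a smaller symbol here
        else:
            raise ValueError("strand violates the constraint or alphabet")
    return index


def bits_per_nucleotide(length: int, max_run: int) -> float:
    """Whole data bits storable per strand, divided by strand length."""
    return (count_strands(length, max_run).bit_length() - 1) / length


def gc_content(strand: str) -> float:
    """Fraction of G and C symbols in the strand."""
    return sum(c in "GC" for c in strand) / len(strand)


def longest_run(strand: str) -> int:
    """Length of the longest homopolymer run in the strand."""
    if not strand:
        return 0
    best = cur = 1
    for a, b in zip(strand, strand[1:]):
        cur = cur + 1 if a == b else 1
        best = max(best, cur)
    return best
```

For example, with length 3 and max_run 2 there are 60 valid strands (all 64 minus AAA, CCC, GGG, and TTT), so count_strands(3, 2) returns 60, encode(0, 3, 2) returns "AAC", and decode("TTG", 2) returns 59. Because completions recurses once per remaining symbol, lengths beyond a few hundred would be better served by an iterative table to stay within Python's recursion limit.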
