Coding for Synthesis Defects (2405.02080v1)
Abstract: Motivated by DNA-based data storage systems, we investigate the errors that occur when synthesizing DNA strands in parallel, where the machine appends one nucleotide at a time to each strand according to a template supersequence. If the machine fails in some cycle, the strands meant to be extended in that cycle are not extended, and we refer to this as a synthesis defect. In this paper, we present two families of codes correcting synthesis defects: t-known-synthesis-defect correcting codes and t-synthesis-defect correcting codes. For the first family, the defective cycles are assumed to be known, and each codeword is a quaternary sequence. We provide constructions for this family for t = 1, 2, with redundancy log 4 and log n + 18 log 3, respectively. For the second family, a codeword is a set of M ordered sequences, and we give constructions for t = 1, 2 to illustrate a strategy for constructing such codes. Finally, we derive a lower bound on the redundancy of single-known-synthesis-defect correcting codes, which shows that our construction is almost optimal.
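The defect model described in the abstract can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the paper's construction: it assumes the template supersequence is the periodic sequence ACGTACGT..., that each strand is scheduled greedily in a defect-free run, and that a defective cycle simply drops the nucleotide scheduled for it while later cycles proceed as planned. The function names `schedule` and `synthesize` are hypothetical.

```python
def schedule(strand, alphabet="ACGT"):
    """Return the 0-based cycle at which each symbol of `strand` is appended.

    Assumption: the machine follows the periodic template ACGTACGT...,
    and a strand is extended at the first cycle offering its next symbol.
    """
    cycles = []
    c = 0
    for symbol in strand:
        if symbol not in alphabet:
            raise ValueError(f"symbol {symbol!r} is not in the alphabet")
        # Wait until the template offers the nucleotide this strand needs.
        while alphabet[c % len(alphabet)] != symbol:
            c += 1
        cycles.append(c)
        c += 1  # a cycle appends at most one nucleotide per strand
    return cycles


def synthesize(strand, defective_cycles=(), alphabet="ACGT"):
    """Return the strand actually produced when some cycles fail.

    Assumption: the schedule is fixed in advance, so a symbol scheduled
    for a defective cycle is lost and all other symbols are unaffected.
    """
    bad = set(defective_cycles)
    planned = zip(strand, schedule(strand, alphabet))
    return "".join(symbol for symbol, c in planned if c not in bad)


if __name__ == "__main__":
    pool = ["ACCG", "CAGT"]  # strands synthesized in parallel
    print([schedule(s) for s in pool])
    print([synthesize(s, defective_cycles=[6]) for s in pool])
```

Under these assumptions a single defective cycle affects every strand scheduled in that cycle at once, which is why the second family of codes treats a codeword as a set of sequences rather than a single sequence.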