Permutation Recovery Problem against Deletion Errors for DNA Data Storage (2403.15827v1)
Abstract: Owing to its immense storage density and durability, DNA has emerged as a promising storage medium. However, due to technological constraints, data can only be written onto many short DNA molecules, called data blocks, that are stored in an unordered way. To handle the unordered nature of DNA data storage systems, a unique address is typically prepended to each data block to form a DNA strand. However, DNA storage systems are prone to errors and generate multiple noisy copies of each strand, called DNA reads. Thus, we study the permutation recovery problem against deletion errors for DNA data storage. The permutation recovery problem for DNA data storage requires one to reconstruct the addresses, or in other words, to uniquely identify the noisy reads. By successfully reconstructing the addresses, one can determine the correct order of the data blocks, effectively solving the clustering problem. We first show that we can almost surely identify all the noisy reads under certain mild assumptions. We then propose a permutation recovery procedure and analyze its complexity.
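To make the problem statement concrete, here is a minimal Python sketch of a toy version of permutation recovery. It is not the procedure proposed in the paper. It assumes a simplified setting with exactly one read per address, deletion errors only, and a read counted as compatible with an address when it is a subsequence of that address. The function names `is_subsequence` and `recover_permutation` are hypothetical, and the matching step uses a standard augmenting-path bipartite matching chosen purely for illustration.

```python
def is_subsequence(read, address):
    """Return True if `read` can be obtained from `address` by deletions only."""
    it = iter(address)
    # `ch in it` advances the iterator, so characters must appear in order.
    return all(ch in it for ch in read)


def recover_permutation(addresses, reads):
    """Assign each read to a distinct compatible address.

    Returns a list `assignment` where assignment[j] is the index of the
    address matched to reads[j], or None if no such one-to-one assignment
    exists. Toy assumption: one read per address, deletions only.
    """
    # Candidate addresses for each read (edges of the bipartite graph).
    candidates = [
        [i for i, addr in enumerate(addresses) if is_subsequence(read, addr)]
        for read in reads
    ]
    owner = {}  # address index -> index of the read currently matched to it

    def try_assign(j, seen):
        # Augmenting-path search: try to place read j, re-routing earlier
        # reads to other compatible addresses when necessary.
        for i in candidates[j]:
            if i in seen:
                continue
            seen.add(i)
            if i not in owner or try_assign(owner[i], seen):
                owner[i] = j
                return True
        return False

    for j in range(len(reads)):
        if not try_assign(j, set()):
            return None

    assignment = [None] * len(reads)
    for i, j in owner.items():
        assignment[j] = i
    return assignment


if __name__ == "__main__":
    addresses = ["ACGT", "AAGT", "CCGT"]
    reads = ["AGT", "CGT", "ACT"]  # each is an address with one deletion
    print(recover_permutation(addresses, reads))
```

In this toy instance each read is compatible with more than one address when considered alone, yet only one consistent one-to-one assignment exists, which is the sense in which the reads can be uniquely identified.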