Bounds and Constructions of $\ell$-Read Codes under the Hamming Metric (2403.11754v1)
Abstract: Nanopore sequencing is a promising technology for DNA sequencing. In this paper, we investigate a specific model of the nanopore sequencer, which takes a $q$-ary sequence of length $n$ as input and outputs a vector of length $n+\ell-1$, referred to as an $\ell$-read vector, whose $i$-th entry is the multiset of the $\ell$ elements located between the $(i-\ell+1)$-th and $i$-th positions of the input sequence. To account for substitution errors in the output vector, we study $\ell$-read codes under the Hamming metric. An $\ell$-read $(n,d)_q$-code is a set of $q$-ary sequences of length $n$ in which the Hamming distance between the $\ell$-read vectors of any two distinct sequences is at least $d$. We first improve the result of Banerjee \emph{et al.}, who studied $\ell$-read $(n,d)_q$-codes with $\ell\geq 3$ and $d=3$. Then, we investigate bounds and constructions of $2$-read codes with minimum distance $3$, $4$, and $5$. Our results indicate that when $d \in \{3,4\}$, the optimal redundancy of $2$-read $(n,d)_q$-codes is $o(\log_q n)$, while for $d=5$ it is $\log_q n+o(\log_q n)$. Additionally, we establish an equivalence between $2$-read $(n,3)_q$-codes and classical $q$-ary single-insertion reconstruction codes using two noisy reads. We improve the lower bound on the redundancy of classical $q$-ary single-insertion reconstruction codes, as well as the upper bound on the redundancy of classical $q$-ary single-deletion reconstruction codes, when two noisy reads are used. Finally, we study $\ell$-read codes under the reconstruction model.
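To make the definitions in the abstract concrete, here is a minimal brute-force sketch of the $\ell$-read vector, the Hamming distance between read vectors, and a check of the $\ell$-read $(n,d)_q$-code property. Several choices are assumptions not stated in the abstract: positions outside $1,\dots,n$ are padded with the symbol `0`, each multiset is stored as a sorted tuple, and redundancy is taken as $n-\log_q|\mathcal{C}|$, the usual convention. The helper `greedy_code` is hypothetical and purely illustrative; it is an exhaustive lexicographic search, not the construction from the paper, and is feasible only for very small $n$.

```python
import math
from itertools import combinations, product


def read_vector(x, ell, pad=0):
    """Return the ell-read vector of x as a list of n + ell - 1 multisets.

    Each multiset is a sorted tuple. Entry i (1-indexed) collects the ell
    symbols at positions i-ell+1, ..., i of x. Positions outside 1..n are
    filled with `pad` (an assumed convention, not taken from the abstract).
    """
    n = len(x)
    padded = [pad] * (ell - 1) + list(x) + [pad] * (ell - 1)
    # Position p of x (1-indexed) sits at padded index p + ell - 2, so the
    # window for entry i is padded[i-1 : i-1+ell].
    return [tuple(sorted(padded[i - 1:i - 1 + ell])) for i in range(1, n + ell)]


def hamming(u, v):
    """Number of coordinates in which two equal-length vectors differ."""
    return sum(a != b for a, b in zip(u, v))


def read_distance(x, y, ell):
    """Hamming distance between the ell-read vectors of x and y."""
    return hamming(read_vector(x, ell), read_vector(y, ell))


def is_read_code(code, ell, d):
    """True if every pair of distinct codewords has read distance >= d."""
    return all(read_distance(x, y, ell) >= d for x, y in combinations(code, 2))


def redundancy(code, n, q):
    """Redundancy n - log_q |C| of a code C of q-ary sequences of length n."""
    return n - math.log(len(code), q)


def greedy_code(n, q, ell, d):
    """Hypothetical helper: lexicographic greedy search (exponential in n)."""
    code = []
    for x in product(range(q), repeat=n):
        if all(read_distance(x, c, ell) >= d for c in code):
            code.append(x)
    return code


if __name__ == "__main__":
    print(read_vector((0, 1, 1, 0), 2))
    # [(0, 0), (0, 1), (1, 1), (0, 1), (0, 0)]
    code = greedy_code(6, 2, 2, 3)
    print(len(code), redundancy(code, 6, 2))
```

For $\ell = 2$, changing a single symbol alters exactly the two windows that contain it, so, for example, `read_distance((0, 1, 1, 0), (0, 1, 0, 0), 2)` returns 2. Under this model, a single substitution in the input is therefore always visible in the read vector, but it is separated from the original by distance only 2.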
- A. Banerjee, Y. Yehezkeally, A. Wachter-Zeh, and E. Yaakobi, “Error Correcting Codes for Nanopore Sequencing,” arXiv:2305.10214, 2024.
- A. Banerjee, Y. Yehezkeally, A. Wachter-Zeh, and E. Yaakobi, “Correcting a Single Deletion in Reads from a Nanopore Sequencer,” arXiv:2401.15939, 2024.
- K. Cai, H. M. Kiah, T. T. Nguyen, and E. Yaakobi, “Coding for sequence reconstruction for single edits,” IEEE Trans. Inf. Theory, vol. 68, no. 1, pp. 66-79, 2022.
- J. Chrisnata, H. M. Kiah, and E. Yaakobi, “Correcting deletions with multiple reads,” IEEE Trans. Inf. Theory, vol. 68, no. 11, pp. 7141-7158, 2022.
- Y. M. Chee, A. Vardy, V. K. Vu, and E. Yaakobi, “Transverse-Read-Codes for Domain Wall Memories,” IEEE J. Sel. Areas Inf. Theory, vol. 4, pp. 784-793, 2023.
- D. Deamer, M. Akeson, and D. Branton, “Three decades of nanopore sequencing,” Nature Biotechnology, vol. 34, no. 5, pp. 518-524, 2016.
- DNA Data Storage Alliance, “Preserving our digital legacy: an introduction to DNA data storage,” 2021.
- R. Hulett, S. Chandak, and M. Wootters, “On coding for an abstracted nanopore channel for DNA storage,” in Proc. IEEE Int. Symp. Inf. Theory (ISIT), Melbourne, Australia, pp. 2465-2470, 2021.
- D. E. Knuth, “The sandwich theorem,” Electron. J. Combin., vol. 1, no. 1, p. A1, Apr. 1994.
- J. J. Kasianowicz, E. Brandin, D. Branton, and D. W. Deamer, “Characterization of individual polynucleotide molecules using a membrane channel,” Proceedings of the National Academy of Sciences, vol. 93, no. 24, pp. 13770-13773, 1996.
- V. I. Levenshtein, “Binary codes capable of correcting deletions, insertions, and reversals,” Soviet Phys. Dokl., vol. 10, no. 8, pp. 707-710, 1966.
- V. I. Levenshtein, “Efficient reconstruction of sequences,” IEEE Trans. Inf. Theory, vol. 47, no. 1, pp. 2-22, 2001.
- S. Liu and C. Xing, “Nonlinear codes with low redundancy,” arXiv:2310.14219, 2023.
- W. Mao, S. N. Diggavi, and S. Kannan, “Models and information theoretic bounds for nanopore sequencing,” IEEE Trans. Inf. Theory, vol. 64, no. 4, pp. 3216-3236, 2018.
- B. McBain, E. Viterbo, and J. Saunderson, “Finite-State Semi-Markov Channels for Nanopore Sequencing,” in Proc. IEEE Int. Symp. Inf. Theory (ISIT), Espoo, Finland, pp. 216-221, 2022.
- B. McBain, E. Viterbo, and J. Saunderson, “Homophonic Coding for the Noisy Nanopore Channel with Constrained Markov Sources,” in Proc. IEEE Int. Symp. Inf. Theory (ISIT), Taipei, Taiwan, pp. 376-381, 2023.
- J. Rydning, “Worldwide IDC Global DataSphere Forecast, 2022-2026: Enterprise organizations driving most of the data growth,” Technical Report, 2022.
- Y. Sun and G. Ge, “Correcting two-deletion with a constant number of reads,” IEEE Trans. Inf. Theory, vol. 69, no. 5, pp. 2969-2982, 2023.
- Y. Sun, Y. Xi, and G. Ge, “Sequence reconstruction under single-burst-insertion/deletion/edit channel,” IEEE Trans. Inf. Theory, vol. 69, no. 7, pp. 4466-4483, 2023.
- A. Vidal, V. B. Wijekoon, and E. Viterbo, “Error Bounds for Decoding Piecewise Constant Nanopore Signals in DNA Storage,” in Proc. IEEE Int. Conf. Commun. (ICC), Rome, Italy, pp. 4452-4457, 2023.
- A. Vidal, V. B. Wijekoon, and E. Viterbo, “Union Bound for Generalized Duplication Channels with DTW Decoding,” in Proc. IEEE Int. Symp. Inf. Theory (ISIT), Taipei, Taiwan, pp. 358-363, 2023.
- O. Yerushalmi, T. Etzion, and E. Yaakobi, “The Capacity of the Weighted Read Channel,” arXiv:2401.15368, 2024.