
Bounds and Constructions of $\ell$-Read Codes under the Hamming Metric (2403.11754v1)

Published 18 Mar 2024 in cs.IT and math.IT

Abstract: Nanopore sequencing is a promising technology for DNA sequencing. In this paper, we investigate a specific model of the nanopore sequencer, which takes a $q$-ary sequence of length $n$ as input and outputs a vector of length $n+\ell-1$ referred to as an $\ell$-read vector where the $i$-th entry is a multi-set composed of the $\ell$ elements located between the $(i-\ell+1)$-th and $i$-th positions of the input sequence. Considering the presence of substitution errors in the output vector, we study $\ell$-read codes under the Hamming metric. An $\ell$-read $(n,d)_q$-code is a set of $q$-ary sequences of length $n$ in which the Hamming distance between $\ell$-read vectors of any two distinct sequences is at least $d$. We first improve the result of Banerjee et al., who studied $\ell$-read $(n,d)_q$-codes with the constraint $\ell\geq 3$ and $d=3$. Then, we investigate the bounds and constructions of $2$-read codes with a minimum distance of $3$, $4$, and $5$, respectively. Our results indicate that when $d \in \{3,4\}$, the optimal redundancy of $2$-read $(n,d)_q$-codes is $o(\log_q n)$, while for $d=5$ it is $\log_q n+o(\log_q n)$. Additionally, we establish an equivalence between $2$-read $(n,3)_q$-codes and classical $q$-ary single-insertion reconstruction codes using two noisy reads. We improve the lower bound on the redundancy of classical $q$-ary single-insertion reconstruction codes as well as the upper bound on the redundancy of classical $q$-ary single-deletion reconstruction codes when using two noisy reads. Finally, we study $\ell$-read codes under the reconstruction model.
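
The definitions in the abstract can be illustrated with a small sketch. It is not taken from the paper: the function names `read_vector`, `read_distance`, and `min_read_distance` are hypothetical, and it assumes that windows extending past either end of the input are simply truncated, which is what makes the output length $n+\ell-1$.

```python
from collections import Counter
from itertools import combinations


def read_vector(x, ell):
    """Return the ell-read vector of x as a list of multisets (Counters).

    Entry i (1-indexed, i = 1..n+ell-1) holds the symbols of x at positions
    i-ell+1 .. i. Assumption: positions outside 1..n are dropped, so the
    first and last ell-1 windows contain fewer than ell symbols.
    """
    n = len(x)
    out = []
    for i in range(1, n + ell):
        lo = max(1, i - ell + 1)   # first in-range position of the window
        hi = min(i, n)             # last in-range position of the window
        out.append(Counter(x[lo - 1:hi]))
    return out


def read_distance(x, y, ell):
    """Hamming distance between the ell-read vectors of x and y."""
    if len(x) != len(y):
        raise ValueError("sequences must have the same length")
    rx, ry = read_vector(x, ell), read_vector(y, ell)
    # Two entries agree when they are equal as multisets.
    return sum(a != b for a, b in zip(rx, ry))


def min_read_distance(code, ell):
    """Smallest read distance over all pairs of distinct codewords."""
    return min(read_distance(x, y, ell) for x, y in combinations(code, 2))
```

For example, `read_distance("010", "000", 2)` returns 2, because the two 2-read vectors differ only in their second and third entries. Under this sketch, a set `code` of length-$n$ sequences is an $\ell$-read $(n,d)_q$-code exactly when `min_read_distance(code, ell) >= d`.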

References (22)
  1. A. Banerjee, Y. Yehezkeally, A. Wachter-Zeh, and E. Yaakobi, “Error Correcting Codes for Nanopore Sequencing,” arXiv:2305.10214, 2024.
  2. A. Banerjee, Y. Yehezkeally, A. Wachter-Zeh, and E. Yaakobi, “Correcting a Single Deletion in Reads from a Nanopore Sequencer,” arXiv:2401.15939, 2024.
  3. K. Cai, H. M. Kiah, T. T. Nguyen, and E. Yaakobi, “Coding for sequence reconstruction for single edits,” IEEE Trans. Inf. Theory, vol. 68, no. 1, pp. 66-79, 2022.
  4. J. Chrisnata, H. M. Kiah, and E. Yaakobi, “Correcting deletions with multiple reads,” IEEE Trans. Inf. Theory, vol. 68, no. 11, pp. 7141-7158, 2022.
  5. Y. M. Chee, A. Vardy, V. K. Vu, and E. Yaakobi, “Transverse-Read-Codes for Domain Wall Memories,” IEEE J. Sel. Areas Inf. Theory, vol. 4, pp. 784-793, 2023.
  6. D. Deamer, M. Akeson, and D. Branton, “Three decades of nanopore sequencing,” Nature biotechnology, vol. 34, no. 5, pp. 518-524, 2016.
  7. DNA Data Storage Alliance, “Preserving our digital legacy: an introduction to DNA data storage,” 2021.
  8. R. Hulett, S. Chandak, and M. Wootters, “On coding for an abstracted nanopore channel for DNA storage,” in Proc. IEEE Int. Symp. Inf. Theory (ISIT), Melbourne, Australia, pp. 2465-2470, 2021.
  9. D. E. Knuth, “The sandwich theorem,” Elec. J. Comb., vol. 1, no. 1, p. A1, Apr. 1994.
  10. J. J. Kasianowicz, E. Brandin, D. Branton, and D. W. Deamer, “Characterization of individual polynucleotide molecules using a membrane channel,” Proceedings of the National Academy of Sciences, vol. 93, no. 24, pp. 13770-13773, 1996.
  11. V. I. Levenshtein, “Binary codes capable of correcting deletions, insertions, and reversals,” Soviet Phys. Dokl., vol. 10, no. 8, pp. 707-710, 1966.
  12. V. I. Levenshtein, “Efficient reconstruction of sequences,” IEEE Trans. Inf. Theory, vol. 47, no. 1, pp. 2-22, 2001.
  13. S. Liu and C. Xing, “Nonlinear codes with low redundancy,” arXiv:2310.14219, 2023.
  14. W. Mao, S. N. Diggavi, and S. Kannan, “Models and information theoretic bounds for nanopore sequencing,” IEEE Trans. Inf. Theory, vol. 64, no. 4, pp. 3216-3236, 2018.
  15. B. McBain, E. Viterbo, and J. Saunderson, “Finite-State Semi-Markov Channels for Nanopore Sequencing,” in Proc. IEEE Int. Symp. Inf. Theory (ISIT), Espoo, Finland, pp. 216-221, 2022.
  16. B. McBain, E. Viterbo, and J. Saunderson, “Homophonic Coding for the Noisy Nanopore Channel with Constrained Markov Sources,” in Proc. IEEE Int. Symp. Inf. Theory (ISIT), Taipei, Taiwan, pp. 376-381, 2023.
  17. J. Rydning, “Worldwide IDC Global DataSphere forecast, 2022-2026: Enterprise organizations driving most of the data growth,” Technical Report, 2022.
  18. Y. Sun and G. Ge, “Correcting two-deletion with a constant number of reads,” IEEE Trans. Inf. Theory, vol. 69, no. 5, pp. 2969-2982, 2023.
  19. Y. Sun, Y. Xi, and G. Ge, “Sequence reconstruction under single-burst-insertion/deletion/edit channel,” IEEE Trans. Inf. Theory, vol. 69, no. 7, pp. 4466-4483, 2023.
  20. A. Vidal, V. B. Wijekoon, and E. Viterbo, “Error Bounds for Decoding Piecewise Constant Nanopore Signals in DNA Storage,” in Proc. IEEE Int. Conf. Commun. (ICC), Rome, Italy, pp. 4452-4457, 2023.
  21. A. Vidal, V. B. Wijekoon, and E. Viterbo, “Union Bound for Generalized Duplication Channels with DTW Decoding,” in Proc. IEEE Int. Symp. Inf. Theory (ISIT), Taipei, Taiwan, pp. 358-363, 2023.
  22. O. Yerushalmi, T. Etzion, and E. Yaakobi, “The Capacity of the Weighted Read Channel,” arXiv:2401.15368, 2024.
