
On Achievable Rates for the Shotgun Sequencing Channel with Erasures (2401.16342v3)

Published 29 Jan 2024 in cs.IT and math.IT

Abstract: In shotgun sequencing, the input string (typically a long DNA sequence composed of nucleotide bases) is sequenced as multiple overlapping fragments of much shorter length, called reads. Modelling the shotgun sequencing pipeline as a communication channel for DNA data storage, a recent work identified the capacity of this channel under the assumption that the reads are noiseless substrings of the original sequence. Modern shotgun sequencers, however, also output a quality score for each base read, indicating the confidence in its identification. Bases with low quality scores can be treated as erased. Motivated by this, we consider the shotgun sequencing channel with erasures, where each symbol in any read can be independently erased with some probability $\delta$. We identify achievable rates for this channel, using a random code construction and a decoder that uses typicality-like arguments to merge the reads.
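
To make the channel model concrete, the following is a minimal simulation sketch, not the authors' construction. It assumes that read start positions are drawn uniformly at random without wrap-around, that all reads have the same length, and that the reads are output in arbitrary order; the abstract specifies none of these details. The function name `shotgun_reads_with_erasures`, its parameters, and the erasure marker `?` are hypothetical names chosen for illustration.

```python
import random

ERASURE = "?"  # hypothetical marker for an erased symbol


def shotgun_reads_with_erasures(x, num_reads, read_len, delta, rng=None):
    """Sample reads from x and erase each symbol independently w.p. delta.

    Assumptions (not taken from the abstract): uniform start positions,
    no wrap-around, fixed read length, unordered output.
    """
    if not 0.0 <= delta <= 1.0:
        raise ValueError("delta must lie in [0, 1]")
    if read_len > len(x):
        raise ValueError("read_len must not exceed len(x)")
    rng = rng or random.Random()

    reads = []
    for _ in range(num_reads):
        # Pick a start position so the read fits entirely inside x.
        start = rng.randrange(len(x) - read_len + 1)
        fragment = x[start:start + read_len]
        # Erase each symbol independently with probability delta.
        noisy = "".join(
            ERASURE if rng.random() < delta else base for base in fragment
        )
        reads.append(noisy)

    # The decoder does not learn the order in which reads were sampled.
    rng.shuffle(reads)
    return reads
```

For example, `shotgun_reads_with_erasures("ACGTACGGTCA", num_reads=4, read_len=5, delta=0.2)` returns four length-5 strings over the alphabet `{A, C, G, T, ?}`. Setting `delta=0` recovers the noiseless setting in which every read is an exact substring of the input.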
