Approximate Trace Reconstruction via Median String (in Average-Case) (2107.09497v1)
Abstract: We consider an \emph{approximate} version of the trace reconstruction problem, where the goal is to recover an unknown string $s\in{0,1}n$ from $m$ traces (each trace is generated independently by passing $s$ through a probabilistic insertion-deletion channel with rate $p$). We present a deterministic near-linear time algorithm for the average-case model, where $s$ is random, that uses only \emph{three} traces. It runs in near-linear time $\tilde O(n)$ and with high probability reports a string within edit distance $O(\epsilon p n)$ from $s$ for $\epsilon=\tilde O(p)$, which significantly improves over the straightforward bound of $O(pn)$. Technically, our algorithm computes a $(1+\epsilon)$-approximate median of the three input traces. To prove its correctness, our probabilistic analysis shows that an approximate median is indeed close to the unknown $s$. To achieve a near-linear time bound, we have to bypass the well-known dynamic programming algorithm that computes an optimal median in time $O(n3)$.