
Improved Algorithms for Population Recovery from the Deletion Channel (2004.06828v2)

Published 14 Apr 2020 in cs.DS

Abstract: The population recovery problem asks one to recover an unknown distribution over $n$-bit strings given access to independent noisy samples of strings drawn from the distribution. Recently, Ban et al. [BCF+19] studied the problem where the noise is induced through the deletion channel. This problem generalizes the famous trace reconstruction problem, where one wishes to learn a single string under the deletion channel. Ban et al. showed how to learn $\ell$-sparse distributions over strings using $\exp\big(n^{1/2} \cdot (\log n)^{O(\ell)}\big)$ samples. In this work, we learn the distribution using only $\exp\big(\tilde{O}(n^{1/3}) \cdot \ell^2\big)$ samples, by developing a higher-moment analog of the algorithms of [DOS17, NP17], which solve trace reconstruction in $\exp\big(\tilde{O}(n^{1/3})\big)$ samples. We also give the first algorithm with a runtime subexponential in $n$, solving population recovery in $\exp\big(\tilde{O}(n^{1/3}) \cdot \ell^3\big)$ samples and time. Notably, our dependence on $n$ nearly matches the upper bound of [DOS17, NP17] when $\ell = O(1)$, and we reduce the dependence on $\ell$ from doubly to singly exponential. Therefore, we are able to learn large mixtures of strings: while Ban et al.'s algorithm can only learn a mixture of $O(\log n/\log \log n)$ strings with a subexponential number of samples, we are able to learn a mixture of $n^{o(1)}$ strings in $\exp\big(n^{1/3 + o(1)}\big)$ samples and time.
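To make the sampling model concrete: a single sample is produced by drawing a string $x$ from the unknown $\ell$-sparse distribution and passing it through the deletion channel, which deletes each bit independently with some probability $q$ and concatenates the surviving bits into a "trace". Below is a minimal Python sketch of this sampling process only (not of the paper's recovery algorithm); the deletion probability, support, and mixture weights are illustrative assumptions, not values from the paper.

```python
import random

def deletion_channel(x: str, q: float) -> str:
    """Pass a bit-string through the deletion channel: each bit is
    deleted independently with probability q, and the surviving bits
    are concatenated into a trace of x."""
    return "".join(b for b in x if random.random() > q)

def draw_noisy_sample(support, weights, q):
    """One noisy sample for population recovery: draw a string from an
    l-sparse mixture, then observe only its trace through the channel."""
    x = random.choices(support, weights=weights, k=1)[0]
    return deletion_channel(x, q)

# Hypothetical l = 2 mixture over 8-bit strings (values made up).
support = ["10110010", "01101100"]
weights = [0.7, 0.3]
print([draw_noisy_sample(support, weights, q=0.4) for _ in range(5)])
```

The recovery task is to estimate the support and weights from many such independent traces; trace reconstruction is the special case $\ell = 1$.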

Citations (12)
