Papers
Topics
Authors
Recent
Gemini 2.5 Flash
Gemini 2.5 Flash
125 tokens/sec
GPT-4o
47 tokens/sec
Gemini 2.5 Pro Pro
43 tokens/sec
o3 Pro
4 tokens/sec
GPT-4.1 Pro
47 tokens/sec
DeepSeek R1 via Azure Pro
28 tokens/sec
2000 character limit reached

Small space and streaming pattern matching with k edits (2106.06037v1)

Published 10 Jun 2021 in cs.DS

Abstract: In this work, we revisit the fundamental and well-studied problem of approximate pattern matching under edit distance. Given an integer $k$, a pattern $P$ of length $m$, and a text $T$ of length $n \ge m$, the task is to find substrings of $T$ that are within edit distance $k$ from $P$. Our main result is a streaming algorithm that solves the problem in $\tilde{O}(k5)$ space and $\tilde{O}(k8)$ amortised time per character of the text, providing answers correct with high probability. (Hereafter, $\tilde{O}(\cdot)$ hides a $\mathrm{poly}(\log n)$ factor.) This answers a decade-old question: since the discovery of a $\mathrm{poly}(k\log n)$-space streaming algorithm for pattern matching under Hamming distance by Porat and Porat [FOCS 2009], the existence of an analogous result for edit distance remained open. Up to this work, no $\mathrm{poly}(k\log n)$-space algorithm was known even in the simpler semi-streaming model, where $T$ comes as a stream but $P$ is available for read-only access. In this model, we give a deterministic algorithm that achieves slightly better complexity. In order to develop the fully streaming algorithm, we introduce a new edit distance sketch parametrised by integers $n\ge k$. For any string of length at most $n$, the sketch is of size $\tilde{O}(k2)$ and it can be computed with an $\tilde{O}(k2)$-space streaming algorithm. Given the sketches of two strings, in $\tilde{O}(k3)$ time we can compute their edit distance or certify that it is larger than $k$. This result improves upon $\tilde{O}(k8)$-size sketches of Belazzougui and Zhu [FOCS 2016] and very recent $\tilde{O}(k3)$-size sketches of Jin, Nelson, and Wu [STACS 2021].

Citations (14)

Summary

We haven't generated a summary for this paper yet.