Papers
Topics
Authors
Recent
Gemini 2.5 Flash
Gemini 2.5 Flash
143 tokens/sec
GPT-4o
7 tokens/sec
Gemini 2.5 Pro Pro
46 tokens/sec
o3 Pro
4 tokens/sec
GPT-4.1 Pro
38 tokens/sec
DeepSeek R1 via Azure Pro
28 tokens/sec
2000 character limit reached

Generalised Pattern Matching Revisited (2001.05976v1)

Published 16 Jan 2020 in cs.DS

Abstract: In the problem of $\texttt{Generalised Pattern Matching}\ (\texttt{GPM})$ [STOC'94, Muthukrishnan and Palem], we are given a text $T$ of length $n$ over an alphabet $\Sigma_T$, a pattern $P$ of length $m$ over an alphabet $\Sigma_P$, and a matching relationship $\subseteq \Sigma_T \times \Sigma_P$, and must return all substrings of $T$ that match $P$ (reporting) or the number of mismatches between each substring of $T$ of length $m$ and $P$ (counting). In this work, we improve over all previously known algorithms for this problem for various parameters describing the input instance: * $\mathcal{D}\,$ being the maximum number of characters that match a fixed character, * $\mathcal{S}\,$ being the number of pairs of matching characters, * $\mathcal{I}\,$ being the total number of disjoint intervals of characters that match the $m$ characters of the pattern $P$. At the heart of our new deterministic upper bounds for $\mathcal{D}\,$ and $\mathcal{S}\,$ lies a faster construction of superimposed codes, which solves an open problem posed in [FOCS'97, Indyk] and can be of independent interest. To conclude, we demonstrate first lower bounds for $\texttt{GPM}$. We start by showing that any deterministic or Monte Carlo algorithm for $\texttt{GPM}$ must use $\Omega(\mathcal{S})$ time, and then proceed to show higher lower bounds for combinatorial algorithms. These bounds show that our algorithms are almost optimal, unless a radically new approach is developed.

Summary

We haven't generated a summary for this paper yet.