Single and multiple consecutive permutation motif search (1301.4952v2)
Abstract: Let $t$ be a permutation (that shall play the role of the {\em text}) on $[n]$ and a pattern $p$ be a sequence of $m$ distinct integer(s) of $[n]$, $m\leq n$. The pattern $p$ occurs in $t$ in position $i$ if and only if $p_1... p_m$ is order-isomorphic to $t_i... t_{i+m-1}$, that is, for all $1 \leq k< \ell \leq m$, $p_k>p_\ell$ if and only if $t_{i+k-1}>t_{i+\ell-1}$. Searching for a pattern $p$ in a text $t$ consists in identifying all occurrences of $p$ in $t$. We first present a forward automaton which allows us to search for $p$ in $t$ in $O(m2\log \log m +n)$ time. We then introduce a Morris-Pratt automaton representation of the forward automaton which allows us to reduce this complexity to $O(m\log \log m +n)$ at the price of an additional amortized constant term by integer of the text. Both automata occupy $O(m)$ space. We then extend the problem to search for a set of patterns and exhibit a specific Aho-Corasick like algorithm. Next we present a sub-linear average case search algorithm running in $O(\frac{m\log m}{\log\log m}+\frac{n\log m}{m\log\log m})$ time, that we eventually prove to be optimal on average.