PalFM-index: FM-index for Palindrome Pattern Matching (2206.12600v2)
Abstract: The palindrome pattern matching (pal-matching) is a kind of generalized pattern matching, in which two strings $x$ and $y$ of same length are considered to match (pal-match) if they have the same palindromic structures, i.e., for any possible $1 \le i < j \le |x| = |y|$, $x[i..j]$ is a palindrome if and only if $y[i..j]$ is a palindrome. The pal-matching problem is the problem of searching for, in a text, the occurrences of the substrings that pal-match with a pattern. Given a text $T$ of length $n$ over an alphabet of size $\sigma$, an index for pal-matching is to support, given a pattern $P$ of length $m$, the counting queries that compute the number $\mathsf{occ}$ of occurrences of $P$ and the locating queries that compute the occurrences of $P$. The authors in~[I et al., Theor. Comput. Sci., 2013] proposed an $O(n \lg n)$-bit data structure to support the counting queries in $O(m \lg \sigma)$ time and the locating queries in $O(m \lg \sigma + \mathsf{occ})$ time. In this paper, we propose an FM-index type index for the pal-matching problem, which we call the PalFM-index, that occupies $2n \lg \min(\sigma, \lg n) + 2n + o(n)$ bits of space and supports the counting queries in $O(m)$ time. The PalFM-indexes can support the locating queries in $O(m + \Delta \mathsf{occ})$ time by adding $\frac{n}{\Delta} \lg n + n + o(n)$ bits of space, where $\Delta$ is a parameter chosen from ${1, 2, \dots, n}$ in the preprocessing phase.