Space-Efficient String Indexing for Wildcard Pattern Matching (1401.0625v1)
Abstract: In this paper we describe compressed indexes that support pattern matching queries for strings with wildcards. For a constant size alphabet our data structure uses $O(n\log^{\varepsilon}n)$ bits for any $\varepsilon>0$ and reports all $\mathrm{occ}$ occurrences of a wildcard string in $O(m+\sigma^g \cdot\mu(n) + \mathrm{occ})$ time, where $\mu(n)=o(\log\log\log n)$, $\sigma$ is the alphabet size, $m$ is the number of alphabet symbols and $g$ is the number of wildcard symbols in the query string. We also present an $O(n)$-bit index with $O((m+\sigma^g+\mathrm{occ})\log^{\varepsilon}n)$ query time and an $O(n(\log\log n)^2)$-bit index with $O((m+\sigma^g+\mathrm{occ})\log\log n)$ query time. These are the first non-trivial data structures for this problem that need $o(n\log n)$ bits of space.
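To make the query concrete, the following is a minimal sketch of what a wildcard pattern matching query returns, written as a naive scan with no index at all. It is not the data structure described in the abstract. The function names, the choice of `?` as the wildcard symbol, and the assumption that a wildcard matches exactly one arbitrary alphabet symbol are illustrative choices, not taken from the paper.

```python
def wildcard_occurrences(text, pattern, wildcard="?"):
    """Return every start position where `pattern` matches `text`.

    Assumption: `wildcard` matches exactly one arbitrary symbol.
    This is a naive O(n * len(pattern)) scan, used only to define
    the query; it is not a compressed index.
    """
    n, k = len(text), len(pattern)
    hits = []
    for i in range(n - k + 1):
        window = text[i:i + k]
        # A position matches if every non-wildcard symbol agrees.
        if all(p == wildcard or p == t for p, t in zip(pattern, window)):
            hits.append(i)
    return hits


def query_parameters(pattern, wildcard="?"):
    """Return (m, g): counts of alphabet symbols and wildcard symbols."""
    g = pattern.count(wildcard)
    m = len(pattern) - g
    return m, g


if __name__ == "__main__":
    print(wildcard_occurrences("abracadabra", "a?ra"))
    print(query_parameters("a?ra"))
```

In this sketch the pattern `a?ra` has $m=3$ alphabet symbols and $g=1$ wildcard, and $\mathrm{occ}$ is the length of the returned list. The indexes in the abstract answer the same query without scanning the text.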