
Faster and Simpler Online Computation of String Net Frequency

Published 9 Oct 2024 in cs.DS | (2410.06837v2)

Abstract: An occurrence of a repeated substring $u$ in a string $S$ is called a net occurrence if extending the occurrence to the left or to the right decreases the number of occurrences to 1. The net frequency (NF) of a repeated substring $u$ in a string $S$ is the number of net occurrences of $u$ in $S$. Very recently, Guo et al. [SPIRE 2024] proposed an online $O(n \log \sigma)$-time algorithm that maintains a data structure of $O(n)$ space, answers Single-NF queries in $O(m \log \sigma + \sigma^2)$ time, and reports all answers to the All-NF problem in $O(n\sigma^2)$ time. Here, $n$ is the length of the input string $S$, $m$ is the query pattern length, and $\sigma$ is the alphabet size. The $\sigma^2$ term is a major drawback of their method, since computing string net frequencies was originally motivated by Chinese language processing, where $\sigma$ can be in the thousands. This paper presents an improved online $O(n \log \sigma)$-time algorithm, which answers Single-NF queries in $O(m \log \sigma)$ time and reports all answers to the All-NF problem in output-optimal $O(|\mathsf{NF}^+(S)|)$ time, where $\mathsf{NF}^+(S)$ is the set of substrings of $S$ paired with their positive NF values. We note that $|\mathsf{NF}^+(S)| = O(n)$ always holds. In contrast to Guo et al.'s algorithm, which is based on Ukkonen's suffix tree construction, our algorithm is based on Weiner's suffix tree construction.
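To make the definition of net frequency concrete, here is a minimal brute-force sketch in Python. It is not the paper's algorithm: it uses no suffix tree and runs in polynomial time, purely to illustrate what the Single-NF and All-NF problems compute. The function names are hypothetical. It also assumes a boundary convention the abstract does not state: an occurrence touching the start (or end) of $S$ is treated as having a unique left (or right) extension.

```python
def count_occurrences(s: str, u: str) -> int:
    """Count (possibly overlapping) occurrences of u in s."""
    return sum(1 for i in range(len(s) - len(u) + 1) if s.startswith(u, i))


def net_frequency(s: str, u: str) -> int:
    """Brute-force Single-NF: number of net occurrences of u in s.

    Assumption (not from the abstract): an extension that would run past
    either end of s is treated as unique.
    """
    # Only repeated substrings (at least two occurrences) can have net occurrences.
    if not u or count_occurrences(s, u) < 2:
        return 0
    m = len(u)
    nf = 0
    for i in range(len(s) - m + 1):
        if not s.startswith(u, i):
            continue
        # Left extension: s[i-1 .. i+m-1]; right extension: s[i .. i+m].
        left_unique = i == 0 or count_occurrences(s, s[i - 1:i + m]) == 1
        right_unique = i + m == len(s) or count_occurrences(s, s[i:i + m + 1]) == 1
        if left_unique and right_unique:
            nf += 1
    return nf


def all_positive_nf(s: str) -> dict:
    """Brute-force All-NF: every distinct substring with positive NF, with its NF."""
    substrings = {s[i:j] for i in range(len(s)) for j in range(i + 1, len(s) + 1)}
    result = {}
    for u in substrings:
        nf = net_frequency(s, u)
        if nf > 0:
            result[u] = nf
    return result


print(net_frequency("abab", "ab"))  # → 2
print(net_frequency("abab", "a"))   # → 0
print(all_positive_nf("abab"))      # → {'ab': 2}
```

In `"abab"`, both occurrences of `ab` are net occurrences: extending either one by a character yields `aba` or `bab`, each of which occurs only once. The substring `a` is repeated but has no net occurrence, because extending it to the right always gives `ab`, which is itself repeated.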
