Papers
Topics
Authors
Recent
Search
2000 character limit reached

Online Grammar Compression for Frequent Pattern Discovery

Published 15 Jul 2016 in cs.DS | (1607.04446v3)

Abstract: Various grammar compression algorithms have been proposed in the last decade. A grammar compression is a restricted CFG deriving the string deterministically. An efficient grammar compression develops a smaller CFG by finding duplicated patterns and removing them. This process is just a frequent pattern discovery by grammatical inference. While we can get any frequent pattern in linear time using a preprocessed string, a huge working space is required for longer patterns, and the whole string must be loaded into the memory preliminarily. We propose an online algorithm approximating this problem within a compressed space. The main contribution is an improvement of the previously best known approximation ratio $\Omega(\frac{1}{\lg2m})$ to $\Omega(\frac{1}{\lg*N\lg m})$ where $m$ is the length of an optimal pattern in a string of length $N$ and $\lg*$ is the iteration of the logarithm base $2$. For a sufficiently large $N$, $\lg*N$ is practically constant. The experimental results show that our algorithm extracts nearly optimal patterns and achieves a significant improvement in memory consumption compared to the offline algorithm.

Citations (6)

Summary

No one has generated a summary of this paper yet.

Paper to Video (Beta)

No one has generated a video about this paper yet.

Whiteboard

No one has generated a whiteboard explanation for this paper yet.

Open Problems

We haven't generated a list of open problems mentioned in this paper yet.

Continue Learning

We haven't generated follow-up questions for this paper yet.

Collections

Sign up for free to add this paper to one or more collections.