Papers
Topics
Authors
Recent
Search
2000 character limit reached

A Faster Grammar-Based Self-Index

Published 19 Sep 2011 in cs.DS | (1109.3954v6)

Abstract: To store and search genomic databases efficiently, researchers have recently started building compressed self-indexes based on grammars. In this paper we show how, given a straight-line program with $r$ rules for a string (S [1..n]) whose LZ77 parse consists of $z$ phrases, we can store a self-index for $S$ in $\Oh{r + z \log \log n}$ space such that, given a pattern (P [1..m]), we can list the $\occ$ occurrences of $P$ in $S$ in $\Oh{m2 + \occ \log \log n}$ time. If the straight-line program is balanced and we accept a small probability of building a faulty index, then we can reduce the $\Oh{m2}$ term to $\Oh{m \log m}$. All previous self-indexes are larger or slower in the worst case.

Citations (107)

Summary

Paper to Video (Beta)

Whiteboard

No one has generated a whiteboard explanation for this paper yet.

Open Problems

We haven't generated a list of open problems mentioned in this paper yet.

Continue Learning

We haven't generated follow-up questions for this paper yet.

Collections

Sign up for free to add this paper to one or more collections.