String Attractors (1709.05314v2)

Published 15 Sep 2017 in cs.DS

Abstract: Let $S$ be a string of length $n$. In this paper we introduce the notion of \emph{string attractor}: a subset of the string's positions $[1,n]$ such that every distinct substring of $S$ has an occurrence crossing one of the attractor's elements. We first show that the minimum attractor's size yields upper bounds on the string's repetitiveness as measured by its linguistic complexity and by the length of its longest repeated substring. We then prove that all known compressors for repetitive strings induce a string attractor whose size is bounded by their associated repetitiveness measure, and can therefore be considered as approximations of the smallest one. Using further reductions, we derive the approximation ratios of these compressors with respect to the smallest attractor and solve several open problems related to the asymptotic relations between repetitiveness measures (in particular, between the sizes of the Lempel-Ziv factorization, the run-length Burrows-Wheeler transform, the smallest grammar, and the smallest macro scheme). These reductions directly provide approximation algorithms for the smallest string attractor. We then apply string attractors to efficiently solve a fundamental problem in the field of compressed computation: we present a universal compressed data structure for text extraction that improves existing strategies simultaneously for \emph{all} known dictionary compressors and that, by recent lower bounds, almost matches the optimal running time within the resulting space. To conclude, we consider generalizations of string attractors to labeled graphs, show that the attractor problem is NP-complete on trees, and provide a logarithmic approximation computable in polynomial time.
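To make the definition concrete, below is a minimal brute-force verifier of the attractor property. This is an illustrative sketch, not an algorithm from the paper: the function name `is_string_attractor` and the test strings are my own, and positions follow the paper's 1-based convention over $[1,n]$.

```python
def is_string_attractor(s: str, gamma: set[int]) -> bool:
    """Check whether gamma (a set of 1-based positions in [1, len(s)])
    is a string attractor of s: every distinct substring of s must have
    at least one occurrence that crosses some position in gamma.
    Brute-force sketch for illustration; it enumerates all O(n^2)
    distinct substrings and scans their occurrences."""
    n = len(s)
    seen = set()
    for i in range(n):
        for j in range(i + 1, n + 1):
            sub = s[i:j]
            if sub in seen:
                continue
            seen.add(sub)
            m = j - i
            # An occurrence starting at 0-based index k covers the
            # 1-based positions k+1 .. k+m; it "crosses" gamma if one
            # of those positions is in gamma.
            covered = False
            k = s.find(sub)
            while k != -1:
                if any(k + 1 <= p <= k + m for p in gamma):
                    covered = True
                    break
                k = s.find(sub, k + 1)  # allows overlapping occurrences
            if not covered:
                return False
    return True


if __name__ == "__main__":
    s = "abaaba"
    # The full position set [1, n] is trivially an attractor.
    print(is_string_attractor(s, set(range(1, len(s) + 1))))  # True
    # A small candidate set, verified exhaustively.
    print(is_string_attractor(s, {2, 4}))  # True
    # A singleton that fails: no occurrence of "ab" crosses position 3.
    print(is_string_attractor(s, {3}))     # False
```

Note that the full set $[1,n]$ is always an attractor, so the interesting questions, and the subject of the paper, are how small an attractor can be and how well known compressors approximate that minimum.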
