
Time-Space Tradeoffs for Finding a Long Common Substring (2003.02016v2)

Published 4 Mar 2020 in cs.DS

Abstract: We consider the problem of finding, given two documents of total length $n$, a longest string occurring as a substring of both documents. This problem, known as the Longest Common Substring (LCS) problem, has a classic $O(n)$-time solution dating back to the discovery of suffix trees (Weiner, 1973) and their efficient construction for integer alphabets (Farach-Colton, 1997). However, these solutions require $\Theta(n)$ space, which is prohibitive in many applications. To address this issue, Starikovskaya and Vildhøj (CPM 2013) showed that for $n^{2/3} \le s \le n^{1-o(1)}$, the LCS problem can be solved in $O(s)$ space and $O(\frac{n^2}{s})$ time. Kociumaka et al. (ESA 2014) generalized this tradeoff to $1 \leq s \leq n$, thus providing a smooth time-space tradeoff from constant to linear space. In this paper, we obtain a significant speed-up for instances where the length $L$ of the sought LCS is large. For $1 \leq s \leq n$, we show that the LCS problem can be solved in $O(s)$ space and $\tilde{O}(\frac{n^2}{L\cdot s}+n)$ time. The result is based on techniques originating from the LCS with Mismatches problem (Flouri et al., 2015; Charalampopoulos et al., CPM 2018), on space-efficient locally consistent parsing (Birenzwige et al., SODA 2020), and on the structure of maximal repetitions (runs) in the input documents.
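For orientation, the constant-space end of the tradeoff ($s = 1$, $O(n^2)$ time) can be illustrated with the textbook diagonal scan below. This is a minimal sketch of a standard baseline, not the authors' algorithm (which relies on locally consistent parsing and runs); the function name `longest_common_substring` and its return convention are hypothetical choices made for illustration.

```python
def longest_common_substring(x: str, y: str) -> tuple[int, int, int]:
    """Return (length, start_in_x, start_in_y) of a longest common substring.

    Scans every diagonal of the implicit |x|-by-|y| match matrix while
    tracking the current run of equal characters. Extra space is O(1)
    words; time is O(|x| * |y|).
    """
    best_len, best_i, best_j = 0, 0, 0
    lx, ly = len(x), len(y)
    # Diagonal d pairs x[i] with y[i - d]; d ranges over -(ly - 1) .. lx - 1.
    for d in range(-(ly - 1), lx):
        i = max(d, 0)  # first position of the diagonal inside x
        j = i - d      # matching position inside y
        run = 0        # length of the current streak of equal characters
        while i < lx and j < ly:
            if x[i] == y[j]:
                run += 1
                if run > best_len:
                    # The streak ending at (i, j) started run - 1 steps earlier.
                    best_len = run
                    best_i, best_j = i - run + 1, j - run + 1
            else:
                run = 0
            i += 1
            j += 1
    return best_len, best_i, best_j
```

For example, `longest_common_substring("banana", "ananas")` returns `(5, 1, 0)`, i.e. the common substring `"anana"`. Since $|x| \cdot |y| \le n^2/4$, this baseline matches the $O(n^2/s)$ bound at $s = 1$; the paper's $\tilde{O}(\frac{n^2}{L\cdot s}+n)$ bound improves on it when $L$ is large.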

Authors (4)
  1. Stav Ben-Nun (2 papers)
  2. Shay Golan (17 papers)
  3. Tomasz Kociumaka (97 papers)
  4. Matan Kraus (7 papers)
Citations (9)
