Matching strings in encoded sequences

Published 22 Mar 2019 in math.PR, cs.DS, cs.IT, math.DS, and math.IT | (1903.09625v2)

Abstract: We investigate the longest common substring problem for encoded sequences and its asymptotic behaviour. The main result is a strong law of large numbers for a re-scaled version of this quantity, which presents an explicit relation with the R\'enyi entropy of the source. We apply this result to the zero-inflated contamination model and the stochastic scrabble. In the case of dynamical systems, this problem is equivalent to the shortest distance between two observed orbits and its limiting relationship with the correlation dimension of the pushforward measure. An extension to the shortest distance between orbits for random dynamical systems is also provided.