Distance Measures for Sequences (1208.5713v1)
Published 28 Aug 2012 in cs.IT, cs.DS, and math.IT
Abstract: Given a set of sequences, the distances between pairs of them help us measure their similarity and derive structural relationships among them. For genomic sequences, such measures make it possible to construct the evolutionary tree of organisms. In this paper we compare several distance measures and examine a method that circularly shifts one sequence against the other to find an alignment that minimizes the Hamming distance. We also use run-length encoding together with LZ77 to characterize the information in a binary sequence.
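
To make the circular-shift idea concrete, here is a minimal Python sketch, not the paper's own implementation. It assumes equal-length sequences, rotates the second sequence to the left, and checks every shift exhaustively; the function names `hamming`, `min_circular_hamming`, and `run_lengths` are hypothetical and chosen only for illustration. The run-length helper shows only the first stage mentioned in the abstract; the LZ77 stage is not sketched.

```python
from itertools import groupby


def hamming(a: str, b: str) -> int:
    """Count the positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(x != y for x, y in zip(a, b))


def min_circular_hamming(a: str, b: str) -> tuple[int, int]:
    """Return (shift, distance) for the left rotation of b closest to a.

    Assumption: every one of the len(b) rotations is tried, and the
    smallest shift wins on ties.
    """
    best_shift, best_dist = 0, hamming(a, b)
    for k in range(1, len(b)):
        rotated = b[k:] + b[:k]  # rotate b left by k positions
        dist = hamming(a, rotated)
        if dist < best_dist:
            best_shift, best_dist = k, dist
    return best_shift, best_dist


def run_lengths(bits: str) -> list[tuple[str, int]]:
    """Run-length encode a sequence as (symbol, run length) pairs."""
    return [(symbol, len(list(group))) for symbol, group in groupby(bits)]


shift, dist = min_circular_hamming("ACGTAC", "TACACG")
print(shift, dist)
print(run_lengths("0001101"))
```

The exhaustive search costs O(n^2) symbol comparisons for sequences of length n, which is acceptable for short sequences; whether the paper uses a faster search is not stated in the abstract.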