Systematic assessment of the expected length, variance and distribution of Longest Common Subsequences (1306.4253v1)
Abstract: The Longest Common Subsequence (LCS) problem is an important problem in mathematics, with broad applications in scheduling problems, physics and bioinformatics. It is known that, for two random sequences, the expected LCS length divided by the sequence length converges to a constant as the length tends to infinity. However, the value of this constant is not yet known. Moreover, the variance and distribution of the LCS length are not fully understood. The problem becomes more difficult when there are (a) multiple sequences, (b) sequences whose letters are not uniformly distributed over the alphabet and (c) large alphabets. This work focuses on these more complicated cases. We systematically analyze the expected length, variance and distribution of the LCS based on extensive Monte Carlo simulation. The results on the expected length are consistent with currently proven theoretical results, and the analysis of the variance and distribution provides further insight into the problem.
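To make the simulation setup concrete, here is a minimal Python sketch of a Monte Carlo estimate of the normalized expected LCS length and the LCS variance. It is an illustration under stated assumptions, not the authors' code. The function names `lcs_length`, `lcs_length_multi` and `monte_carlo_lcs` are hypothetical. Letters are drawn i.i.d. from an alphabet with optional (possibly non-uniform) weights. The LCS is computed with the textbook dynamic programme, and its k-sequence generalization costs time proportional to the product of the sequence lengths.

```python
import itertools
import random
import statistics


def lcs_length(a, b):
    """LCS length of two sequences via the standard O(len(a) * len(b)) DP (two rows of memory)."""
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0] * (len(b) + 1)
        for j, y in enumerate(b, start=1):
            # Extend a match diagonally; otherwise drop the last character of a or of b.
            cur[j] = prev[j - 1] + 1 if x == y else max(prev[j], cur[j - 1])
        prev = cur
    return prev[-1]


def lcs_length_multi(seqs):
    """LCS length of k sequences; cost grows as the product of the sequence lengths."""
    table = {}  # keys with any zero index are never stored and read back as 0
    # itertools.product yields indices in lexicographic order, so every
    # predecessor cell is filled before it is needed.
    for idx in itertools.product(*(range(1, len(s) + 1) for s in seqs)):
        chars = {s[i - 1] for s, i in zip(seqs, idx)}
        if len(chars) == 1:
            # All sequences share the same letter here: extend the LCS of the shorter prefixes.
            table[idx] = table.get(tuple(i - 1 for i in idx), 0) + 1
        else:
            # Otherwise drop the last letter of one sequence and keep the best result.
            table[idx] = max(
                table.get(idx[:d] + (idx[d] - 1,) + idx[d + 1:], 0)
                for d in range(len(idx))
            )
    return table.get(tuple(len(s) for s in seqs), 0)


def monte_carlo_lcs(n, alphabet, weights=None, num_seqs=2, trials=100, seed=0):
    """Estimate (E[LCS] / n, Var[LCS]) for num_seqs i.i.d. random sequences of length n.

    `weights` sets the letter distribution (None means uniform), so non-uniform
    alphabets and larger alphabets are covered by the same routine.
    """
    rng = random.Random(seed)
    lengths = []
    for _ in range(trials):
        seqs = [rng.choices(alphabet, weights=weights, k=n) for _ in range(num_seqs)]
        lengths.append(lcs_length(*seqs) if num_seqs == 2 else lcs_length_multi(seqs))
    return statistics.fmean(lengths) / n, statistics.variance(lengths)


if __name__ == "__main__":
    # Two uniform binary sequences, then three DNA-like sequences with skewed letter frequencies.
    print(monte_carlo_lcs(n=200, alphabet="01", trials=50))
    print(monte_carlo_lcs(n=20, alphabet="ACGT", weights=[0.4, 0.3, 0.2, 0.1], num_seqs=3, trials=20))
```

Because the expected LCS length is superadditive in n, the mean ratio at any finite n lies at or below the limiting constant. Estimates of that constant therefore need large n and many trials, and this is where the DP cost dominates.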