Subset seed automaton (1408.6198v1)
Published 18 Aug 2014 in cs.FL, cs.DS, and q-bio.QM
Abstract: We study the pattern matching automaton introduced in "A unifying framework for seed sensitivity and its application to subset seeds" for the purpose of seed-based similarity search. We show that our definition provides a compact automaton, much smaller than the one obtained by applying the Aho-Corasick construction. We study properties of this automaton and present an efficient implementation of its construction. We also present some experimental results and show that this automaton can be successfully applied to more general situations.