Anchor points for genome alignment based on Filtered Spaced Word Matches

Published 26 Mar 2017 in q-bio.GN | (1703.08792v1)

Abstract: Alignment of large genomic sequences is a fundamental task in computational genome analysis. Most methods for genomic alignment use high-scoring local alignments as anchor points to reduce the search space of the alignment procedure. The speed and quality of these methods therefore depend on the underlying anchor points. Herein, we propose to use Filtered Spaced Word Matches to calculate anchor points for genome alignment. To evaluate this approach, we used these anchor points in the widely used alignment pipeline Mugsy. For distantly related sequence sets, we could substantially improve the quality of alignments produced by Mugsy.