Terrier: A Deep Learning Repeat Classifier (2503.09312v1)
Abstract: Repetitive DNA sequences underpin genome architecture and evolutionary processes, yet they remain challenging to classify accurately. Terrier is a deep learning model designed to overcome these challenges by classifying repetitive DNA sequences using a publicly available, curated repeat sequence library trained under the RepeatMasker schema. Existing tools often struggle to classify divergent taxa due to biases in reference libraries, limiting our understanding of repeat evolution and function. Terrier overcomes these challenges by leveraging deep learning for improved accuracy. Trained on RepBase, which includes over 100,000 repeat families -- four times more than Dfam -- Terrier maps 97.1% of RepBase sequences to RepeatMasker categories, offering the most comprehensive classification system available. When benchmarked against DeepTE, TERL, and TEclass2 in model organisms (rice and fruit flies), Terrier achieved superior accuracy while classifying a broader range of sequences. Further validation in non-model amphibian and flatworm genomes highlights its effectiveness in improving classification in non-model species, facilitating research on repeat-driven evolution, genomic instability, and phenotypic variation.
Paper Prompts
Sign up for free to create and run prompts on this paper using GPT-5.
Top Community Prompts
Collections
Sign up for free to add this paper to one or more collections.