Clarify the comparative utility of self-supervised versus supervised genomic models for non-coding variant effect prediction
Determine, through standardized independent benchmarking, the relative utility of current self-supervised genomic language models (for example, DNABERT, LOGO, Nucleotide Transformer, Caduceus) versus supervised sequence-to-activity models (for example, Enformer, Basenji2, Borzoi) for non-coding variant effect prediction in both zero-shot and fine-tuned settings, and identify the conditions under which each approach yields superior performance on downstream variant interpretation tasks.
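To make the zero-shot setting concrete, the sketch below shows one common way a variant effect score is derived from a masked genomic language model: mask the variant position and take the log-likelihood ratio of the alternate versus the reference allele. This is a minimal illustration, not a protocol from the source; the checkpoint name is a placeholder, and single-nucleotide tokenization plus the `zero_shot_llr` helper are assumptions made for readability (k-mer tokenizers such as those used by DNABERT or Nucleotide Transformer require mapping the variant to the affected tokens instead).

```python
# Sketch: zero-shot variant effect scoring with a masked genomic language model.
# Assumptions (not from the source): a HuggingFace-style masked-LM checkpoint with
# single-nucleotide tokenization; "example/genomic-mlm" is a placeholder name.
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

MODEL_NAME = "example/genomic-mlm"  # placeholder; substitute a real gLM checkpoint

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForMaskedLM.from_pretrained(MODEL_NAME).eval()

def zero_shot_llr(sequence: str, pos: int, ref: str, alt: str) -> float:
    """Log-likelihood ratio of alt vs. ref allele at 0-based `pos` in `sequence`,
    read from the masked-LM distribution at the masked variant position."""
    assert sequence[pos] == ref, "reference allele must match the sequence"
    # Replace the variant position with the mask token and tokenize.
    masked = sequence[:pos] + tokenizer.mask_token + sequence[pos + 1:]
    inputs = tokenizer(masked, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits
    # Locate the masked position in the tokenized input.
    mask_idx = (inputs["input_ids"] == tokenizer.mask_token_id).nonzero()[0, 1]
    log_probs = torch.log_softmax(logits[0, mask_idx], dim=-1)
    ref_id = tokenizer.convert_tokens_to_ids(ref)
    alt_id = tokenizer.convert_tokens_to_ids(alt)
    # Positive scores favour the alternate allele; magnitude is the predicted effect.
    return (log_probs[alt_id] - log_probs[ref_id]).item()
```

By contrast, a supervised sequence-to-activity model is typically scored by the difference between its predicted regulatory tracks for the reference and alternate sequences, and the fine-tuned setting attaches a task-specific head trained on labeled variants; a standardized benchmark would evaluate all three scoring regimes on the same held-out variant sets.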
References
The utility of current self-supervised models relative to supervised models remains unclear in independent benchmarking studies, indicating that additional work is needed to make optimal use of self-supervised tasks in genomic applications.