Determine VEP eQTL performance at extrapolated lengths not computed due to cost constraints
Determine the true AUROC values of the NTv2 transformer baseline, NTv2 with position interpolation, Caduceus, and Hawk on the variant effect prediction (VEP) eQTL task of the Genomics Long-Range Benchmark (GLRB) at the input sequence lengths drawn as dotted lines in the extrapolation plot. The authors could not compute results at those lengths because of computational cost constraints and instead assumed values based on trends. Resolving this requires measuring actual model performance at those longer sequence lengths to replace the trend-based estimates.
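As a rough illustration of the measurement involved, the sketch below computes AUROC separately for each input sequence length from binary labels and model scores. It is a minimal sketch, not the authors' evaluation code: the function names `auroc` and `auroc_by_length`, the toy labels and scores, and the sequence lengths used as dictionary keys are all hypothetical, and the rank-based (Mann-Whitney) formula is simply one standard way to compute AUROC.

```python
from typing import Dict, Sequence, Tuple


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Rank-based AUROC (Mann-Whitney U), using average ranks for tied scores."""
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUROC needs at least one positive and one negative label")

    # Sort example indices by score, then give each group of tied scores
    # the average of the 1-based ranks it spans.
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1

    # U statistic for the positive class, normalised by the number of pos/neg pairs.
    pos_rank_sum = sum(r for r, y in zip(ranks, labels) if y == 1)
    u_stat = pos_rank_sum - n_pos * (n_pos + 1) / 2.0
    return u_stat / (n_pos * n_neg)


def auroc_by_length(
    predictions: Dict[int, Tuple[Sequence[int], Sequence[float]]]
) -> Dict[int, float]:
    """Map each input sequence length to the AUROC measured at that length."""
    return {
        length: auroc(labels, scores)
        for length, (labels, scores) in sorted(predictions.items())
    }


# Hypothetical toy data: keys are illustrative sequence lengths, values are
# (labels, scores) pairs. These are NOT results from the paper.
toy_predictions = {
    12_000: ([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]),
    24_000: ([0, 0, 1, 1], [0.2, 0.3, 0.6, 0.9]),
}
results = auroc_by_length(toy_predictions)
for length, value in results.items():
    print(f"length={length}: AUROC={value:.3f}")
```

In an actual study, the `(labels, scores)` pairs would come from running each model on the GLRB VEP eQTL evaluation set at each sequence length that appears as a dotted line in the plot; the measured values would then replace the trend-based estimates.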
References
Lines that turn into dotted indicate values that we were unable to compute due to computational cost constraints and are therefore assumed based on trends.
— Leveraging State Space Models in Long Range Genomics
(2504.06304 - Popov et al., 7 Apr 2025) in Figure caption “Comparison of the extrapolation methods of state-space models and attention-based models on VEP eQTLs (AUROC)”, Section “Zero-shot extrapolation”