Determine VEP eQTL performance at extrapolated lengths not computed due to cost constraints

Determine the true AUROC values of the NTv2 transformer baseline, NTv2 with position interpolation, Caduceus, and Hawk on the Genomics Long-Range Benchmark (GLRB) VEP eQTL task at the input sequence lengths drawn as dotted lines in the extrapolation plot. At those lengths the authors could not compute results because of computational cost constraints and instead plotted values assumed from the observed trends. Resolving this requires measuring the actual model performance at those longer sequence lengths to replace the trend-based estimates.

Background

The paper evaluates zero-shot extrapolation of several architectures—NTv2 transformer, NTv2 with position interpolation, Caduceus, and Hawk—on the GLRB VEP eQTL task. The figure compares AUROC across increasing input sequence lengths and marks the training context length (12 kbp).

For certain longer sequence lengths, the authors did not compute the actual AUROC values because of computational cost constraints; the dotted line segments in the figure show values assumed from the observed trends instead. Measuring the real performance at those lengths would replace these trend-based placeholders with definitive extrapolation results.
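As a rough illustration of the measurement involved, the sketch below computes AUROC from binary labels and model scores, then tabulates it per input sequence length. It is not the authors' evaluation code: the function names `auroc` and `auroc_by_length`, the dictionary layout, and the toy labels, scores, and lengths are all assumptions made for illustration. AUROC is computed with the standard rank-based (Mann-Whitney U) formulation, using average ranks for tied scores.

```python
from typing import Dict, Sequence, Tuple


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUROC via the Mann-Whitney U statistic (average ranks for ties)."""
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUROC needs at least one positive and one negative")

    # Sort indices by score, then assign 1-based ranks, averaging within ties.
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1

    rank_sum_pos = sum(r for r, y in zip(ranks, labels) if y == 1)
    u_stat = rank_sum_pos - n_pos * (n_pos + 1) / 2.0
    return u_stat / (n_pos * n_neg)


def auroc_by_length(
    results: Dict[int, Tuple[Sequence[int], Sequence[float]]]
) -> Dict[int, float]:
    """Map each input length (bp) to the AUROC of its (labels, scores) pair."""
    return {
        length: auroc(labels, scores)
        for length, (labels, scores) in sorted(results.items())
    }


# Toy placeholder data only; these are NOT results from the paper.
toy_results = {
    12_000: ([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]),
    24_000: ([0, 1], [0.2, 0.9]),
}
per_length = auroc_by_length(toy_results)
```

In an actual evaluation, each entry would hold the labels and zero-shot scores obtained by running one model at one input length; the values computed at the dotted-line lengths would then replace the assumed points in the plot.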

References

Lines that turn into dotted indicate values that we were unable to compute due to computational cost constraints and are therefore assumed based on trends.

— Leveraging State Space Models in Long Range Genomics (2504.06304 - Popov et al., 7 Apr 2025) in Figure caption “Comparison of the extrapolation methods of state-space models and attention-based models on VEP eQTLs (AUROC)”, Section “Zero-shot extrapolation”
