Quantify splice-site versus distal-context contributions in large-window splicing models
Determine, for long-window deep learning splicing models such as SpliceAI, Pangolin, and DeltaSplice, how much of their predictive signal comes from local sequence features at canonical donor and acceptor splice sites and how much comes from broader genomic context, such as promoters and other regulatory elements. The aim is to disentangle the sources of model signal so that the models' predictions can be interpreted.
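One way to make this question concrete is a context ablation: score a site with the full window, then with everything outside a small radius masked, then with only the local region masked, and compare the three scores. The sketch below is illustrative only and is not taken from the referenced work. `toy_model` is a hypothetical stand-in with a hand-written local term and distal term; it is not the interface of SpliceAI, Pangolin, or DeltaSplice, and the function names, the `N` fill character, and the radius are all assumptions.

```python
def mask_outside(seq, center, radius, fill="N"):
    """Keep seq[center-radius : center+radius+1]; replace everything else with `fill`."""
    lo, hi = max(0, center - radius), min(len(seq), center + radius + 1)
    return fill * lo + seq[lo:hi] + fill * (len(seq) - hi)


def mask_inside(seq, center, radius, fill="N"):
    """Replace seq[center-radius : center+radius+1] with `fill`; keep everything else."""
    lo, hi = max(0, center - radius), min(len(seq), center + radius + 1)
    return seq[:lo] + fill * (hi - lo) + seq[hi:]


def toy_model(seq, center):
    """Hypothetical scorer (assumption, not a real model).

    A GT dinucleotide at `center` contributes 0.7 (local term);
    the GC fraction of the remaining sequence contributes up to 0.3 (distal term).
    """
    local = 0.7 if seq[center:center + 2] == "GT" else 0.0
    rest = seq[:center] + seq[center + 2:]
    gc_fraction = sum(base in "GC" for base in rest) / len(rest)
    return local + 0.3 * gc_fraction


def context_ablation(model, seq, center, radius):
    """Score the full window, the local region alone, and the distal context alone."""
    return {
        "full": model(seq, center),
        "local_only": model(mask_outside(seq, center, radius), center),
        "distal_only": model(mask_inside(seq, center, radius), center),
    }


if __name__ == "__main__":
    # Synthetic window: GC-rich flanks around a GT dinucleotide at index 20.
    window = "GC" * 10 + "GT" + "GC" * 10
    scores = context_ablation(toy_model, window, center=20, radius=2)
    for name, value in scores.items():
        print(f"{name}: {value:.4f}")
```

With a real model, `model` would wrap that model's own scoring call. The gap between `full` and `local_only` would then give a rough indication of how much of the score depends on distal context, subject to the caveat that masked input is out of distribution for a model trained on genomic sequence.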
References
With $\ge$10kb windows and orders of magnitude more parameters, recent deep learning models such as SpliceAI, Pangolin and DeltaSplice will learn composition better. They may additionally see the promoter regions of many genes and species- or even tissue-specific regulatory elements. It is not clear how much their signals come from splice sites.