Quantify splice-site versus distal-context contributions in large-window splicing models

Determine how much of the predictive signal in long-window deep learning splicing models such as SpliceAI, Pangolin, and DeltaSplice derives from local sequence features at canonical donor and acceptor splice sites, and how much derives from broader genomic context such as promoters and other regulatory elements. Separating these sources of signal would make the models' predictions easier to interpret.

Background

The paper introduces minisplice, a small 1D-CNN model designed to learn splice signals and improve spliced alignment in tools like minimap2 and miniprot. In discussing related work, the authors contrast their compact approach with recent large-window deep learning models (e.g., SpliceAI, Pangolin, DeltaSplice) that use long genomic contexts and many more parameters.

The authors note that such large models can capture long-range composition and potentially promoter and other regulatory signals in addition to splice-site-local features. However, they state that it is not clear how much of the models’ signal originates from the splice sites themselves. This leaves open the relative contributions of local versus distal sequence context to these models’ predictions.
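One simple way to frame the question is a context-ablation experiment: score a splice site with the full window, score it again with everything outside a small flank around the site masked out, and compare the two. The sketch below is only an illustration under stated assumptions. It is not the method of the paper, and it does not call SpliceAI, Pangolin, or DeltaSplice. The names `mask_distal`, `local_fraction`, and `toy_score` are hypothetical, and `toy_score` is a made-up stand-in for a real model that mixes a local GT-dinucleotide term with a window-wide GC-content term.

```python
def mask_distal(seq, site, flank, fill="N"):
    """Keep only seq[site-flank : site+flank+1]; replace the rest with `fill`.

    The sequence length and the coordinate of `site` are preserved.
    """
    lo = max(0, site - flank)
    hi = min(len(seq), site + flank + 1)
    return fill * lo + seq[lo:hi] + fill * (len(seq) - hi)


def local_fraction(score_fn, seq, site, flank):
    """Ratio of the score with distal context masked to the full-window score."""
    full = score_fn(seq, site)
    local = score_fn(mask_distal(seq, site, flank), site)
    return local / full if full else float("nan")


def toy_score(seq, site):
    """Hypothetical stand-in for a model: a local term plus a distal term.

    Local term: 1 if a GT dinucleotide starts at `site`, else 0.
    Distal term: fraction of G/C bases across the whole window.
    The 0.7 / 0.3 weights are arbitrary and chosen only for illustration.
    """
    local = 1.0 if seq[site:site + 2] == "GT" else 0.0
    gc = sum(base in "GC" for base in seq) / len(seq)
    return 0.7 * local + 0.3 * gc


# Toy window: GC-rich flanks around a GT dinucleotide at position 20.
seq = "GC" * 10 + "GT" + "GC" * 10
site = 20
masked = mask_distal(seq, site, flank=2)
frac = local_fraction(toy_score, seq, site, flank=2)
print(masked)
print(f"fraction of score retained with local context only: {frac:.3f}")
```

With a real model, `score_fn` would wrap that model's own prediction call, and the masking strategy (N-filling, shuffling, or substituting unrelated sequence) is itself a design choice that can change the result, so the ratio should be read as one rough indicator rather than a definitive attribution.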

References

With $\ge$10kb windows and orders of magnitude more parameters, recent deep learning models such as SpliceAI, Pangolin and DeltaSplice will learn composition better. They may additionally see the promoter regions of many genes and species- or even tissue-specific regulatory elements. It is not clear how much their signals come from splice sites.

— Improving spliced alignment by modeling splice sites with deep learning (2506.12986 - Yang et al., 15 Jun 2025) in Discussions (final paragraph)
