Unknown impact of peptide length and noise on model performance

Determine the extent to which peptide length and noise peaks ratio affect the performance of deep learning-based de novo peptide sequencing models, specifically DeepNovo, PointNovo, Casanovo, InstaNovo, AdaNovo, and π-HelixNovo, across standardized mass spectrometry datasets.

Background

The paper highlights that while longer peptide sequences and higher noise peaks ratios are intuitively expected to degrade model performance, the precise degree to which these factors affect different de novo peptide sequencing models had not been quantified. This uncertainty complicates model selection for practical scenarios where peptide length and spectral noise vary widely.

NovoBench integrates six state-of-the-art deep learning models and evaluates them across multiple datasets and metrics. The benchmark is motivated in part by the need to systematically investigate robustness to influencing factors, including peptide length and noise, which remained uncharacterized at the time of writing.
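One way to make such a robustness analysis concrete is to stratify per-spectrum results by the factor of interest and compare accuracy across strata. The sketch below is illustrative only and is not taken from NovoBench: the function name `accuracy_by_bin`, the bin edges, and the toy records are all assumptions, and the same grouping could be applied to a noise peaks ratio once that quantity has been computed per spectrum.

```python
from bisect import bisect_right
from collections import defaultdict


def accuracy_by_bin(records, edges):
    """Group (factor_value, is_correct) pairs into bins and score each bin.

    Hypothetical helper, not part of any benchmark's API.
    `edges` is a sorted list of bin boundaries; bin 0 holds values below
    edges[0], bin i holds edges[i-1] <= value < edges[i], and the last bin
    holds values >= edges[-1].
    Returns {bin_index: (count, fraction_correct)}.
    """
    counts = defaultdict(lambda: [0, 0])  # bin -> [total, correct]
    for value, is_correct in records:
        b = bisect_right(edges, value)
        counts[b][0] += 1
        counts[b][1] += int(is_correct)
    return {b: (n, hits / n) for b, (n, hits) in sorted(counts.items())}


# Made-up toy data: (peptide length, whether the predicted peptide was correct).
records = [(7, True), (9, True), (12, True), (14, False), (21, False), (25, True)]

# Assumed length bins: < 10, 10 to 19, >= 20.
result = accuracy_by_bin(records, edges=[10, 20])
for b, (n, acc) in result.items():
    print(f"bin {b}: n={n}, peptide-level accuracy={acc:.2f}")
```

Running the same grouping for each model on a shared dataset would give per-bin curves that can be compared directly, which is the kind of quantification the open question asks for.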

References

Intuitively, longer peptide sequences and a higher noise peaks ratio are expected to degrade the performance of various models. However, the extent to which these factors affect different models remains unknown.

NovoBench: Benchmarking Deep Learning-based De Novo Peptide Sequencing Methods in Proteomics (2406.11906 - Zhou et al., 16 Jun 2024) in Introduction, bullet point "The robustness to important influencing factors"