PDB-Struct: A Comprehensive Benchmark for Structure-based Protein Design (2312.00080v1)

Published 30 Nov 2023 in q-bio.QM and cs.LG

Abstract: Structure-based protein design has attracted increasing interest, with numerous methods being introduced in recent years. However, a universally accepted method for evaluation has not been established, since the wet-lab validation can be overly time-consuming for the development of new algorithms, and the $\textit{in silico}$ validation with recovery and perplexity metrics is efficient but may not precisely reflect true foldability. To address this gap, we introduce two novel metrics: refoldability-based metric, which leverages high-accuracy protein structure prediction models as a proxy for wet lab experiments, and stability-based metric, which assesses whether models can assign high likelihoods to experimentally stable proteins. We curate datasets from high-quality CATH protein data, high-throughput $\textit{de novo}$ designed proteins, and mega-scale experimental mutagenesis experiments, and in doing so, present the $\textbf{PDB-Struct}$ benchmark that evaluates both recent and previously uncompared protein design methods. Experimental results indicate that ByProt, ProteinMPNN, and ESM-IF perform exceptionally well on our benchmark, while ESM-Design and AF-Design fall short on the refoldability metric. We also show that while some methods exhibit high sequence recovery, they do not perform as well on our new benchmark. Our proposed benchmark paves the way for a fair and comprehensive evaluation of protein design methods in the future. Code is available at https://github.com/WANG-CR/PDB-Struct.

Summary

The paper introduces PDB-Struct with novel refoldability and stability metrics to rigorously assess protein design methodologies.
It leverages advanced prediction models like AlphaFold2 and ESMFold to evaluate structural integrity using TM-scores and pLDDT scores.
The benchmark integrates diverse datasets, revealing performance variations among models to guide improvements in de novo protein design.

A Comprehensive Evaluation Framework for Structure-based Protein Design: Introducing PDB-Struct

Protein design, especially de novo design, is a crucial aspect of bioengineering with extensive applications in therapeutics, enzyme engineering, and antibody innovation. The integration of deep learning techniques into protein design has significantly enhanced the ability to generate novel protein sequences that possess specific structural properties. Despite these advances, there exists a notable gap in standardized benchmarks for evaluating various protein design methodologies. The paper "PDB-Struct: A Comprehensive Benchmark for Structure-based Protein Design" addresses this deficit by introducing two novel metrics to bolster evaluation rigor: refoldability-based metrics and stability-based metrics.

Introduction to Novel Metrics

Refoldability-Based Metrics: Existing in silico evaluation relies primarily on sequence recovery and perplexity. These metrics can be misleading because they may not reflect the true foldability of the designed proteins: even high sequence similarity does not guarantee correct folding, since single mutations can lead to misfolding and diseases such as Alzheimer's. The authors propose "refoldability" as a metric, which assesses whether a designed sequence can fold into a stable structure and whether that structure resembles the input. High-accuracy structure prediction models, such as AlphaFold2 and ESMFold, predict the structure of each designed sequence; the TM-score between the predicted and input structures measures structural similarity, and the pLDDT score reflects the predictor's confidence in the fold.
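As a rough illustration of the two quantities named above, the following Python sketch evaluates the standard TM-score formula, TM = (1/L) * sum_i 1 / (1 + (d_i/d0)^2) with d0 = 1.24 (L - 15)^(1/3) - 1.8, and averages per-residue pLDDT values. It is not the paper's code. It assumes the predicted and input C-alpha coordinates are already superposed and matched residue by residue; the real TM-score maximizes over superpositions, so this value is only a lower bound for a given pair. The function names and the 0.5 Å floor on d0 for short chains are assumptions made for this example.

```python
import math
from typing import Sequence, Tuple

Coord = Tuple[float, float, float]


def tm_d0(length: int) -> float:
    """Length-dependent distance scale d0 (in angstroms) of the TM-score."""
    # Assumption: the cube-root formula breaks down for very short chains,
    # so a floor of 0.5 angstroms is applied here as a guard.
    if length <= 21:
        return 0.5
    return 1.24 * (length - 15) ** (1.0 / 3.0) - 1.8


def tm_score_superposed(pred: Sequence[Coord], ref: Sequence[Coord]) -> float:
    """TM-score formula for coordinates that are ALREADY superposed.

    pred and ref hold one C-alpha coordinate per residue, in the same order.
    No superposition search is performed (hypothetical helper, see lead-in).
    """
    if len(pred) != len(ref) or len(ref) == 0:
        raise ValueError("pred and ref must be non-empty and of equal length")
    d0 = tm_d0(len(ref))
    total = 0.0
    for p, r in zip(pred, ref):
        d = math.dist(p, r)  # per-residue deviation d_i
        total += 1.0 / (1.0 + (d / d0) ** 2)
    return total / len(ref)


def mean_plddt(plddt: Sequence[float]) -> float:
    """Average per-residue pLDDT reported by the structure predictor."""
    if len(plddt) == 0:
        raise ValueError("plddt must be non-empty")
    return sum(plddt) / len(plddt)
```

In practice, the coordinates would come from running a structure predictor on the designed sequence, and a dedicated alignment tool would handle the superposition; the sketch only makes the scoring step concrete.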

Stability-Based Metrics: Different models compute the perplexity of a designed sequence under different modeling assumptions, which makes perplexity values hard to compare across methods. The authors instead advocate a stability-based metric built on datasets with experimental stability scores. This metric evaluates whether a design method assigns high likelihoods to sequences with high experimental stability, providing a more reliable estimate of how well a model captures the sequence landscape.
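One way to make this idea concrete is a rank correlation between model likelihoods and measured stability. The Python sketch below computes Spearman's correlation with average ranks for ties. Using Spearman's coefficient is an assumption made for illustration, not a confirmed detail of the paper's procedure, and the function names and example numbers are hypothetical.

```python
import math
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0.0 or vy == 0.0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(vx * vy)


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Made-up example: model log-likelihoods vs. experimental stability scores.
log_likelihoods = [-120.5, -98.2, -110.0, -87.3]
stability = [0.4, 1.9, 1.1, 2.5]
rho = spearman(log_likelihoods, stability)
```

A value of rho near 1 would mean the model ranks sequences in the same order as the experimental stability measurements; a value near 0 would mean its likelihoods carry little information about stability.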

Datasets and Benchmark Composition

The benchmark, PDB-Struct, comprises datasets curated from the CATH protein structure database, high-throughput de novo designed proteins, and mega-scale mutagenesis experiments. This comprehensive collection ensures that the proposed metrics can evaluate a wide array of design models fairly and thoroughly.

Experimental Evaluation and Findings

The experiments conducted using PDB-Struct reveal nuanced insights:

ProteinMPNN, ByProt, and ESM-IF performed well on the refoldability metrics, with Ref-TM scores exceeding 0.80 under AlphaFold2 prediction, indicating that their designs preserve the structure of the input.
AF-Design and ESM-Design, despite their innovative integration of structure prediction networks, showed limited efficacy on both metrics, indicating a need for improved efficiency and prediction accuracy.
Refoldability results were consistent across different structure prediction models (AlphaFold2, OmegaFold, ESMFold), supporting the robustness of the proposed metrics.
On the stability metrics, ESM-IF performed exceptionally well, indicating accurate sequence density estimation. In contrast, structure-prediction-based models were less sensitive to point mutations than expected.

Implications and Future Directions

The introduction of PDB-Struct sets a foundation for more rigorous and consistent benchmarking in protein design. By addressing current metric limitations and introducing novel evaluative strategies, this work significantly enhances the precision of model assessment. Practically, these metrics can improve the selection process of models for drug discovery, enzyme design, and synthetic biology.

Future research could extend PDB-Struct by integrating additional datasets and developing metrics to assess diversity and functional specificity of designed proteins. Furthermore, adapting these metrics for real-time application in industrial and clinical environments remains an exciting avenue for exploration.
