Accurate De Novo Prediction of Protein Contact Map by Ultra-Deep Learning Model (1609.00680v6)

Published 2 Sep 2016 in q-bio.BM, cs.LG, q-bio.QM, and stat.ML

Abstract: Recently, exciting progress has been made on protein contact prediction, but the predicted contacts for proteins without many sequence homologs are still of low quality and not very useful for de novo structure prediction. This paper presents a new deep learning method that predicts contacts by integrating both evolutionary coupling (EC) and sequence conservation information through an ultra-deep neural network formed by two deep residual networks. This deep neural network allows us to model very complex sequence-contact relationships as well as long-range inter-contact correlation. Our method greatly outperforms existing contact prediction methods and leads to much more accurate contact-assisted protein folding. Tested on three datasets of 579 proteins, the average top L long-range prediction accuracy obtained by our method, the representative EC method CCMpred and the CASP11 winner MetaPSICOV is 0.47, 0.21 and 0.30, respectively; the average top L/10 long-range accuracy of our method, CCMpred and MetaPSICOV is 0.77, 0.47 and 0.59, respectively. Ab initio folding using our predicted contacts as restraints can yield correct folds (i.e., TMscore>0.6) for 203 test proteins, while that using MetaPSICOV- and CCMpred-predicted contacts can do so for only 79 and 62 proteins, respectively. Further, our contact-assisted models have much better quality than template-based models. Using our predicted contacts as restraints, we can (ab initio) fold 208 of the 398 membrane proteins with TMscore>0.5. By contrast, when the training proteins of our method are used as templates, homology modeling can only do so for 10 of them. One interesting finding is that even if we do not train our prediction models with any membrane proteins, our method works very well on membrane protein prediction. Finally, in the recent blind CAMEO benchmark, our method successfully folded 5 test proteins with a novel fold.

Summary

The paper presents an ultra-deep dual residual network model that integrates sequential and pairwise features for highly accurate protein contact map prediction.
The method achieves an average top L long-range prediction accuracy of 0.47 (versus 0.21 for CCMpred and 0.30 for MetaPSICOV), and its predicted contacts yield correct folds (TMscore > 0.6) for 203 of 579 test proteins.
The findings enhance de novo protein structure prediction, offering practical implications for drug design, protein engineering, and synthetic biology.

Accurate De Novo Prediction of Protein Contact Map by Ultra-Deep Learning Model

The paper "Accurate De Novo Prediction of Protein Contact Map by Ultra-Deep Learning Model" presents a noteworthy advancement in protein contact prediction using a deep learning approach. This work matters because accurate contact maps are crucial for predicting protein structure and function, a fundamental challenge in computational biology.
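For readers new to the term, a contact map can be sketched in a few lines. The example below assumes the convention that is common in this field, where two residues are in contact when their representative atoms (typically Cβ) lie closer than 8 Å; this summary does not state the threshold, so treat it and the names `contact_map` and `toy_coords` as illustrative assumptions.

```python
import math


def contact_map(coords, threshold=8.0):
    """Return an L x L boolean matrix of residue-residue contacts.

    coords: one (x, y, z) tuple per residue, in angstroms.
    threshold: distance cutoff in angstroms (8.0 is an assumed convention).
    """
    n = len(coords)
    cmap = [[False] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(coords[i], coords[j]) < threshold:
                # The map is symmetric, so fill both triangles.
                cmap[i][j] = cmap[j][i] = True
    return cmap


# Toy example: four residues spaced 5 angstroms apart on a straight line.
toy_coords = [(0.0, 0.0, 0.0), (5.0, 0.0, 0.0), (10.0, 0.0, 0.0), (15.0, 0.0, 0.0)]
toy_map = contact_map(toy_coords)
```

Predicting this matrix from sequence alone, without knowing the coordinates, is the task the paper addresses.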

Methodology

The authors employ an ultra-deep learning model composed of dual deep residual neural networks to predict protein contact maps. This innovative architecture leverages both evolutionary coupling (EC) and sequence conservation information. Specifically:

First Module: A 1-dimensional residual network processes sequential features such as sequence profiles, predicted secondary structures, and solvent accessibility through a series of convolutional transformations.
Second Module: A 2-dimensional residual network handles pairwise features including the output from the first module, EC information, and pairwise potentials. This module further transforms the input via convolutional layers.

This two-tiered approach allows for modeling complex sequence-structure relationships and contact occurrence patterns.
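The data flow between the two modules can be illustrated with a deliberately small sketch. It is not the authors' implementation: it uses a single feature channel, one hand-set convolution kernel instead of learned weights, and an assumed way of turning per-residue outputs into pairwise inputs (pairing the outputs for residues i and j with the pairwise feature for (i, j)). All function names here are hypothetical.

```python
def relu(values):
    """Element-wise rectified linear unit."""
    return [max(0.0, v) for v in values]


def conv1d_same(signal, kernel):
    """1D convolution with zero padding, so the output keeps length L."""
    half = len(kernel) // 2
    out = []
    for i in range(len(signal)):
        acc = 0.0
        for k, weight in enumerate(kernel):
            pos = i + k - half
            if 0 <= pos < len(signal):
                acc += weight * signal[pos]
        out.append(acc)
    return out


def residual_block_1d(signal, kernel):
    """Residual block y = x + F(x), with F = convolution applied after ReLU."""
    transformed = conv1d_same(relu(signal), kernel)
    # The identity shortcut adds the unchanged input back to the transform.
    return [x + t for x, t in zip(signal, transformed)]


def pairwise_input(seq_features, pair_features):
    """Build an L x L grid whose cell (i, j) holds [seq_i, seq_j, pair_ij]."""
    n = len(seq_features)
    return [
        [[seq_features[i], seq_features[j], pair_features[i][j]] for j in range(n)]
        for i in range(n)
    ]


# Toy run: three residues, one sequential feature each, plus a 3 x 3
# matrix standing in for evolutionary coupling scores.
seq_out = residual_block_1d([1.0, -2.0, 3.0], [0.0, 1.0, 0.0])
toy_pair = [[0.0, 0.1, 0.9], [0.1, 0.0, 0.2], [0.9, 0.2, 0.0]]
grid = pairwise_input(seq_out, toy_pair)
```

In the actual model a 2-dimensional residual network would then operate on a grid like `grid`, so that the prediction for one residue pair can depend on the features of neighboring pairs. The shortcut connection in `residual_block_1d` is what lets such networks be stacked very deep.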

Results

The model's performance was rigorously tested against several benchmarks, including 105 CASP11 targets, 76 CAMEO hard targets, and 398 membrane proteins. The results demonstrated significant improvements over existing methods, notably:

Long-Range Prediction Accuracy: Average top L long-range prediction accuracy was 0.47 for the proposed method, compared to 0.21 for CCMpred and 0.30 for MetaPSICOV. Average top L/10 long-range accuracy was 0.77, compared to 0.47 and 0.59, respectively.
Contact-Assisted Folding: Ab initio folding with the method's predicted contacts as restraints produced correct folds (TMscore > 0.6) for 203 of the 579 test proteins, compared to 79 with MetaPSICOV-predicted contacts and 62 with CCMpred-predicted contacts.
Comparison with Template-Based Modeling: For membrane proteins, contact-assisted ab initio folding produced 3D models with TMscore > 0.5 for 208 of the 398 proteins, whereas homology modeling that used the method's training proteins as templates did so for only 10.

The deep learning model's ability to predict contacts accurately even for proteins with a limited number of sequence homologs illustrates its robustness.
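The "top L" and "top L/10" long-range figures quoted above follow a standard evaluation recipe, sketched below under stated assumptions: L is the sequence length, a pair counts as long-range when the residues are at least 24 positions apart (the usual CASP-style convention, not stated in this summary), and accuracy is the fraction of the highest-scoring L/k long-range pairs that are true contacts. The function name `top_accuracy` and the toy data are hypothetical.

```python
def top_accuracy(scores, true_contacts, k=1, min_separation=24):
    """Fraction of the top L/k predicted long-range pairs that are true contacts.

    scores: L x L matrix of predicted contact scores (higher = more likely).
    true_contacts: L x L boolean matrix of native contacts.
    k: 1 for "top L", 10 for "top L/10", and so on.
    min_separation: minimum |i - j| for a pair to count as long-range (assumed 24).
    """
    length = len(scores)
    # Collect each long-range pair once (upper triangle only).
    candidates = [
        (scores[i][j], i, j)
        for i in range(length)
        for j in range(i + min_separation, length)
    ]
    candidates.sort(key=lambda item: item[0], reverse=True)
    top = candidates[: max(1, length // k)]
    if not top:
        return 0.0
    hits = sum(1 for _, i, j in top if true_contacts[i][j])
    return hits / len(top)


# Toy protein of length 30 with three confident long-range predictions
# and one confident short-range prediction that the metric must ignore.
L = 30
toy_scores = [[0.0] * L for _ in range(L)]
toy_truth = [[False] * L for _ in range(L)]
toy_scores[0][25], toy_scores[1][28], toy_scores[2][29] = 0.9, 0.8, 0.7
toy_scores[0][5] = 0.99  # short-range pair, excluded by min_separation
toy_truth[0][25] = toy_truth[2][29] = toy_truth[0][5] = True

acc_l10 = top_accuracy(toy_scores, toy_truth, k=10)
```

Here the top L/10 set contains three pairs, two of which are true contacts. Evaluation scripts differ in details such as tie-breaking and the denominator used when fewer than L/k candidate pairs exist, so this sketch should not be taken as the paper's exact scoring code.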

Implications and Future Directions

This work has substantial implications for both practical applications and theoretical advancements in bioinformatics and computational biology:

Practical Implications: The improved accuracy in predicting protein contact maps directly enhances our ability to infer protein structures de novo, especially for proteins with limited sequence data or novel folds. This can accelerate biological discoveries and applications in drug design, protein engineering, and synthetic biology.
Theoretical Impact: The approach showcases how ultra-deep learning models, particularly those employing residual networks, can outperform traditional methods by capturing high-order correlations and complex relationships. It opens avenues for further exploration into even deeper networks or integrated models that incorporate additional biological data.
Future Directions: Potential improvements could come from expanding the training set with more diverse and extensive datasets or integrating other forms of biological information (e.g., inter-residue distances). Additionally, optimizing the deep network to handle longer protein sequences or improving memory management to allow more layers could further enhance the model's performance.

In conclusion, this paper represents a significant step forward in protein contact prediction, demonstrating that ultra-deep residual networks can address a longstanding weakness of earlier methods: low-quality contact predictions for proteins without many sequence homologs. The work offers a practical tool for bioinformatics and a foundation for further development.
