- The paper introduces DeepCNF, a novel method integrating conditional random fields (CRFs) with deep convolutional neural networks to achieve ~84% Q3 accuracy.
- It utilizes a deep hierarchical architecture to model complex sequence-structure relationships and interdependencies between consecutive labels.
- The method demonstrates superior generalization across diverse datasets, highlighting its potential for advancing computational protein analysis.
Protein Secondary Structure Prediction Using Deep Convolutional Neural Fields
This paper presents DeepCNF (Deep Convolutional Neural Fields), a novel method for predicting protein secondary structure (SS) states. Accurate SS prediction matters because of its implications for understanding protein function and for applications such as drug design. The approach integrates the strengths of Conditional Neural Fields (CNF) with deep learning, using a deep hierarchical architecture to capture complex sequence-structure relationships and the interdependency between adjacent SS labels.
Key Contributions and Results
DeepCNF outperforms existing methods, achieving substantial improvements in Q3 accuracy (~84%), SOV score (~85%), and Q8 accuracy (~72%) on CASP and CAMEO test proteins. These results surpass traditional methods such as PSIPRED, whose Q3 accuracy has plateaued at ~80% for the past decade. Notably, DeepCNF performs especially well on challenging SS types such as high-curvature regions, beta loops, and irregular loops.
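As a concrete reference for the headline metric, Q3 accuracy is simply the fraction of residues whose predicted three-state label (helix H, strand E, coil C) matches the true label. A minimal sketch of the computation (the function name and string encoding are illustrative, not taken from the paper):

```python
def q3_accuracy(predicted: str, true: str) -> float:
    """Fraction of residues with a correct 3-state (H/E/C) label.

    Both arguments are equal-length strings of per-residue labels.
    """
    assert len(predicted) == len(true), "sequences must align residue-by-residue"
    matches = sum(p == t for p, t in zip(predicted, true))
    return matches / len(true)

# 6 of 7 residues match here (only the last differs)
print(q3_accuracy("HHHEECC", "HHHEECH"))
```

Q8 accuracy is computed the same way over the eight DSSP states; SOV additionally rewards getting contiguous segments right rather than isolated residues.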
Methodology
DeepCNF combines the principles of Conditional Random Fields (CRF) and Deep Convolutional Neural Networks (DCNN). This allows for modeling intricate relationships between input features and output labels, while simultaneously accounting for interdependencies among SS labels in consecutive residues.
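The combination can be pictured as follows: the convolutional network produces per-residue emission scores for each SS label, while a transition matrix scores pairs of adjacent labels (the CRF part); decoding then finds the jointly best label path. A minimal Viterbi-decoding sketch under these assumptions (all names are illustrative; the paper's model is trained with CRF-style maximum likelihood, which this sketch does not cover):

```python
import numpy as np

def viterbi_decode(emissions: np.ndarray, transitions: np.ndarray) -> list:
    """Most likely label path for a CRF layered on top of a neural network.

    emissions   : (L, K) array of per-residue label scores, e.g. the output
                  of a deep convolutional network over the sequence.
    transitions : (K, K) array scoring adjacent label pairs (i -> j),
                  capturing the interdependency between consecutive labels.
    """
    L, K = emissions.shape
    score = emissions[0].copy()          # best score ending in each label
    backptr = np.zeros((L, K), dtype=int)
    for t in range(1, L):
        # total[i, j] = best score of a path ending in i, then moving to j
        total = score[:, None] + transitions + emissions[t]
        backptr[t] = total.argmax(axis=0)
        score = total.max(axis=0)
    # trace the best path backwards
    path = [int(score.argmax())]
    for t in range(L - 1, 0, -1):
        path.append(int(backptr[t][path[-1]]))
    return path[::-1]

emissions = np.array([[2.0, 0.0], [0.1, 1.0], [0.1, 1.0]])
transitions = np.zeros((2, 2))
print(viterbi_decode(emissions, transitions))  # [0, 1, 1]
```

With zero transition scores the decoder reduces to per-residue argmax; a trained transition matrix is what lets the model discourage implausible label sequences (e.g. an isolated one-residue helix).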
The architecture consists of two primary modules: a CRF module, which models correlations between adjacent labels, and a DCNN module, which processes input features through multiple hidden layers. The paper emphasizes design choices such as the number and configuration of hidden layers, showing that adding layers improves performance because deeper networks capture longer-range sequence information.
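The long-range effect of stacking layers can be made concrete: each 1-D convolution with window size w widens the receptive field by w - 1 residues, so depth directly controls how much sequence context informs each prediction. A small sketch with illustrative numbers (the specific window size and layer count below are examples, not necessarily the paper's exact configuration):

```python
def receptive_field(n_layers: int, window: int) -> int:
    """Residues of sequence context seen by the top layer of a stack of
    1-D convolutions, each with the given window size and stride 1.

    Each layer extends the receptive field by (window - 1) residues
    beyond the layer below it.
    """
    return 1 + n_layers * (window - 1)

# e.g. five stacked layers with an 11-residue window
print(receptive_field(5, 11))  # 51 residues of context per prediction
```

This is why a deep stack can model long-range sequence-structure relationships that a single windowed layer cannot.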
Datasets and Experimental Setup
The research utilized several datasets: CullPDB, CB513, CASP10, CASP11, and CAMEO, ensuring diverse testing conditions and robustness evaluation. Remarkably, the method demonstrated superior generalization, outperforming competitors on datasets with limited homologous sequence information, highlighting its effectiveness in learning sequence-structure relationships without heavy reliance on sequence similarity.
Implications and Future Directions
The improved performance of DeepCNF indicates its potential application beyond secondary structure prediction. The flexibility of the DeepCNF framework could apply to other protein structure predictions, such as contact numbers and solvent accessibility, enhancing computational tools in structural biology.
Despite the advancement, DeepCNF's performance on sparse sequence profiles (Neff ≤ 2) shows room for improvement, suggesting future work could focus on enhancing predictions from raw sequences rather than profiles. Further development might involve hybrid models or alternative input feature representations to overcome these limitations.
Overall, DeepCNF represents a significant methodological advance, improving accuracy in protein SS prediction and laying the groundwork for future innovations in computational biology and AI applications in bioinformatics.