DeepGO: Predicting protein functions from sequence and interactions using a deep ontology-aware classifier (1705.05919v1)

Published 15 May 2017 in q-bio.GN, cs.LG, and q-bio.QM

Abstract: A large number of protein sequences are becoming available through the application of novel high-throughput sequencing technologies. Experimental functional characterization of these proteins is time-consuming and expensive, and is often only done rigorously for few selected model organisms. Computational function prediction approaches have been suggested to fill this gap. The functions of proteins are classified using the Gene Ontology (GO), which contains over 40,000 classes. Additionally, proteins have multiple functions, making function prediction a large-scale, multi-class, multi-label problem. We have developed a novel method to predict protein function from sequence. We use deep learning to learn features from protein sequences as well as a cross-species protein-protein interaction network. Our approach specifically outputs information in the structure of the GO and utilizes the dependencies between GO classes as background information to construct a deep learning model. We evaluate our method using the standards established by the Computational Assessment of Function Annotation (CAFA) and demonstrate a significant improvement over baseline methods such as BLAST, with significant improvement for predicting cellular locations.

Citations (365)

Summary

  • The paper introduces a deep ontology-aware classifier that integrates CNN-based sequence analysis with protein interaction networks for effective protein function prediction.
  • The paper demonstrates improved performance over the BLAST baseline, most clearly for cellular component annotation, where it reaches an F_max of 0.64.
  • The paper offers a scalable framework that expedites high-throughput functional annotation and lays groundwork for future expansions to other biological ontologies.

DeepGO: Integrating Sequence and Interaction Networks for Protein Function Prediction

The paper "DeepGO: Predicting protein functions from sequence and interactions using a deep ontology-aware classifier" presents an innovative method for predicting protein functions by leveraging deep learning techniques. The primary challenge addressed is the large-scale, multi-class, and multi-label nature of protein function prediction, as defined by the Gene Ontology (GO) with over 40,000 classes. The sheer volume of novel protein sequences made available by high-throughput sequencing technologies poses significant hurdles for experimental functional characterization.

Methodological Advancements

DeepGO employs a neural network architecture that takes both protein sequences and protein-protein interaction networks as input and produces predictions that follow the hierarchical structure of the GO. The model has two main components:

  1. Feature Learning: Utilizing Convolutional Neural Networks (CNNs), the model learns feature representations from amino acid sequences. This is complemented by protein-protein interaction networks embedded into a joint representation space through knowledge graph embeddings, reflecting inter-species orthologous relations.
  2. Ontological Structure Awareness: DeepGO models dependencies in GO hierarchies, discerning interrelations among different functional classes. This hierarchical layout refines predictions through recursive neural computations that incorporate parent-child class relationships, enhancing performance across the ontology.
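The parent-child constraint described in the second component can be illustrated with a small sketch. This is not the authors' neural layer; it is a post-hoc max-propagation over a toy hierarchy, assuming the GO is given as a child-to-parents mapping and forms a directed acyclic graph. The function name `propagate_scores` and the term names are hypothetical.

```python
def propagate_scores(raw_scores, parents):
    """Make scores hierarchy-consistent: every term scores at least as
    high as any of its descendants (a protein annotated with a child
    class is implicitly annotated with all of its ancestors).

    raw_scores: dict mapping term -> raw score in [0, 1]
    parents:    dict mapping child term -> list of parent terms (a DAG)
    """
    # Invert the child -> parents mapping into parent -> children.
    children = {term: [] for term in raw_scores}
    for child, term_parents in parents.items():
        for parent in term_parents:
            children.setdefault(parent, []).append(child)

    consistent = {}

    def resolve(term):
        # Memoized recursion: a term's score is the maximum of its own
        # raw score and the resolved scores of its children.
        if term not in consistent:
            child_scores = [resolve(c) for c in children.get(term, [])]
            consistent[term] = max([raw_scores.get(term, 0.0)] + child_scores)
        return consistent[term]

    for term in list(children):
        resolve(term)
    return consistent


# Toy hierarchy (hypothetical term names): leaf -> mid -> root, other -> root.
raw = {"root": 0.1, "mid": 0.3, "leaf": 0.9, "other": 0.2}
parents = {"mid": ["root"], "leaf": ["mid"], "other": ["root"]}
scores = propagate_scores(raw, parents)
```

In this toy case the high score of `leaf` is carried up to `mid` and `root`, while `other` keeps its own score, so no parent ends up scored below one of its children.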

Evaluation and Results

DeepGO was evaluated using the Computational Assessment of Function Annotation (CAFA) benchmarks, demonstrating notable improvement over baseline methods like BLAST, especially in predicting cellular locations. Performance was measured using protein-centric and term-centric metrics, with results indicating substantial gains in the Cellular Component (CC) and Molecular Function (MF) ontologies.

  • Performance Metrics: The maximum F-measure (F_max) and ROC AUC were used to assess prediction quality. The DeepGO model, leveraging both sequence and interaction data, outperformed BLAST, particularly in the CC ontology, where it reached an F_max of 0.64.
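For readers unfamiliar with the protein-centric F_max used in CAFA, the following is a minimal sketch of how it can be computed. At each score threshold, precision is averaged over proteins with at least one prediction and recall over all proteins, and F_max is the best harmonic mean across thresholds. The function name `f_max`, the input layout, and the threshold grid (0.01 to 1.00) are assumptions made for illustration, not the paper's evaluation code.

```python
def f_max(predictions, truths, thresholds=None):
    """Protein-centric maximum F-measure.

    predictions: list of dicts, one per protein, mapping term -> score
    truths:      list of sets, one per protein, of true terms
    """
    if thresholds is None:
        thresholds = [t / 100 for t in range(1, 101)]
    n = len(truths)
    best = 0.0
    for t in thresholds:
        prec_sum, rec_sum, covered = 0.0, 0.0, 0
        for scores, true_terms in zip(predictions, truths):
            predicted = {term for term, s in scores.items() if s >= t}
            tp = len(predicted & true_terms)
            if predicted:
                # Precision only counts proteins with a prediction at t.
                covered += 1
                prec_sum += tp / len(predicted)
            if true_terms:
                rec_sum += tp / len(true_terms)
        if covered == 0:
            continue
        precision = prec_sum / covered
        recall = rec_sum / n  # recall is averaged over all proteins
        if precision + recall > 0:
            best = max(best, 2 * precision * recall / (precision + recall))
    return best


# Two toy proteins with hypothetical term identifiers.
preds = [{"GO:a": 0.9, "GO:b": 0.4}, {"GO:a": 0.8}]
truth = [{"GO:a"}, {"GO:a", "GO:b"}]
result = f_max(preds, truth)
```

Here the best threshold keeps only `GO:a` for both proteins, giving precision 1.0 and recall 0.75, so the sketch returns 6/7, roughly 0.857.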

Domain Implications

The implications of DeepGO are extensive in both computational biology and bioinformatics sectors. Practically, this approach expedites protein characterization processes, allowing biologists to hypothesize functions for novel proteins in various organisms efficiently. Theoretically, it presents a framework that can be adapted for other ontologically structured problems, such as predicting gene-disease associations using the Disease Ontology.

Future Developments

Looking forward, expanding DeepGO's applicability could involve integrating additional biological data, such as transcriptional co-expression, regulatory networks, or even larger sets of protein-protein interactions. Moreover, incorporating more sophisticated representations of GO's part-of and regulatory relations could further enhance predictive accuracy.

Further research can explore improving the model's ability to predict lower-abundance functions by penalizing false negatives in a context-aware manner, aligning predictions with the biological importance of specific protein functions. Additionally, adapting the framework to address emerging biological data types and ontologies could catalyze advancements in computational function annotation tools.

In summary, DeepGO combines ontological structure with deep learning, advancing the accuracy and efficiency of high-throughput computational function annotation.