Hierarchical Protein Function Prediction with Tail-GNNs (2007.12804v1)

Published 24 Jul 2020 in cs.LG, q-bio.BM, and stat.ML

Abstract: Protein function prediction may be framed as predicting subgraphs (with certain closure properties) of a directed acyclic graph describing the hierarchy of protein functions. Graph neural networks (GNNs), with their built-in inductive bias for relational data, are hence naturally suited for this task. However, in contrast with most GNN applications, the graph is not related to the input, but to the label space. Accordingly, we propose Tail-GNNs, neural networks which naturally compose with the output space of any neural network for multi-task prediction, to provide relationally-reinforced labels. For protein function prediction, we combine a Tail-GNN with a dilated convolutional network which learns representations of the protein sequence, making significant improvement in F_1 score and demonstrating the ability of Tail-GNNs to learn useful representations of labels and exploit them in real-world problem solving.

Summary

The paper introduces Tail-GNNs, a novel graph neural network variant that integrates hierarchical label information from the Gene Ontology.
The methodology couples a dilated convolutional network with Tail-GNNs to efficiently extract protein sequence representations and enforce closure properties of the label space.
Experimental results on CAFA3 datasets reveal significant F1 score improvements, demonstrating the framework's robustness and potential for broader applications.

Overview of "Hierarchical Protein Function Prediction with Tail-GNNs"

The paper "Hierarchical Protein Function Prediction with Tail-GNNs," authored by Stefan Spalevic et al., introduces an approach to protein function prediction that exploits the hierarchical structure of the Gene Ontology (GO) using graph neural networks (GNNs). It addresses a central challenge in bioinformatics: assigning functions to proteins as the database of sequenced genomes expands rapidly. Laboratory methods for determining protein function cannot keep pace with the volume of data being generated, which makes automated prediction necessary.

Novel Contribution: Tail-GNNs

The authors propose Tail-GNNs, a GNN variant designed for hierarchical label spaces such as the Gene Ontology, which is represented as a directed acyclic graph (DAG). This departs from conventional GNN applications, where the graph describes the input data. Here the graph is defined over the label space, so the known relations among labels act as a relational inductive bias. Tail-GNNs can be attached to the output of any multi-task prediction network and use this relational information to refine its label predictions.
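To make the idea of a graph over the label space concrete, the following minimal sketch closes a set of predicted terms under the ancestor relation of a DAG. In GO, a protein annotated with a term is also considered annotated with all of that term's ancestors, which is one closure property a valid prediction should satisfy. The term names, the `parents` mapping, and the function name are toy assumptions for illustration, not taken from the paper.

```python
def close_under_ancestors(labels, parents):
    """Return `labels` together with every ancestor reachable via `parents`.

    `parents` maps a term to the list of its direct parents in the DAG
    (toy structure, assumed for this sketch).
    """
    closed = set()
    stack = list(labels)
    while stack:
        term = stack.pop()
        if term not in closed:
            closed.add(term)
            # Walk upwards; roots simply have no entry in `parents`.
            stack.extend(parents.get(term, ()))
    return closed


# Hypothetical three-level hierarchy with a single root.
parents = {
    "protein_binding": ["binding"],
    "binding": ["root"],
    "catalysis": ["root"],
}

closed = close_under_ancestors({"protein_binding"}, parents)
print(sorted(closed))  # ['binding', 'protein_binding', 'root']
```

A post-hoc closure like this only repairs predictions after the fact; the point of operating on the label graph inside the model is to let the network take these relations into account while learning.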

Methodology

The methodology is centered on the combination of a Tail-GNN with a dilated convolutional network to form an end-to-end architecture capable of protein function prediction. Key architectural components include:

Dilated Convolutional Network: This network learns representations of protein sequences, which are given as sequences of amino acids. Dilated convolutions widen the receptive field without a matching increase in parameters, which lets the network handle long protein sequences efficiently.
Tail-GNN: Operating on the representations produced by the convolutional network, the Tail-GNN uses the DAG structure of GO as a relational inductive bias, refining the predictions so that they better respect the closure properties of the ontology.
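The two components above can be sketched in a few lines of plain Python. This is a toy illustration under stated assumptions, not the authors' implementation: features are scalars rather than learned vectors, there are no trainable weights, and the function names, kernel values, and label graph are hypothetical. The aggregator is left as a parameter so that sum, mean, and max pooling can be swapped in.

```python
def dilated_conv1d(signal, kernel, dilation):
    """Valid 1D cross-correlation whose kernel taps are `dilation` steps apart."""
    span = (len(kernel) - 1) * dilation
    return [
        sum(w * signal[start + tap * dilation] for tap, w in enumerate(kernel))
        for start in range(len(signal) - span)
    ]


def receptive_field(kernel_size, dilations):
    """Input positions seen by one output of a stack of stride-1 dilated layers."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)


def message_passing_step(features, neighbours, aggregate=sum):
    """Add to each label's feature the aggregate of its neighbours' features."""
    updated = {}
    for label, value in features.items():
        incoming = [features[n] for n in neighbours.get(label, [])]
        updated[label] = value + (aggregate(incoming) if incoming else 0.0)
    return updated


# Dilation 2 with a 3-tap kernel compares positions i and i + 4.
print(dilated_conv1d([1, 2, 3, 4, 5, 6, 7], [1, 0, -1], 2))  # [-4, -4, -4]

# Four layers with doubling dilation already cover 31 sequence positions.
print(receptive_field(3, [1, 2, 4, 8]))  # 31

# Hypothetical per-label scores from the sequence network, on a toy label graph.
features = {"root": 0.5, "binding": 1.0, "catalysis": 2.0}
neighbours = {
    "root": ["binding", "catalysis"],
    "binding": ["root"],
    "catalysis": ["root"],
}
print(message_passing_step(features, neighbours))                 # sum pooling
print(message_passing_step(features, neighbours, aggregate=max))  # max pooling
```

In a real model each label would carry a learned feature vector and the update would pass through trainable layers; the sketch only shows how information flows along the edges of the label graph.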

In addition, the node representations in the Tail-GNN are augmented with spectral features derived from the graph Laplacian of the label graph.
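As a rough illustration of what spectral node features can look like, the sketch below builds the unnormalized Laplacian L = D - A of a small undirected graph and takes its lowest-eigenvalue eigenvectors as per-node features. It assumes NumPy is available. Ignoring edge direction, using the unnormalized Laplacian, and keeping the first `k` eigenvectors are all assumptions of this sketch; the summary does not state which variant the paper uses.

```python
import numpy as np


def laplacian_eigenfeatures(num_nodes, edges, k):
    """Return the k smallest Laplacian eigenvalues and their eigenvectors.

    Row i of the returned eigenvector matrix is a k-dimensional
    spectral feature for node i.
    """
    adjacency = np.zeros((num_nodes, num_nodes))
    for u, v in edges:
        # Treat the DAG edges as undirected so the Laplacian is symmetric.
        adjacency[u, v] = adjacency[v, u] = 1.0
    degree = np.diag(adjacency.sum(axis=1))
    laplacian = degree - adjacency
    # eigh handles symmetric matrices and returns eigenvalues in ascending order.
    eigenvalues, eigenvectors = np.linalg.eigh(laplacian)
    return eigenvalues[:k], eigenvectors[:, :k]


# Toy path graph 0 - 1 - 2; its Laplacian has eigenvalues 0, 1 and 3.
values, vectors = laplacian_eigenfeatures(3, [(0, 1), (1, 2)], k=2)
print(values)
print(vectors.shape)
```

Such features could then be concatenated to each label node's representation before message passing, giving every node a position-aware signature within the ontology.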

Experimental Results

The approach is evaluated on datasets from the Critical Assessment of Function Annotation (CAFA3) challenge. Results indicate significant improvements in F1 score when a Tail-GNN is added on top of the dilated convolutional network. Among the variants compared, the Tail-GNN with sum-pooling aggregation outperformed those using other aggregators, such as mean-pooling and max-pooling.
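For reference, the F1 score is the harmonic mean of precision and recall. The sketch below computes it for a single protein from a set of predicted terms and a set of true terms. The function name and the example labels are hypothetical, and how scores are thresholded and averaged across proteins in the paper's evaluation is not specified here, so this shows only the basic quantity.

```python
def f1_score(predicted, actual):
    """F1 between a set of predicted labels and a set of true labels."""
    predicted, actual = set(predicted), set(actual)
    true_positives = len(predicted & actual)
    if true_positives == 0:
        # Also covers empty prediction or empty ground truth.
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(actual)
    return 2 * precision * recall / (precision + recall)


# Two of three predictions are correct, and two of four true terms are found:
# precision = 2/3, recall = 1/2, so F1 = 4/7.
score = f1_score({"a", "b", "c"}, {"b", "c", "d", "e"})
print(round(score, 4))  # 0.5714
```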

Implications and Future Directions

The findings demonstrate that Tail-GNNs can effectively integrate hierarchical label information into protein function prediction. The architecture is flexible and could potentially be adapted to other scientific domains where label hierarchies are prevalent, such as spatial phenomena prediction and polypharmacy side effect prediction.

Future research directions could involve extending the Tail-GNN framework to incorporate additional biological networks, such as protein-protein interaction (PPI) networks, thereby improving predictive performance. Moreover, exploring the transferability of the framework across different domains will further underscore its utility and adaptability.

In conclusion, the paper presents a notable advancement in the application of graph neural networks to hierarchical label prediction tasks, reinforcing the utility of graph-based methods in complex biological data analysis.
