
Neighborhood-Based Label Propagation in Large Protein Graphs (1708.07074v1)

Published 9 Aug 2017 in cs.DC and cs.LG

Abstract: Understanding protein function is one of the keys to understanding life at the molecular level. It is also important in several scenarios, including human disease and drug discovery. In this age of rapid and affordable biological sequencing, the number of sequences accumulating in databases is rising at an increasing rate. This presents many challenges for biologists and computer scientists alike. To make sense of this huge quantity of data, these sequences should be annotated with functional properties. UniProtKB consists of two components: i) the UniProtKB/Swiss-Prot database, containing protein sequences with reliable information manually reviewed by expert bio-curators, and ii) the UniProtKB/TrEMBL database, which is used for storing and processing the unknown sequences. Hence, for every protein the sequence is available along with a few additional attributes, such as the taxon and some structural domains. Pairwise similarity between proteins can be defined and computed from such attributes. Other important attributes, such as function and cellular localization, are present for proteins in Swiss-Prot but often missing for proteins in TrEMBL. The enormous number of protein sequences now in TrEMBL calls for rapid procedures to annotate them automatically. In this work, we present DistNBLP, a novel Distributed Neighborhood-Based Label Propagation approach for large-scale annotation of proteins. The functional annotations of reviewed proteins are used to predict those of non-reviewed proteins using label propagation on a graph representation of the protein database. DistNBLP is built on top of the "akka" toolkit for building resilient distributed message-driven applications.
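
To make the idea of propagating annotations from reviewed to non-reviewed proteins concrete, here is a minimal single-machine sketch of weighted neighborhood voting on a similarity graph. It is not the DistNBLP implementation: the abstract does not give the algorithm's details, so the function name `propagate_labels`, the edge format, the voting rule, the tie-breaking, and the example proteins and weights are all assumptions made for illustration, and nothing here is distributed or uses akka.

```python
from collections import defaultdict


def propagate_labels(edges, labels, rounds=1):
    """Predict labels for unlabeled nodes by weighted neighbor voting.

    edges:  iterable of (u, v, weight) similarity edges (hypothetical format)
    labels: dict mapping reviewed nodes to their known annotation
    rounds: number of propagation passes over the graph
    """
    # Build an undirected adjacency list.
    neighbors = defaultdict(list)
    for u, v, w in edges:
        neighbors[u].append((v, w))
        neighbors[v].append((u, w))

    predicted = {}
    for _ in range(rounds):
        # Labels visible in this pass: reviewed ones plus earlier predictions.
        known = {**labels, **predicted}
        updates = {}
        for node in neighbors:
            if node in labels:
                continue  # reviewed annotations stay fixed
            votes = defaultdict(float)
            for other, w in neighbors[node]:
                if other in known:
                    votes[known[other]] += w
            if votes:
                # Highest total weight wins; ties go to the alphabetically
                # first label (an arbitrary choice for determinism).
                updates[node] = max(sorted(votes), key=votes.get)
        predicted.update(updates)
    return predicted


# Hypothetical toy graph: P* are reviewed proteins, Q* are non-reviewed.
edges = [
    ("P1", "Q1", 0.9),
    ("P2", "Q1", 0.4),
    ("P2", "Q2", 0.8),
    ("Q1", "Q2", 0.3),
]
labels = {"P1": "kinase", "P2": "transporter"}
print(propagate_labels(edges, labels))
```

In this toy graph, Q1 takes the label of its most similar reviewed neighbor P1, and Q2 takes the label of P2. Running more than one round lets predictions reach nodes that have no reviewed neighbor directly, at the cost of propagating any earlier mistakes.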
