
Deep Manifold Transformation for Protein Representation Learning (2402.09416v1)

Published 12 Jan 2024 in q-bio.BM and cs.LG

Abstract: Protein representation learning is critical in various tasks in biology, such as drug design and protein structure or function prediction, which has primarily benefited from protein LLMs and graph neural networks. These models can capture intrinsic patterns from protein sequences and structures through masking and task-related losses. However, the learned protein representations are usually not well optimized, leading to performance degradation due to limited data, difficulty adapting to new tasks, etc. To address this, we propose a new deep manifold transformation approach for universal protein representation learning (DMTPRL). It employs manifold learning strategies to improve the quality and adaptability of the learned embeddings. Specifically, we apply a novel manifold learning loss during training based on the graph inter-node similarity. Our proposed DMTPRL method outperforms state-of-the-art baselines on diverse downstream tasks across popular datasets. This validates our approach for learning universal and robust protein representations. We promise to release the code after acceptance.
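
The abstract does not spell out the manifold learning loss, so the following is only an illustrative sketch of one common way a loss "based on the graph inter-node similarity" can be built; it is not the DMTPRL loss itself. It assumes a Student-t kernel (as used in t-SNE) to turn pairwise distances into similarities and a binary cross-entropy to match input-space and embedding-space similarities. The function names and the parameters `nu_in`, `nu_out`, and `eps` are hypothetical.

```python
import numpy as np


def pairwise_sq_dists(x):
    """Squared Euclidean distances between all rows of x, shape (n, n)."""
    sq = np.sum(x * x, axis=1)
    d = sq[:, None] + sq[None, :] - 2.0 * (x @ x.T)
    return np.maximum(d, 0.0)  # clip tiny negatives caused by round-off


def t_similarity(sq_dists, nu=1.0):
    """Student-t kernel: maps squared distances to similarities in (0, 1]."""
    return (1.0 + sq_dists / nu) ** (-(nu + 1.0) / 2.0)


def similarity_matching_loss(node_feats, embeddings,
                             nu_in=100.0, nu_out=1.0, eps=1e-7):
    """Mean cross-entropy between input and embedding node similarities.

    Assumption: this stands in for a manifold-style loss; the paper's
    actual formulation may differ.
    """
    p = t_similarity(pairwise_sq_dists(node_feats), nu_in)
    q = t_similarity(pairwise_sq_dists(embeddings), nu_out)
    p = np.clip(p, eps, 1.0 - eps)
    q = np.clip(q, eps, 1.0 - eps)
    ce = -(p * np.log(q) + (1.0 - p) * np.log(1.0 - q))
    off_diag = ~np.eye(len(node_feats), dtype=bool)  # ignore self-pairs
    return ce[off_diag].mean()


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    feats = rng.normal(size=(8, 16))  # toy per-node (residue) features
    emb = rng.normal(size=(8, 2))     # toy low-dimensional embeddings
    print(similarity_matching_loss(feats, emb))
```

In a training setup, a term like this would typically be added to the task loss so that nodes that are similar in the input graph stay close in the learned embedding; how DMTPRL weights or defines that term is not stated in the abstract.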
