ProtChatGPT: Towards Understanding Proteins with Large Language Models (2402.09649v1)

Published 15 Feb 2024 in cs.CE, cs.AI, and q-bio.BM

Abstract: Protein research is crucial to many fundamental disciplines, but understanding proteins' intricate structure-function relationships remains challenging. Recent LLMs have made significant strides in comprehending task-specific knowledge, suggesting the potential for ChatGPT-like systems specialized in proteins to facilitate basic research. In this work, we introduce ProtChatGPT, which aims at learning and understanding protein structures via natural language. ProtChatGPT enables users to upload proteins, ask questions, and engage in interactive conversations to produce comprehensive answers. The system comprises protein encoders, a Protein-Language Pre-training Transformer (PLP-former), a projection adapter, and an LLM. An uploaded protein first passes through the protein encoders and the PLP-former to produce protein embeddings, which the adapter then projects to conform with the LLM. The LLM finally combines the user's question with the projected embeddings to generate informative answers. Experiments show that ProtChatGPT can produce promising responses to questions about proteins. We hope that ProtChatGPT can form the basis for further exploration and application in protein research. Code and our pre-trained model will be publicly available.
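The abstract describes a staged pipeline: protein encoders and the PLP-former produce protein embeddings, a projection adapter maps them into the LLM's input-embedding space, and the LLM answers the user's question conditioned on those projected embeddings. The sketch below illustrates that data flow only; the class name ProjectionAdapter, the single linear projection, the tensor dimensions, and the placeholder LLM are assumptions for illustration, not the paper's actual implementation.

```python
import torch
import torch.nn as nn


class ProjectionAdapter(nn.Module):
    """Maps protein embeddings into the LLM's input-embedding space.
    A single linear layer is an assumption; the paper's adapter may differ."""

    def __init__(self, protein_dim: int, llm_dim: int):
        super().__init__()
        self.proj = nn.Linear(protein_dim, llm_dim)

    def forward(self, protein_emb: torch.Tensor) -> torch.Tensor:
        return self.proj(protein_emb)


def answer_question(protein_emb, question_emb, llm, adapter):
    """Prepend projected protein tokens to the question's token embeddings
    and let the (frozen) LLM generate from the combined sequence."""
    projected = adapter(protein_emb)                      # (B, P, llm_dim)
    llm_inputs = torch.cat([projected, question_emb], 1)  # (B, P+Q, llm_dim)
    return llm(inputs_embeds=llm_inputs)


if __name__ == "__main__":
    # Toy usage: random tensors stand in for real encoder / tokenizer outputs.
    B, P, Q = 1, 32, 16                     # batch, protein tokens, question tokens
    protein_dim, llm_dim = 512, 4096        # illustrative sizes only
    adapter = ProjectionAdapter(protein_dim, llm_dim)
    protein_emb = torch.randn(B, P, protein_dim)   # stand-in for PLP-former output
    question_emb = torch.randn(B, Q, llm_dim)      # stand-in for embedded question
    dummy_llm = lambda inputs_embeds: inputs_embeds.mean()  # placeholder for a frozen LLM
    print(answer_question(protein_emb, question_emb, dummy_llm, adapter))
```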

Authors (4)
  1. Chao Wang (555 papers)
  2. Ruijie Quan (17 papers)
  3. Yi Yang (855 papers)
  4. Hehe Fan (46 papers)
Citations (8)