
Single-Sequence-Based Protein Secondary Structure Prediction using One-Hot and Chemical Encodings of Amino Acids (2407.05173v1)

Published 6 Jul 2024 in q-bio.BM

Abstract: In protein secondary structure prediction, each amino acid in a sequence is typically treated as a distinct category and represented by a one-hot vector. In this study, we developed two novel chemical representations for amino acids utilizing molecular fingerprints and the dimensionality reduction algorithm FastMap. We demonstrate that the two new chemical encodings can provide additional information about the interactions of amino acids in sequences that an LSTM-based model cannot capture with one-hot encoding alone. Compared to the latest LSTM-based model used in the single-sequence-based method SPOT-1D-Single, our ensemble model utilizing one-hot and chemical encodings achieves better accuracy across most test sets while requiring approximately nine times fewer trainable parameters for each encoding model. Our single-sequence-based method is valuable for its simplicity, lower resource requirements, and independence from external sequence data. It is beneficial when quick or preliminary predictions are needed or when data on homologous sequences are scarce.
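The abstract contrasts two per-residue inputs: a one-hot vector and a low-dimensional "chemical" vector obtained by running FastMap over molecular fingerprints. The following is a minimal sketch of both, not the authors' code. FastMap is implemented in its classic form (Faloutsos and Lin, 1995): pick two distant pivots, project every object onto the line between them with the cosine law, then recurse on the residual distances. Several assumptions are labeled here. The fingerprints are HYPOTHETICAL random bit vectors, because real ones would be computed from each amino acid's structure with a cheminformatics toolkit. Tanimoto distance and the output dimension `k=5` are also illustrative choices, and the abstract does not state either.

```python
import numpy as np

# The 20 standard amino acids; the one-hot column order is an arbitrary choice.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
AA_INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def one_hot(seq: str) -> np.ndarray:
    """Encode a sequence as an (L, 20) matrix with a single 1 per row."""
    out = np.zeros((len(seq), len(AMINO_ACIDS)), dtype=np.float32)
    for pos, aa in enumerate(seq):
        out[pos, AA_INDEX[aa]] = 1.0
    return out


def tanimoto_distance(fps: np.ndarray) -> np.ndarray:
    """Pairwise 1 - |A & B| / |A | B| between rows of a boolean fingerprint matrix."""
    fps = fps.astype(bool)
    inter = (fps[:, None, :] & fps[None, :, :]).sum(axis=-1)
    union = (fps[:, None, :] | fps[None, :, :]).sum(axis=-1)
    return 1.0 - inter / np.maximum(union, 1)


def fastmap(dist: np.ndarray, k: int, seed: int = 0) -> np.ndarray:
    """Embed n objects in k dimensions given an (n, n) distance matrix."""
    n = dist.shape[0]
    rng = np.random.default_rng(seed)
    d2 = dist.astype(float) ** 2  # FastMap works on squared distances
    coords = np.zeros((n, k))
    for dim in range(k):
        # Pivot heuristic: random object -> farthest object a -> farthest from a is b.
        a = int(np.argmax(d2[rng.integers(n)]))
        b = int(np.argmax(d2[a]))
        dab2 = d2[a, b]
        if dab2 <= 1e-12:  # nothing left to separate; remaining columns stay 0
            break
        # Cosine-law projection of every object onto the line through a and b.
        x = (d2[a] + dab2 - d2[b]) / (2.0 * np.sqrt(dab2))
        coords[:, dim] = x
        # Squared distances left in the hyperplane orthogonal to that line;
        # clamp at 0 because non-Euclidean inputs can go slightly negative.
        d2 = np.maximum(d2 - (x[:, None] - x[None, :]) ** 2, 0.0)
    return coords


# HYPOTHETICAL placeholder fingerprints: random bits, not real chemistry.
fps = np.random.default_rng(42).random((len(AMINO_ACIDS), 64)) < 0.3
CHEM_TABLE = fastmap(tanimoto_distance(fps), k=5)  # (20, 5) lookup table


def chemical(seq: str) -> np.ndarray:
    """Encode a sequence as an (L, k) matrix by per-residue table lookup."""
    return CHEM_TABLE[[AA_INDEX[aa] for aa in seq]]


x_onehot = one_hot("MKTAYIAK")  # shape (8, 20)
x_chem = chemical("MKTAYIAK")   # shape (8, 5)
```

Because FastMap runs once over the 20 residue types, the chemical encoding reduces to a fixed lookup table. Unlike one-hot vectors, which are mutually equidistant, its rows place chemically similar residues closer together. Per the abstract, each encoding feeds its own LSTM-based model and the models are then ensembled. This sketch stops at the input features.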
