A Systematic Study of Joint Representation Learning on Protein Sequences and Structures (2303.06275v2)
Abstract: Learning effective protein representations is critical for a variety of tasks in biology such as predicting protein function. Recent sequence representation learning methods based on Protein Language Models (PLMs) excel at sequence-based tasks, but their direct adaptation to tasks involving protein structures remains a challenge. In contrast, structure-based methods leverage 3D structural information with graph neural networks, and geometric pre-training methods show potential in function prediction tasks, but they still suffer from the limited number of available structures. To bridge this gap, our study undertakes a comprehensive exploration of joint protein representation learning by integrating a state-of-the-art PLM (ESM-2) with distinct structure encoders (GVP, GearNet, CDConv). We introduce three representation fusion strategies and explore different pre-training techniques. Our method achieves significant improvements over existing sequence- and structure-based methods, setting a new state of the art for function annotation. This study underscores several important design choices for fusing protein sequence and structure information. Our implementation is available at https://github.com/DeepGraphLearning/ESM-GearNet.
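To make the idea of fusing a sequence encoder with a structure encoder concrete, here is a toy NumPy sketch of three common fusion patterns (feeding sequence embeddings into the structure encoder, concatenating the two encoders' outputs, and cross-attention between them). This is a conceptual illustration, not the paper's implementation: the specific fusion names, the mean-aggregation graph layer standing in for GVP/GearNet/CDConv, and all tensors and weights are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n_res, dim = 8, 16                                  # residues, embedding dim
seq_emb = rng.normal(size=(n_res, dim))             # stand-in for per-residue PLM embeddings
adj = (rng.random((n_res, n_res)) < 0.3).astype(float)  # toy contact-map adjacency
np.fill_diagonal(adj, 1.0)                          # self-loops so every node has a neighbor

def gnn_layer(x, adj, w):
    """One mean-aggregation message-passing layer with ReLU
    (a toy stand-in for a geometric structure encoder)."""
    deg = adj.sum(axis=1, keepdims=True)
    return np.maximum(((adj @ x) / deg) @ w, 0.0)

W = rng.normal(size=(dim, dim)) * 0.1

# 1) Serial fusion: sequence embeddings become the structure encoder's input node features.
serial = gnn_layer(seq_emb, adj, W)

# 2) Parallel fusion: encode structure from separate geometric features, then concatenate.
struct_feats = rng.normal(size=(n_res, dim))        # hypothetical geometric node features
struct_out = gnn_layer(struct_feats, adj, W)
parallel = np.concatenate([seq_emb, struct_out], axis=1)

# 3) Cross fusion: structure representations attend over sequence embeddings
#    (single-head scaled dot-product attention).
scores = struct_out @ seq_emb.T / np.sqrt(dim)
attn = np.exp(scores - scores.max(axis=1, keepdims=True))
attn = attn / attn.sum(axis=1, keepdims=True)       # row-wise softmax
cross = attn @ seq_emb

print(serial.shape, parallel.shape, cross.shape)    # (8, 16) (8, 32) (8, 16)
```

Note how the choice of fusion point changes the output dimensionality: parallel fusion doubles the feature size by concatenation, while serial and cross fusion preserve it.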
- Unified rational protein engineering with sequence-based deep representation learning. Nature methods, 16(12): 1315–1322.
- Accurate prediction of protein structures and interactions using a three-track neural network. Science, 373(6557): 871–876.
- The Protein Data Bank. Nucleic Acids Research, 28(1): 235–242.
- In Advances in Neural Information Processing Systems.
- xTrimoPGLM: Unified 100B-scale pre-trained transformer for deciphering the language of protein. bioRxiv, 2023–07.
- Structure-aware protein self-supervised learning. Bioinformatics, 39(4): btad189.
- A simple framework for contrastive learning of visual representations. In International conference on machine learning, 1597–1607. PMLR.
- DiffDock: Diffusion Steps, Twists, and Turns for Molecular Docking. International Conference on Learning Representations (ICLR).
- Robust deep learning–based protein sequence design using ProteinMPNN. Science, 378(6615): 49–56.
- BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), 4171–4186. Minneapolis, Minnesota: Association for Computational Linguistics.
- Ankh: Optimized Protein Language Model Unlocks General-Purpose Modelling. bioRxiv, 2023–01.
- ProtTrans: Toward understanding the language of life through self-supervised learning. IEEE Transactions on Pattern Analysis and Machine Intelligence, 44(10): 7112–7127.
- Continuous-Discrete Convolution for Geometry-Sequence Modeling in Proteins. In The Eleventh International Conference on Learning Representations.
- Deciphering interaction fingerprints from protein molecular surfaces using geometric deep learning. Nature Methods, 17(2): 184–192.
- PiFold: Toward effective and efficient protein inverse folding. In The Eleventh International Conference on Learning Representations.
- Structure-based protein function prediction using graph convolutional networks. Nature communications, 12(1): 1–14.
- Self-Supervised Pre-training for Protein Embeddings Using Tertiary Structures. In AAAI.
- Pre-training Co-evolutionary Protein Representation via A Pairwise Masked Language Model. arXiv preprint arXiv:2110.15527.
- ProstT5: Bilingual Language Model for Protein Sequence and Structure. bioRxiv, 2023–07.
- Contrastive representation learning for 3d protein structures. arXiv preprint arXiv:2205.15675.
- Intrinsic-Extrinsic Convolution and Pooling for Learning on 3D Protein Structures. International Conference on Learning Representations.
- RITA: A study on scaling up generative protein sequence models. In 2022 ICML Workshop on Computational Biology.
- Learning inverse folding from millions of predicted structures. ICML.
- Data-Efficient Protein 3D Geometric Pretraining via Refinement of Diffused Protein Structure Decoy. arXiv preprint arXiv:2302.10888.
- Learning from Protein Structure with Geometric Vector Perceptrons. In International Conference on Learning Representations.
- Highly accurate protein structure prediction with AlphaFold. Nature, 596(7873): 583–589.
- Critical assessment of methods of protein structure prediction (CASP)—Round XIII. Proteins: Structure, Function, and Bioinformatics, 87(12): 1011–1020.
- Evolutionary-scale prediction of atomic-level protein structure with a language model. Science, 379(6637): 1123–1130.
- Self-supervised contrastive learning of protein representations by mutual information maximization. bioRxiv.
- Large language models generate functional protein sequences across diverse families. Nature Biotechnology, 1–8.
- Language models enable zero-shot prediction of the effects of mutations on protein function. In Beygelzimer, A.; Dauphin, Y.; Liang, P.; and Vaughan, J. W., eds., Advances in Neural Information Processing Systems.
- Tranception: protein fitness prediction with autoregressive transformers and inference-time retrieval. In International Conference on Machine Learning, 16990–17017. PMLR.
- The natural history of protein domains. Annual review of biophysics and biomolecular structure, 31: 45–71.
- A large-scale evaluation of computational protein function prediction. Nature methods, 10(3): 221–227.
- Evaluating Protein Transfer Learning with TAPE. In Advances in Neural Information Processing Systems.
- Biological structure and function emerge from scaling unsupervised learning to 250 million protein sequences. Proceedings of the National Academy of Sciences, 118(15).
- Is transfer learning necessary for protein landscape prediction? arXiv preprint arXiv:2011.03443.
- Fast end-to-end learning on protein surfaces. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 15272–15281.
- ATOM3D: Tasks on Molecules in Three Dimensions. In Thirty-fifth Conference on Neural Information Processing Systems Datasets and Benchmarks Track (Round 1).
- Diffusion Probabilistic Modeling of Protein Backbones in 3D for the motif-scaffolding problem. In The Eleventh International Conference on Learning Representations.
- AlphaFold Protein Structure Database: massively expanding the structural coverage of protein-sequence space with high-accuracy models. Nucleic acids research.
- Attention is all you need. In Advances in neural information processing systems, 5998–6008.
- LM-GVP: An extensible sequence and structure informed deep learning framework for protein property prediction. Scientific Reports, 12(1): 6832.
- Multi-level Protein Structure Pre-training via Prompt Learning. In The Eleventh International Conference on Learning Representations.
- Enzyme nomenclature.
- Pre-training of Deep Protein Models with Molecular Dynamics Simulations for Drug Binding. arXiv preprint arXiv:2204.08663.
- Protein structure generation via folding diffusion. arXiv preprint arXiv:2209.15611.
- EurNet: Efficient Multi-Range Relational Modeling of Protein Structure. In ICLR 2023 - Machine Learning for Drug Discovery workshop.
- ProtST: Multi-Modality Learning of Protein Sequences and Biomedical Texts. In Krause, A.; Brunskill, E.; Cho, K.; Engelhardt, B.; Sabato, S.; and Scarlett, J., eds., Proceedings of the 40th International Conference on Machine Learning, volume 202 of Proceedings of Machine Learning Research, 38749–38767. PMLR.
- PEER: A Comprehensive and Multi-Task Benchmark for Protein Sequence Understanding. In Thirty-sixth Conference on Neural Information Processing Systems Datasets and Benchmarks Track.
- OntoProtein: Protein Pretraining With Gene Ontology Embedding. In International Conference on Learning Representations.
- E3Bind: An End-to-End Equivariant Network for Protein-Ligand Docking. In The Eleventh International Conference on Learning Representations.
- Protein Representation Learning by Geometric Structure Pretraining. In First Workshop on Pre-training: Perspectives, Pitfalls, and Paths Forward at ICML 2022.
- Pre-Training Protein Encoder via Siamese Sequence-Structure Diffusion Trajectory Prediction. In Advances in Neural Information Processing Systems.
- TorchDrug: A Powerful and Flexible Machine Learning Platform for Drug Discovery. arXiv preprint arXiv:2202.08320.
Authors: Zuobai Zhang, Chuanrui Wang, Minghao Xu, Vijil Chenthamarakshan, Aurélie Lozano, Payel Das, Jian Tang