One Transformer Can Understand Both 2D & 3D Molecular Data (2210.01765v4)

Published 4 Oct 2022 in cs.LG, q-bio.BM, and stat.ML

Abstract: Unlike vision and language data which usually has a unique format, molecules can naturally be characterized using different chemical formulations. One can view a molecule as a 2D graph or define it as a collection of atoms located in a 3D space. For molecular representation learning, most previous works designed neural networks only for a particular data format, making the learned models likely to fail for other data formats. We believe a general-purpose neural network model for chemistry should be able to handle molecular tasks across data modalities. To achieve this goal, in this work, we develop a novel Transformer-based Molecular model called Transformer-M, which can take molecular data of 2D or 3D formats as input and generate meaningful semantic representations. Using the standard Transformer as the backbone architecture, Transformer-M develops two separated channels to encode 2D and 3D structural information and incorporate them with the atom features in the network modules. When the input data is in a particular format, the corresponding channel will be activated, and the other will be disabled. By training on 2D and 3D molecular data with properly designed supervised signals, Transformer-M automatically learns to leverage knowledge from different data modalities and correctly capture the representations. We conducted extensive experiments for Transformer-M. All empirical results show that Transformer-M can simultaneously achieve strong performance on 2D and 3D tasks, suggesting its broad applicability. The code and models will be made publicly available at https://github.com/lsj2408/Transformer-M.

Overview of Transformer-M for Molecular Representation Learning

The paper "One Transformer Can Understand Both 2D and 3D Molecular Data" presents a novel approach to molecular representation learning using a unified Transformer-based model called Transformer-M. This model is designed to process both 2D graph structures and 3D geometric structures of molecular data, addressing the prevalent issue in molecular modeling where different neural architecture designs are often needed for different data modalities. Here, we provide an in-depth examination of the methodology, experimental results, and potential implications of this research.

Model Architecture

Transformer-M leverages the versatility of the Transformer architecture by implementing two distinct channels to encode the structural information of 2D and 3D molecular data. The 2D channel uses degree encoding, shortest-path-distance encoding, and edge encoding to capture the connectivity and bond features inherent in molecular graphs, while the 3D channel uses Euclidean distance encoding to capture spatial relationships between atoms. Both channels enter the attention mechanism of a standard Transformer as bias terms; the channel corresponding to the format of the input is activated and the other is disabled, so a single set of backbone parameters serves either modality (a simplified sketch of this mechanism follows).
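As a rough illustration of this design (not the authors' implementation), the sketch below shows how pairwise structural encodings can be folded into standard self-attention as additive bias terms, with only the channel matching the input format contributing. The module, tensor shapes, and hyperparameters are illustrative assumptions.

```python
# Minimal PyTorch-style sketch of a two-channel attention bias.
# Names and shapes are assumptions for illustration only.
import torch
import torch.nn as nn


class TwoChannelAttentionBias(nn.Module):
    """Builds an attention bias from whichever structural channel is active."""

    def __init__(self, num_heads: int, max_spd: int = 512, num_rbf: int = 128):
        super().__init__()
        # 2D channel: learnable bias per shortest-path-distance bucket.
        self.spd_bias = nn.Embedding(max_spd, num_heads)
        # 3D channel: project radial-basis features of pairwise Euclidean distances.
        self.rbf_proj = nn.Linear(num_rbf, num_heads)

    def forward(self, spd=None, dist_rbf=None):
        bias = 0.0
        if spd is not None:            # 2D graph input: activate the 2D channel
            # spd: (batch, atoms, atoms) integer buckets
            bias = bias + self.spd_bias(spd).permute(0, 3, 1, 2)
        if dist_rbf is not None:       # 3D geometry input: activate the 3D channel
            # dist_rbf: (batch, atoms, atoms, num_rbf) distance features
            bias = bias + self.rbf_proj(dist_rbf).permute(0, 3, 1, 2)
        return bias                    # (batch, heads, atoms, atoms), or 0.0 if neither


def biased_attention(q, k, v, bias):
    # q, k, v: (batch, heads, atoms, head_dim); bias is added before the softmax.
    scores = q @ k.transpose(-2, -1) / q.shape[-1] ** 0.5 + bias
    return torch.softmax(scores, dim=-1) @ v
```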

Training Strategy

A noteworthy aspect of Transformer-M is its joint training strategy across 2D and 3D data. The network parameters are shared across modalities, with the appropriate structural channel activated according to the input data format. This parameter sharing and flexible activation yield a robust learning process and improve the model's ability to generalize across formats. Pre-training combines a supervised objective, prediction of the HOMO-LUMO energy gap on the PCQM4Mv2 dataset, with a self-supervised 3D position-denoising objective; the pre-trained model is then fine-tuned on downstream molecular tasks.
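To make the joint objective concrete, the following minimal sketch (an illustrative assumption, not the authors' training code) applies the supervised target loss to every batch and adds a 3D position-denoising loss only when coordinates are present. The `model` interface, batch keys, and loss weight are hypothetical.

```python
# Sketch of joint training over mixed 2D-only and 2D+3D batches.
import torch
import torch.nn.functional as F

DENOISE_WEIGHT = 1.0  # assumed relative weight of the denoising objective


def train_epoch(model, loader, optimizer, noise_std=0.2):
    model.train()
    for batch in loader:
        pos = batch.get("pos")                # None for 2D-only molecules
        noise = None
        if pos is not None:
            noise = noise_std * torch.randn_like(pos)
            pos = pos + noise                 # perturb coordinates for denoising

        out = model(atoms=batch["atoms"], spd=batch.get("spd"), pos=pos)

        # Supervised target (e.g. the PCQM4Mv2 HOMO-LUMO gap), always present.
        loss = F.l1_loss(out["energy"], batch["target"])

        # Self-supervised 3D position denoising, only when 3D data exists.
        if noise is not None:
            loss = loss + DENOISE_WEIGHT * F.l1_loss(out["noise_pred"], noise)

        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```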

Experimental Results

The empirical evaluation of Transformer-M highlights its efficacy. On the PCQM4Mv2 benchmark, where only 2D molecular graphs are available at evaluation time, the model achieved a significant reduction in mean absolute error (MAE), outperforming existing baselines such as Graphormer and GRPE. This indicates that the architecture and joint training effectively mitigate overfitting and leverage complementary information across data formats. When applied to the PDBBind dataset, which provides both 2D and 3D information, Transformer-M delivers superior predictive performance across all tested metrics, further confirming its adaptability to molecules of varying complexity. Although Transformer-M does not reach state-of-the-art results on every QM9 target, its competitive performance on these quantum chemical property prediction tasks demonstrates its strength in handling complex 3D molecular data.

Implications and Future Directions

The development of Transformer-M reflects a broader ambition to create versatile, general-purpose models for molecular science. By enabling a single model to process and learn from both 2D and 3D molecular data, the research advances applications in domains such as drug discovery, materials science, and computational chemistry. Future work could explore more sophisticated structural encodings, additional self-supervised pre-training tasks, and scaling to larger and more diverse molecular databases. Evaluating the model on a wider range of molecular representations and datasets beyond those tested here would also help establish the robustness of its property-prediction accuracy.

Ultimately, this work advances the conversation on how neural architectures can be optimally designed to overcome modality limitations, pushing the boundaries of what's feasible in molecular modeling using unified architectures. The presentation of Transformer-M is a significant stride in leveraging the flexibility and capacity of Transformers for comprehensive molecular data processing and understanding.

Authors (7)
  1. Shengjie Luo
  2. Tianlang Chen
  3. Yixian Xu
  4. Shuxin Zheng
  5. Tie-Yan Liu
  6. Liwei Wang
  7. Di He
Citations (83)