MolTrans: Molecular Interaction Transformer for Drug Target Interaction Prediction (2004.11424v1)

Published 23 Apr 2020 in q-bio.QM and cs.LG

Abstract: Drug target interaction (DTI) prediction is a foundational task for in silico drug discovery, which is costly and time-consuming due to the need of experimental search over large drug compound space. Recent years have witnessed promising progress for deep learning in DTI predictions. However, the following challenges are still open: (1) the sole data-driven molecular representation learning approaches ignore the sub-structural nature of DTI, thus produce results that are less accurate and difficult to explain; (2) existing methods focus on limited labeled data while ignoring the value of massive unlabelled molecular data. We propose a Molecular Interaction Transformer (MolTrans) to address these limitations via: (1) knowledge inspired sub-structural pattern mining algorithm and interaction modeling module for more accurate and interpretable DTI prediction; (2) an augmented transformer encoder to better extract and capture the semantic relations among substructures extracted from massive unlabeled biomedical data. We evaluate MolTrans on real world data and show it improved DTI prediction performance compared to state-of-the-art baselines.

Authors: Kexin Huang, Cao Xiao, Lucas Glass, Jimeng Sun

Citations (249)

Summary

MolTrans: Advancements in Drug-Target Interaction Prediction through Transformers

The paper "MolTrans: Molecular Interaction Transformer for Drug Target Interaction Prediction" proposes a transformer-based model for predicting drug-target interactions (DTIs). It targets two weaknesses of earlier deep learning methods: molecular representations that ignore the sub-structural nature of DTI, and reliance on limited labeled data while massive unlabelled molecular datasets go unused.

Key Contributions

The paper introduces MolTrans, a model designed to overcome two primary challenges in DTI prediction:

Sub-Structural Representation: Unlike most existing methods, which rely on whole molecular structures, MolTrans models interactions between sub-structures, which is closer to how DTIs occur in reality. To this end, the authors developed a Frequent Consecutive Sub-sequence (FCS) mining algorithm that extracts relevant sub-structures from massive unlabelled datasets.
Leveraging Unlabelled Data: MolTrans draws on large volumes of unlabelled molecular data to improve the quality of its sub-structural fingerprints, rather than learning from the limited labeled DTI pairs alone. The authors report improved predictive performance over state-of-the-art baselines.
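The sub-structure mining step can be illustrated with a small sketch. This summary does not spell out the FCS procedure, so the code below assumes a greedy, byte-pair-encoding-style merge of the most frequent adjacent token pair; the function names (`mine_fcs`, `merge_pair`, `tokenize`) and the parameters `min_freq` and `max_vocab` are hypothetical and are not taken from the paper.

```python
from collections import Counter


def merge_pair(tokens, a, b):
    """Replace every adjacent occurrence of (a, b) with the single token a + b."""
    out, i = [], 0
    while i < len(tokens):
        if i + 1 < len(tokens) and tokens[i] == a and tokens[i + 1] == b:
            out.append(a + b)
            i += 2
        else:
            out.append(tokens[i])
            i += 1
    return out


def mine_fcs(sequences, min_freq=2, max_vocab=50):
    """Assumed BPE-style mining of frequent consecutive sub-sequences.

    sequences: unlabelled strings (e.g. SMILES or amino-acid sequences).
    Returns the ordered merge rules and the resulting vocabulary.
    """
    corpus = [list(seq) for seq in sequences]  # start from single characters
    vocab = {tok for toks in corpus for tok in toks}
    merges = []
    while len(vocab) < max_vocab:
        # Count every adjacent token pair across the whole corpus.
        pairs = Counter()
        for toks in corpus:
            pairs.update(zip(toks, toks[1:]))
        if not pairs:
            break
        (a, b), freq = pairs.most_common(1)[0]
        if freq < min_freq:  # stop once no pair is frequent enough
            break
        merges.append((a, b))
        vocab.add(a + b)
        corpus = [merge_pair(toks, a, b) for toks in corpus]
    return merges, vocab


def tokenize(seq, merges):
    """Decompose a new sequence by replaying the learned merges in order."""
    tokens = list(seq)
    for a, b in merges:
        tokens = merge_pair(tokens, a, b)
    return tokens


merges, vocab = mine_fcs(["CCO", "CCN", "CCC"])
print(merges)                     # [('C', 'C')]
print(tokenize("CCOCC", merges))  # ['CC', 'O', 'CC']
```

Because the merge rules are learned from unlabelled sequences only, the same vocabulary can later be used to decompose the drugs and proteins in a much smaller labeled DTI dataset.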

Together, these features yield more accurate and interpretable DTI predictions, because interactions are modeled explicitly at the sub-structural level using augmented transformer architectures. Since the predicted sub-structure interactions can be compared against known pharmacological interactions, the model is a candidate tool for virtual drug screening and repositioning.
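To make the idea of sub-structure-level interaction modeling concrete, the following minimal sketch scores every drug sub-structure against every protein sub-structure. It assumes each sub-structure already has an embedding vector and uses a plain dot product as the pairwise score; both the scoring function and the helper names (`interaction_map`, `strongest_pair`) are illustrative assumptions, not the paper's actual interaction module.

```python
def interaction_map(drug_emb, prot_emb):
    """Pairwise scores: rows are drug sub-structures, columns are protein sub-structures.

    Assumption: the score is the dot product of the two embedding vectors.
    """
    return [
        [sum(d * p for d, p in zip(d_vec, p_vec)) for p_vec in prot_emb]
        for d_vec in drug_emb
    ]


def strongest_pair(scores, drug_tokens, prot_tokens):
    """Return the (drug sub-structure, protein sub-structure, score) with the highest score."""
    best = max(
        (score, i, j)
        for i, row in enumerate(scores)
        for j, score in enumerate(row)
    )
    score, i, j = best
    return drug_tokens[i], prot_tokens[j], score


# Toy embeddings (hypothetical values, two dimensions each).
drug_tokens = ["CC", "O"]
prot_tokens = ["MK", "VL", "A"]
drug_emb = [[1.0, 0.0], [0.0, 2.0]]
prot_emb = [[1.0, 1.0], [0.0, 3.0], [2.0, 0.0]]

scores = interaction_map(drug_emb, prot_emb)
print(scores)                                            # [[1.0, 0.0, 2.0], [2.0, 6.0, 0.0]]
print(strongest_pair(scores, drug_tokens, prot_tokens))  # ('O', 'VL', 6.0)
```

A map of this kind is what makes the prediction inspectable: the highest-scoring cells point to the sub-structure pairs that contribute most to a predicted interaction.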

Experimental Validation

The paper provides comprehensive evaluations of MolTrans. Key findings demonstrated through empirical studies are:

Performance: MolTrans consistently outperforms state-of-the-art DTI prediction models on metrics including ROC-AUC and PR-AUC.
Generalization: MolTrans remains competitive when the drugs or protein targets in the test set were unseen during training. This matters in practical drug discovery pipelines, where novel or less-studied compounds are common.
Data Scarcity: MolTrans remains resilient even when a high fraction of the data is missing, which suggests it could be especially useful in early-stage drug research with limited experimental data.
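The evaluation setup above can be sketched in a few lines. The code below computes ROC-AUC from its pairwise-ranking definition and PR-AUC as average precision, and shows one way to build an unseen-drug split. All names (`roc_auc`, `average_precision`, `unseen_drug_split`) and the toy data are hypothetical; the paper's exact evaluation protocol is not given in this summary.

```python
def roc_auc(labels, scores):
    """Probability that a random positive is scored above a random negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(labels, scores):
    """Mean of precision@k over the ranks k at which a positive is retrieved."""
    ranked = sorted(zip(scores, labels), key=lambda t: -t[0])
    hits, total = 0, 0.0
    for k, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / k
    return total / hits


def unseen_drug_split(pairs, held_out_drugs):
    """Put every (drug, protein, label) triple whose drug is held out into the test set."""
    train = [p for p in pairs if p[0] not in held_out_drugs]
    test = [p for p in pairs if p[0] in held_out_drugs]
    return train, test


labels = [1, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.1]
print(roc_auc(labels, scores))            # 0.75
print(average_precision(labels, scores))  # 0.8333...

pairs = [("d1", "p1", 1), ("d1", "p2", 0), ("d2", "p1", 0), ("d3", "p2", 1)]
train, test = unseen_drug_split(pairs, {"d3"})
print(len(train), len(test))              # 3 1
```

Splitting by drug identity rather than by random pair guarantees that no test drug appears in training, which is the condition the generalization result refers to; an unseen-protein split follows the same pattern on the second element of each triple.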

Theoretical Implications

The introduction of a knowledge-inspired, transformer-based model for DTI prediction encourages a shift in how unlabelled biomedical data is utilized. By effectively incorporating vast untapped datasets, MolTrans not only improves prediction accuracy but also increases the interpretability of results, thus aligning computational predictions closer to biological and chemical insights about drug-action mechanisms.

Practical Implications and Future Directions

The practical implications of this research are multifaceted. The enhanced interpretability facilitates better decision-making in medicinal chemistry, while the improvement in DTI prediction accuracy supports more efficient drug discovery cycles. The methodology established by MolTrans might be extended to other related domains within bioinformatics and cheminformatics.

Moving forward, further development could focus on refining sub-structure extraction techniques, optimizing transformer architectures for specific DTI contexts, and extending the model to broader classes of molecular interactions beyond standard DTIs. How MolTrans can be integrated into existing drug discovery workflows to improve throughput and cost-efficiency also remains an open question for future work.

Overall, the MolTrans framework represents a significant advancement in molecular interaction prediction, establishing a foundation for more data-driven and insightful drug discovery practices.
