A Multi-Modal Contrastive Diffusion Model for Therapeutic Peptide Generation (2312.15665v2)
Abstract: Therapeutic peptides represent a unique class of pharmaceutical agents crucial for the treatment of human diseases. Recently, deep generative models have exhibited remarkable potential for generating therapeutic peptides, but existing models use either sequence or structure information alone, which limits generation performance. In this study, we propose a Multi-Modal Contrastive Diffusion model (MMCD) that fuses the sequence and structure modalities in a diffusion framework to co-generate novel peptide sequences and structures. Specifically, MMCD constructs a sequence-modal and a structure-modal diffusion model and devises a multi-modal contrastive learning strategy, with inter-contrastive and intra-contrastive components applied at each diffusion timestep, to capture the consistency between the two modalities and boost model performance. The inter-contrastive component aligns the sequences and structures of peptides by maximizing the agreement of their embeddings, while the intra-contrastive component differentiates therapeutic from non-therapeutic peptides by maximizing the disagreement of their sequence and structure embeddings. Extensive experiments demonstrate that MMCD outperforms other state-of-the-art deep generative methods in generating therapeutic peptides across various metrics, including antimicrobial/anticancer score, diversity, and peptide docking.
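To make the two contrastive objectives concrete, the following is a minimal sketch, not the MMCD implementation. It assumes an InfoNCE-style loss for the inter-contrastive term (the sequence and structure embeddings of the same peptide form the positive pair) and a supervised-contrastive-style loss for the intra-contrastive term (peptides sharing a therapeutic label are positives). The function names, the cosine similarity, and the temperature value are illustrative assumptions that the abstract does not specify.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def log_sum_exp(values):
    """Numerically stable log(sum(exp(x))) over a list of floats."""
    m = max(values)
    return m + math.log(sum(math.exp(x - m) for x in values))


def inter_contrastive_loss(seq_emb, struct_emb, temperature=0.1):
    """Assumed InfoNCE form: pair (i, i) is positive, pairs (i, j != i) are negatives.

    seq_emb[i] and struct_emb[i] are the two modality embeddings of peptide i.
    """
    n = len(seq_emb)
    total = 0.0
    for i in range(n):
        logits = [cosine(seq_emb[i], struct_emb[j]) / temperature for j in range(n)]
        total += log_sum_exp(logits) - logits[i]
    return total / n


def intra_contrastive_loss(emb, labels, temperature=0.1):
    """Assumed supervised-contrastive form within one modality.

    Peptides with the same label (1 = therapeutic, 0 = non-therapeutic) are
    pulled together, which pushes the two classes apart.
    """
    n = len(emb)
    total, anchors = 0.0, 0
    for i in range(n):
        others = [j for j in range(n) if j != i]
        positives = [j for j in others if labels[j] == labels[i]]
        if not positives:
            continue  # an anchor with no same-label partner contributes nothing
        logits = {j: cosine(emb[i], emb[j]) / temperature for j in others}
        log_denom = log_sum_exp(list(logits.values()))
        total += sum(log_denom - logits[j] for j in positives) / len(positives)
        anchors += 1
    return total / anchors if anchors else 0.0
```

Under these assumptions, the inter-contrastive loss is small when each sequence embedding is closest to its own structure embedding, and the intra-contrastive loss is small when therapeutic and non-therapeutic peptides occupy separate regions of the embedding space. In a diffusion setting both terms would be evaluated on the noised embeddings at each timestep and added to the denoising objective; how they are weighted is not stated in the abstract.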