PepMLM: Target Sequence-Conditioned Generation of Therapeutic Peptide Binders via Span Masked Language Modeling (2310.03842v3)

Published 5 Oct 2023 in q-bio.BM

Abstract: Target proteins that lack accessible binding pockets and conformational stability have posed increasing challenges for drug development. Induced proximity strategies, such as PROTACs and molecular glues, have thus gained attention as pharmacological alternatives, but still require small molecule docking at binding pockets for targeted protein degradation. The computational design of protein-based binders presents unique opportunities to access "undruggable" targets, but have often relied on stable 3D structures or structure-influenced latent spaces for effective binder generation. In this work, we introduce PepMLM, a target sequence-conditioned generator of de novo linear peptide binders. By employing a novel span masking strategy that uniquely positions cognate peptide sequences at the C-terminus of target protein sequences, PepMLM fine-tunes the state-of-the-art ESM-2 pLM to fully reconstruct the binder region, achieving low perplexities matching or improving upon validated peptide-protein sequence pairs. After successful in silico benchmarking with AlphaFold-Multimer, outperforming RFDiffusion on structured targets, we experimentally verify PepMLM's efficacy via fusion of model-derived peptides to E3 ubiquitin ligase domains, demonstrating endogenous degradation of emergent viral phosphoproteins and Huntington's disease-driving proteins. In total, PepMLM enables the generative design of candidate binders to any target protein, without the requirement of target structure, empowering downstream therapeutic applications.

Summary

Insights into the Design of Therapeutic Peptide Binders with PepMLM

PepMLM is a method for generating de novo linear peptide binders conditioned only on the sequence of a target protein. It fine-tunes the ESM-2 protein language model (pLM) with a span masking strategy, making it possible to design binders for "undruggable" proteins without relying on stable 3D structures.

Problem Addressed and Methodology

Traditional drug discovery depends on well-defined binding pockets or stable 3D protein structures, which many pathogenic and disease-driving proteins lack. The paper addresses this limitation with a span masking approach that places the cognate peptide at the C-terminus of the target sequence and masks the entire peptide region. PepMLM fine-tunes the ESM-2 pLM to reconstruct that masked binder region, achieving perplexities that match or improve on those of validated peptide-protein sequence pairs.
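The input construction described above can be sketched as follows. This is a minimal illustration, not the authors' code: the function name, the example sequences, and the per-residue token layout are hypothetical, and the `<mask>` string is an assumption about the tokenizer that should be checked against the one actually used.

```python
MASK_TOKEN = "<mask>"  # assumed mask token string; verify against your tokenizer


def build_span_masked_example(target_seq: str, peptide_seq: str):
    """Return (input_tokens, label_tokens) for one target-peptide pair.

    The peptide is appended at the C-terminus of the target, and every
    peptide position is masked, so the model has to reconstruct the
    whole binder span from the target sequence alone.
    """
    target_tokens = list(target_seq)
    peptide_tokens = list(peptide_seq)

    # Target residues stay visible; the whole peptide span is replaced by masks.
    input_tokens = target_tokens + [MASK_TOKEN] * len(peptide_tokens)

    # None marks positions excluded from the loss (the target);
    # masked positions carry the true peptide residue as the label.
    label_tokens = [None] * len(target_tokens) + peptide_tokens
    return input_tokens, label_tokens


# Hypothetical toy sequences, for illustration only.
inputs, labels = build_span_masked_example("MKTAYIAKQR", "GSWLE")
print(inputs)
print(labels)
```

Masking the full span, rather than scattered single residues, is what distinguishes this setup from standard random masking: the training task then has the same shape as the generation task, where the binder is entirely unknown.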

Notably, PepMLM does not require structural information, a common dependency of earlier approaches such as those built on AlphaFold or RFDiffusion. It trains ESM-2 with the masked language modeling (MLM) objective applied to the masked peptide span, so that the fine-tuned model generates peptide binders when conditioned on a target protein sequence.
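At generation time, a model trained this way outputs a probability distribution over residues at each masked position. The sketch below shows one way such distributions could be turned into a peptide. The summary does not state which decoding rule PepMLM uses, so both greedy and top-k sampling are shown purely as assumptions, and `toy_probs` is a hand-written stand-in for real model outputs.

```python
import random


def fill_masked_span(position_probs, top_k=1, seed=0):
    """Pick one residue per masked position.

    position_probs: list of dicts mapping amino-acid letter -> probability,
        one dict per masked position (hypothetical stand-in for model outputs).
    top_k=1 is greedy decoding; top_k>1 samples among the k most likely
        residues, weighted by their probabilities.
    """
    rng = random.Random(seed)
    peptide = []
    for probs in position_probs:
        # Keep only the k most probable residues at this position.
        ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)[:top_k]
        residues = [aa for aa, _ in ranked]
        weights = [p for _, p in ranked]
        peptide.append(rng.choices(residues, weights=weights, k=1)[0])
    return "".join(peptide)


# Toy distributions for a 3-residue binder (illustrative values only).
toy_probs = [
    {"G": 0.7, "A": 0.2, "S": 0.1},
    {"W": 0.5, "F": 0.4, "Y": 0.1},
    {"L": 0.6, "I": 0.3, "V": 0.1},
]
greedy = fill_masked_span(toy_probs, top_k=1)
sampled = fill_masked_span(toy_probs, top_k=2, seed=1)
print(greedy, sampled)
```

Sampling with `top_k` greater than 1 yields diverse candidates for the same target, which is useful when several binders are to be screened downstream.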

Results and Comparative Performance

PepMLM is validated both in silico and experimentally. On the binder region, the model reaches low perplexities that match or improve on those of validated peptide-protein sequence pairs. In co-folding benchmarks with AlphaFold-Multimer, assessed with confidence metrics such as ipTM and pLDDT, PepMLM-generated binders outperform RFDiffusion designs on structured targets.
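For reference, perplexity over a masked span is the exponential of the mean negative log-probability that the model assigns to the true residues. The sketch below uses that standard definition; the function name and input format are hypothetical, and the probabilities are assumed to come from the model's output at each masked binder position.

```python
import math


def span_perplexity(true_residue_probs):
    """Perplexity over a masked span.

    true_residue_probs: probabilities (each in (0, 1]) that the model
        assigned to the correct residue at each masked position.
    Returns exp(mean negative log-probability); lower is better.
    """
    if not true_residue_probs:
        raise ValueError("need at least one probability")
    mean_nll = -sum(math.log(p) for p in true_residue_probs) / len(true_residue_probs)
    return math.exp(mean_nll)


# A model guessing uniformly over the 20 standard amino acids
# has a perplexity of about 20 on any span.
print(span_perplexity([1 / 20] * 8))
```

A perplexity near 20 therefore corresponds to no information about the binder, while values approaching 1 mean the model is nearly certain of each residue.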

Experimentally, PepMLM-derived peptides were fused to E3 ubiquitin ligase domains, and the resulting fusions degraded endogenous targets, including emergent viral phosphoproteins and Huntington's disease-driving proteins. This supports PepMLM's practical utility for targeted protein degradation.

Implications and Future Directions

PepMLM's approach opens new avenues for therapeutic design against conformationally diverse protein targets. Because it needs only the target sequence, it removes the structure determination or prediction step from the design pipeline, which could shorten therapeutic research and development timelines. Future iterations of PepMLM could focus on increasing model size, reducing resource costs, improving hit rates, and integrating high-throughput screening for feedback-based model refinement.

Further enhancement could involve retraining the model to account for post-translational modifications and exploring potential stabilizing modifications of the generated peptide binders, increasing their therapeutic viability. Ultimately, the scalability and adaptability of PepMLM suggest a trajectory towards a CRISPR-like, programmable system for protein targeting and therapeutic generation.

In summary, PepMLM marks a significant advancement in the field of therapeutic peptide design, offering a versatile and powerful toolset for addressing the challenge of targeting previously "undruggable" proteins. Its capacity to operate independently of structural data sets it apart from current methodologies, providing a promising direction for future research in drug development and protein engineering.