
DynamicMPNN in Multi-State Protein Design

Updated 30 July 2025
  • DynamicMPNN is a framework that integrates message passing with dynamic, context-adaptive mechanisms using SE(3)-equivariant GVP layers and order-invariant Deep Set pooling.
  • It employs an autoregressive decoder to jointly optimize sequence design across multiple conformational states, achieving up to 13% improvement in structure-normalized RMSD over ProteinMPNN on benchmark protein pairs.
  • The model’s robust design lends itself to applications in bioswitches, allosteric regulation, and synthetic enzyme engineering, while also outlining clear pathways for enhancing pooling techniques in future research.

DynamicMPNN describes a family of models and algorithms that combine message passing neural networks with explicitly dynamic or context-adaptive mechanisms for processing information on graphs or in motion planning tasks. Notable instantiations of the concept include its use in high-accuracy inference on graphs with cycles, multimodal motion planning, and, most recently, multi-state protein design (Abrudan et al., 29 Jul 2025). This overview focuses on the core principles, mathematical formulations, and applications of DynamicMPNN, with emphasis on its implementation for multi-state protein design.

1. Overview and Theoretical Formulation

DynamicMPNN, in the multi-state protein design context, is an inverse folding model that jointly learns the conditional distribution of a sequence $Y$ given an ensemble of backbone conformations $X_1, \dots, X_m$, i.e. $p(Y \mid X_1, \dots, X_m)$. Unlike post hoc aggregation of single-state solutions, DynamicMPNN leverages joint learning across conformational ensembles, enforcing compatibility of designed sequences with all considered states. Central to the approach is an autoregressive factorization:

p(Y \mid X_1, \dots, X_m) = \prod_{i=1}^{n} p(y_i \mid y_1, \dots, y_{i-1}; X_1, \dots, X_m)

where $y_i$ is the amino acid at position $i$ and $n$ is the sequence length.
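
This factorization translates directly into a standard teacher-forced training objective. The sketch below (PyTorch; names, shapes, and the 20-class alphabet are illustrative assumptions rather than details taken from the paper) shows how per-position conditionals combine into the joint sequence log-likelihood:

```python
import torch
import torch.nn.functional as F

def sequence_log_likelihood(logits: torch.Tensor, sequence: torch.Tensor) -> torch.Tensor:
    """Joint log-likelihood log p(Y | X_1..X_m) under the autoregressive factorization.

    logits:   (n, 20) unnormalized scores per position, already conditioned on
              y_1..y_{i-1} and the multi-state encoding (teacher forcing).
    sequence: (n,) integer amino-acid labels y_1..y_n.
    """
    log_probs = F.log_softmax(logits, dim=-1)                   # per-position log p(y_i | ...)
    per_position = log_probs.gather(1, sequence[:, None]).squeeze(1)
    return per_position.sum()                                   # sum over i = joint log-likelihood

# Illustrative usage with random numbers:
n = 8
logits = torch.randn(n, 20)
seq = torch.randint(0, 20, (n,))
nll = -sequence_log_likelihood(logits, seq) / n                 # mean negative log-likelihood
```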

The encoding is performed independently for each conformation using SE(3)-equivariant Geometric Vector Perceptron (GVP) layers, mapping each structure to a shared latent space while preserving geometric and vectorial relationships crucial for protein modeling. Following feature extraction, the representations from each state are merged via order-invariant Deep Set pooling, enabling the design to be agnostic to the order and identity of conformations.

2. Model Architecture and Encoder–Decoder Mechanics

The architecture comprises a multi-state GNN encoder and an autoregressive decoder. Each target backbone (including chains from potential interaction partners) is processed with eight GVP layers. GVPs generalize MLPs by supporting both scalar (e.g. residue types, bond types) and vector (e.g. coordinate displacements) features, and impose SE(3) equivariance, aiding generalization to unseen protein geometries.
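
For orientation, here is a minimal, simplified sketch of the scalar/vector update inside a GVP-style layer, following the general geometric vector perceptron recipe; channel sizes, activations, and the gating choice are illustrative assumptions, not the paper's exact configuration:

```python
import torch
import torch.nn as nn

class SimpleGVP(nn.Module):
    """Simplified geometric vector perceptron: maps (scalar, vector) features to
    (scalar, vector) features while keeping the vector channel rotation-equivariant."""

    def __init__(self, s_in, v_in, s_out, v_out):
        super().__init__()
        h = max(v_in, v_out)
        self.W_h = nn.Linear(v_in, h, bias=False)    # mixes vector channels (coordinate-wise linear)
        self.W_mu = nn.Linear(h, v_out, bias=False)  # produces output vector channels
        self.W_s = nn.Linear(s_in + h, s_out)        # scalar update sees rotation-invariant norms

    def forward(self, s, V):
        # s: (..., s_in) scalar features;  V: (..., v_in, 3) vector features
        Vh = self.W_h(V.transpose(-1, -2)).transpose(-1, -2)     # (..., h, 3)
        Vmu = self.W_mu(Vh.transpose(-1, -2)).transpose(-1, -2)  # (..., v_out, 3)
        norms = Vh.norm(dim=-1)                                  # invariant under rotations
        s_new = torch.relu(self.W_s(torch.cat([s, norms], dim=-1)))
        gate = torch.sigmoid(Vmu.norm(dim=-1, keepdim=True))     # scalar gate per vector channel
        return s_new, Vmu * gate                                 # gated vectors remain equivariant

# Illustrative usage: 10 residues, 6 scalar and 3 vector input channels.
gvp = SimpleGVP(s_in=6, v_in=3, s_out=16, v_out=4)
s_out, V_out = gvp(torch.randn(10, 6), torch.randn(10, 3, 3))    # shapes: (10, 16), (10, 4, 3)
```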

After encoding, embeddings for all provided conformational states are pooled with the Deep Set approach (summing or averaging representations across states); residues missing from some states (due to structural variability or alignment gaps) are handled by masking and independent featurization before aggregation.
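
A minimal sketch of such masked, order-invariant pooling is given below (illustrative PyTorch; the tensor layout and the choice of a masked mean rather than a sum are assumptions, not the paper's exact implementation):

```python
import torch

def deep_set_pool(state_embeddings: torch.Tensor, residue_mask: torch.Tensor) -> torch.Tensor:
    """Order-invariant pooling of per-state residue embeddings.

    state_embeddings: (m, n, d) embeddings for m conformational states, n residues, d dims.
    residue_mask:     (m, n) 1.0 where a residue is resolved in that state, 0.0 where missing.

    Missing residues are masked out so they do not contribute; because the operation is a
    (masked) mean over states, the result is invariant to the order of the conformations.
    """
    mask = residue_mask.unsqueeze(-1)                 # (m, n, 1)
    summed = (state_embeddings * mask).sum(dim=0)     # (n, d) sum over states
    counts = mask.sum(dim=0).clamp(min=1.0)           # avoid division by zero for unresolved residues
    return summed / counts                            # masked per-residue mean
```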

The pooled embedding seeds the autoregressive sequence decoder, which also utilizes GVP layers to iteratively output the amino acid at each sequence position, conditioned on previous residue choices and the joint multi-state encoding.
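
A schematic of this decoding loop might look as follows; `decoder` is a hypothetical stand-in for the GVP-based decoder, and temperature sampling is an illustrative choice rather than the paper's stated procedure:

```python
import torch

@torch.no_grad()
def autoregressive_design(decoder, pooled_embedding: torch.Tensor, n: int,
                          temperature: float = 1.0) -> torch.Tensor:
    """Sample a sequence one residue at a time, conditioned on the pooled multi-state
    encoding and on the residues already chosen. `decoder` takes (pooled_embedding,
    partial_sequence) and returns logits of shape (20,) for the next position."""
    sequence = torch.full((n,), -1, dtype=torch.long)      # -1 marks undecided positions
    for i in range(n):
        logits = decoder(pooled_embedding, sequence[:i])   # p(y_i | y_<i, X_1..X_m)
        probs = torch.softmax(logits / temperature, dim=-1)
        sequence[i] = torch.multinomial(probs, 1).item()
    return sequence
```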

To include interaction partners (i.e., heteromeric complexes), the encoder separately encodes partner structures and applies occupancy masking for sequences with $>70\%$ sequence identity, suppressing information leakage while enabling allosteric and multivalent design scenarios.
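
One plausible reading of this masking step is sketched below with a hypothetical helper; the exact occupancy-masking mechanism is not specified here, so treat this purely as an illustration:

```python
import torch

def mask_partner_sequence(partner_features: torch.Tensor, identity_to_target: float,
                          threshold: float = 0.70) -> torch.Tensor:
    """Zero out the sequence-derived features of an interaction partner whose sequence
    identity exceeds the threshold, so the model cannot copy the answer from a
    near-identical partner. Function name and feature layout are illustrative only."""
    if identity_to_target > threshold:
        return torch.zeros_like(partner_features)
    return partner_features
```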

3. Data Preparation and Evaluation Measures

DynamicMPNN was trained on 46,033 conformational pairs sourced from the Protein Data Bank and CoDNaS, representing 75% of CATH superfamilies. Sequence redundancy reduction (clustering at $\geq 95\%$ identity) enables broad generalization across the fold space.

Performance is evaluated using the AlphaFold initial guess (AFIG) metric, which benchmarks the refoldability of the designed sequence via AlphaFold2 structure prediction initialized from target coordinates. Two normalization schemes address intrinsic structural variability:

  • Structure Normalization: Accounts for the difficulty of each target by using the maximum structural deviation between any pair of native conformations as the denominator.

\text{RMSD}_{\text{struct}}(Y, X_k; X_1, \dots, X_m) = \frac{\text{AFIG}(Y, X_k)}{\max_{i, j} \operatorname{RMSD}(X_i, X_j)}

  • Decoy Normalization: Relates the AFIG RMSD against the native conformation to that against a non-homologous decoy conformation (TM-score < 0.4).

\text{RMSD}_{\text{decoy}}(Y, X_k; D) = \frac{\text{AFIG}(Y, X_k)}{\text{AFIG}(Y, D)}

Additionally, AlphaFold pLDDT serves as a confidence metric for the predicted folds.
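
Putting the two normalizations together, a small sketch follows; the AFIG RMSD values are assumed to come from an AlphaFold2 initial-guess run, and all numbers are invented for illustration:

```python
import numpy as np

def structure_normalized(afig_rmsd_k: float, native_pairwise_rmsds: np.ndarray) -> float:
    """AFIG RMSD to state X_k divided by the maximum RMSD between any two native states."""
    return afig_rmsd_k / native_pairwise_rmsds.max()

def decoy_normalized(afig_rmsd_k: float, afig_rmsd_decoy: float) -> float:
    """AFIG RMSD to the native state X_k relative to a non-homologous decoy (TM-score < 0.4)."""
    return afig_rmsd_k / afig_rmsd_decoy

# Illustrative numbers only: the design refolds to the target at 4.0 Å, the two native
# conformations differ by 8.0 Å, and the decoy refolds at 20.0 Å.
pairwise = np.array([[0.0, 8.0], [8.0, 0.0]])
print(structure_normalized(4.0, pairwise))   # 0.5 -> closer to target than the states are to each other
print(decoy_normalized(4.0, 20.0))           # 0.2 -> much closer to native than to the decoy
```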

4. Comparative Performance and Empirical Results

DynamicMPNN demonstrates up to 13% improvement over ProteinMPNN (multi-state design strategy) in structure-normalized RMSD on a challenging benchmark of 94 protein pairs, achieving lower Best Paired RMSD (13.43 Å vs. 14.76 Å). Similar trends are observed for pLDDT and decoy-normalized metrics, indicating that sequences produced by DynamicMPNN consistently refold into all target conformations with higher fidelity.

Statistical significance is established via Wilcoxon signed-rank tests ($p < 0.001$). Notably, DynamicMPNN maintains this advantage even under conditions where data leakage favors ProteinMPNN (overlap between training set and benchmark sequences).
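
As an aside, a paired Wilcoxon signed-rank test of this kind can be run with SciPy; the data below are synthetic stand-ins, not the paper's measurements:

```python
import numpy as np
from scipy.stats import wilcoxon

# Synthetic per-target scores standing in for structure-normalized RMSDs of the two
# methods on the same 94 benchmark proteins (paired samples).
rng = np.random.default_rng(0)
dynamicmpnn_rmsd = rng.normal(loc=0.95, scale=0.2, size=94)
proteinmpnn_rmsd = dynamicmpnn_rmsd + rng.normal(loc=0.10, scale=0.1, size=94)

# Paired, non-parametric test of whether the per-target differences are centered at zero.
stat, p_value = wilcoxon(dynamicmpnn_rmsd, proteinmpnn_rmsd)
print(f"Wilcoxon statistic = {stat:.1f}, p = {p_value:.2e}")
```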

While DynamicMPNN outperforms in refoldability, ProteinMPNN achieves higher sequence recovery and lower perplexity; this is attributed to overlap between the ProteinMPNN training set and test cases, a limitation highlighted for future evaluation refinement.

5. Methodological Significance and Applications

DynamicMPNN advances the field by enabling explicit, joint design for proteins with multiple functional conformations—a critical aspect for bioswitches, allosteric regulators, metamorphic proteins, and synthetic enzymes undergoing large-scale conformational changes. Its SE(3)-equivariant, message-passing architecture ensures model robustness across structurally diverse ensembles, while the order-invariant pooling and autoregressive decoding facilitate effective global optimization of sequence compatibility.

Potential future applications span synthetic biology, molecular machines, and de novo protein engineering, where dynamic and multi-state behavior underpins function.

6. Limitations and Prospects for Future Research

Despite improved multi-state refoldability, the absolute AFIG RMSD values remain high, reflecting the intrinsic structural variability of the benchmark set. ProteinMPNN retains an edge in sequence recovery, in part due to data set overlap. The current pooling approach, though effective, could be further refined—more expressive or adaptive aggregation functions may improve learning for larger ensembles or finer granularity of conformational states.

Possible directions include:

  • Training a single-state version of DynamicMPNN for fairer comparison to single-state baselines.
  • Expanding to full conformational clusters instead of pairs, increasing applicability to proteins with greater structural plasticity.
  • Integrating with advanced ensemble modeling techniques to capture broader protein landscape diversity.

DynamicMPNN establishes a new paradigm for protein design under multi-state constraints, leveraging dynamic message passing and SE(3)-equivariant graph neural architectures to address challenges of conformational diversity and sequence design optimality (Abrudan et al., 29 Jul 2025).
