DeepRMSD+Vina: Hybrid Pose Optimization

Updated 9 April 2026

The paper introduces a hybrid docking framework that combines a deep neural network’s RMSD prediction with Vina’s energy scoring for enhanced pose refinement.
It employs an end-to-end differentiable architecture using gradient descent, enabling rapid and precise optimization of ligand binding poses.
Benchmark results show a 5–15% improvement in Top 1 success rates over traditional methods, underscoring its potential in virtual screening and drug discovery.

DeepRMSD+Vina is a fully differentiable framework for ligand pose optimization that hybridizes deep learning-based structural metrics with traditional physics-inspired scoring in molecular docking. By integrating a multi-layer perceptron (DeepRMSD), which predicts root-mean-square deviation (RMSD) to a ligand’s native pose, with the AutoDock Vina intermolecular energy score, this methodology achieves state-of-the-art accuracy in identifying native-like binding conformations. DeepRMSD+Vina emphasizes end-to-end differentiability, enabling the use of gradient descent for rapid and effective pose refinement, and demonstrates substantial gains on established docking power benchmarks (2206.13345).

1. Hybrid Scoring Function

Central to DeepRMSD+Vina is a linear combination of two distinct pose quality measures: the AutoDock Vina intermolecular energy ( $S_{\text{Vina}}(X)$ ) and a deep neural network prediction of RMSD to the experimentally observed binding mode ( $\hat r(X)$ ). Denoting a ligand conformation by Cartesian coordinates $X$ and the native pose as $X_{\text{native}}$ , the combined scoring function is given by:

$S_{\text{total}}(X) = w_1 S_{\text{Vina}}(X) + w_2 \hat r(X)$

where $w_1$ and $w_2$ are scalar weights (empirically, $w_1 = w_2 = 0.5$ yields optimal results on the CASF-2016 docking set). $S_{\text{Vina}}(X)$ represents the approximate binding energy (in kcal/mol; lower is better), while $\hat r(X)$ is the deep network’s prediction of the RMSD (in Å; lower is better) of $X$ with respect to $X_{\text{native}}$. This hybridization leverages Vina’s robust physics-based scoring and DeepRMSD’s sensitivity to fine-grained pose differences.
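As a minimal sketch of how the weighted sum re-ranks candidate poses, the snippet below combines a Vina energy and a predicted RMSD with the equal weights quoted above. The function name `total_score` and the three example poses are hypothetical and chosen only for illustration.

```python
def total_score(vina_energy: float, predicted_rmsd: float,
                w1: float = 0.5, w2: float = 0.5) -> float:
    """Weighted sum of Vina energy (kcal/mol) and predicted RMSD (Å); lower is better."""
    return w1 * vina_energy + w2 * predicted_rmsd


# Hypothetical (Vina energy, predicted RMSD) pairs for three candidate poses.
poses = {
    "pose_a": (-8.0, 3.0),   # best Vina energy, but predicted far from native
    "pose_b": (-7.5, 1.0),
    "pose_c": (-6.0, 0.5),
}

# Rank poses from best (lowest combined score) to worst.
ranked = sorted(poses, key=lambda name: total_score(*poses[name]))
print(ranked)
```

In this toy case Vina alone would prefer `pose_a`, while the combined score prefers `pose_b`, whose slightly worse energy is outweighed by a much lower predicted RMSD.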

2. DeepRMSD Multi-Layer Perceptron Architecture and Training

DeepRMSD is implemented as a fully connected MLP trained to regress pose RMSD from atom-level features. The input representation for each pose $X$ is constructed by summing over all protein–ligand atom-pair features:

For residue–atom (RA) types: 21 residue classes (20 standard amino acids + “OTH”) × 5 element classes = 105 categories.
For ligand atom types: 7 distinct types (C, N, O, P, S, HAL, DU).
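Feature values are computed as

$f_{k,l}^{(n)} = \sum_{i \in k} \sum_{j \in l} r_{ij}^{-n}$

with $n \in \{1, 6\}$ (Coulomb and van der Waals approximations), where $k$ runs over RA types, $l$ over ligand atom types, and interatomic distances $r_{ij}$ […] Å. This produces a 1470-dimensional input vector (105 RA types × 7 ligand atom types × 2 terms).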
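The feature construction can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the authors' code: the inverse-power exponents 1 and 6 are inferred from the Coulomb and van der Waals description, the five protein element classes and the index layout are hypothetical, and no distance cutoff is applied.

```python
import math

RESIDUES = ["ALA", "ARG", "ASN", "ASP", "CYS", "GLN", "GLU", "GLY", "HIS", "ILE",
            "LEU", "LYS", "MET", "PHE", "PRO", "SER", "THR", "TRP", "TYR", "VAL",
            "OTH"]                                   # 20 standard residues + "OTH"
PROTEIN_ELEMENTS = ["C", "N", "O", "S", "DU"]        # assumed 5 element classes
LIGAND_TYPES = ["C", "N", "O", "P", "S", "HAL", "DU"]
EXPONENTS = (1, 6)                                   # assumed Coulomb-like and vdW-like terms


def featurize(protein_atoms, ligand_atoms):
    """Sum r^-n over typed atom pairs.

    protein_atoms: list of (residue_name, element, (x, y, z))
    ligand_atoms:  list of (ligand_type, (x, y, z))
    """
    n_pairs = len(RESIDUES) * len(PROTEIN_ELEMENTS) * len(LIGAND_TYPES)  # 735
    features = [0.0] * (n_pairs * len(EXPONENTS))                        # 1470
    for residue, element, p_xyz in protein_atoms:
        ra = RESIDUES.index(residue) * len(PROTEIN_ELEMENTS) + PROTEIN_ELEMENTS.index(element)
        for lig_type, l_xyz in ligand_atoms:
            pair = ra * len(LIGAND_TYPES) + LIGAND_TYPES.index(lig_type)
            r = math.dist(p_xyz, l_xyz)
            for k, n in enumerate(EXPONENTS):
                features[k * n_pairs + pair] += r ** -n
    return features


# One protein carbon (ALA) and one ligand carbon, 2 Å apart.
vec = featurize([("ALA", "C", (0.0, 0.0, 0.0))], [("C", (2.0, 0.0, 0.0))])
print(len(vec), vec[0], vec[735])
```

Because each entry is a sum over atom pairs of a given type combination, the vector length is fixed regardless of the size of the protein or ligand.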

The MLP has five hidden layers of decreasing width (1024 → 512 → 256 → 128 → 64), all with ReLU activations, followed by a single output unit. Training minimizes the mean squared error between the predicted $\hat r(X)$ and the true RMSD, using SGD ([…], batch size 32, early stopping on validation loss, no dropout). Implementation is in PyTorch.
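A minimal PyTorch sketch of a network with these layer widths, plus a single SGD step on an MSE loss, might look like the following. The helper name `build_deeprmsd_mlp`, the learning rate, and the random tensors are placeholders chosen for illustration; placing ReLU only on the hidden layers is also an assumption.

```python
import torch
from torch import nn


def build_deeprmsd_mlp(in_dim: int = 1470) -> nn.Sequential:
    """MLP with hidden widths 1024-512-256-128-64 and a scalar output."""
    widths = [in_dim, 1024, 512, 256, 128, 64]
    layers = []
    for n_in, n_out in zip(widths[:-1], widths[1:]):
        layers += [nn.Linear(n_in, n_out), nn.ReLU()]
    layers.append(nn.Linear(widths[-1], 1))   # predicted RMSD; no dropout anywhere
    return nn.Sequential(*layers)


model = build_deeprmsd_mlp()
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)  # lr is a placeholder value
loss_fn = nn.MSELoss()

# One training step on a random batch of 32 poses (stand-in data).
features = torch.rand(32, 1470)
true_rmsd = torch.rand(32, 1) * 10.0
pred = model(features)
loss = loss_fn(pred, true_rmsd)
optimizer.zero_grad()
loss.backward()
optimizer.step()
```

A full training run would wrap this step in an epoch loop and stop once the validation loss no longer improves.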

3. Differentiable AutoDock Vina Score

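The Vina score, originally a conventional scoring function, is made fully differentiable in DeepRMSD+Vina. The intermolecular energy is calculated as:

$S_{\text{Vina}}(X) = \sum_{i \in \text{ligand}} \sum_{j \in \text{protein}} \sum_{k} c_k\, h_k(d_{ij})$

where $d_{ij}$ is the surface-to-surface distance, $h_k$ are the Vina kernel functions (Gaussian steric, repulsion, hydrophobic, and hydrogen-bond terms) with coefficients $c_k$, and all kernel functions involved are analytic. The rigorous translation to PyTorch tensor operations enables automatic differentiation of $S_{\text{Vina}}(X)$ with respect to atomic positions, providing the necessary gradients for optimization.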
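To illustrate why expressing the score in tensor operations yields gradients for free, here is a simplified sketch that evaluates only the three steric Vina terms (two Gaussians and a repulsion, with the standard Vina coefficients) and lets autograd differentiate with respect to the ligand coordinates. It is not the paper's implementation: hydrophobic and hydrogen-bond terms, atom typing, and the distance cutoff are omitted, and the coordinates and uniform radii are made-up values.

```python
import torch


def vina_like_energy(lig_xyz, prot_xyz, lig_radii, prot_radii):
    """Steric part of a Vina-style intermolecular energy (simplified)."""
    # Pairwise center-to-center distances, shape (n_lig, n_prot).
    r = torch.linalg.norm(lig_xyz[:, None, :] - prot_xyz[None, :, :], dim=-1)
    # Surface-to-surface distance: subtract both atomic radii.
    d = r - lig_radii[:, None] - prot_radii[None, :]
    gauss1 = torch.exp(-(d / 0.5) ** 2)
    gauss2 = torch.exp(-((d - 3.0) / 2.0) ** 2)
    repulsion = torch.clamp(d, max=0.0) ** 2      # d^2 where surfaces overlap, else 0
    return (-0.0356 * gauss1 - 0.00516 * gauss2 + 0.840 * repulsion).sum()


# Made-up coordinates (Å); only the ligand positions require gradients.
lig = torch.tensor([[0.0, 0.0, 0.0], [1.5, 0.0, 0.0]], requires_grad=True)
prot = torch.tensor([[4.0, 0.0, 0.0], [0.0, 4.5, 0.0], [2.0, 3.0, 1.0]])
energy = vina_like_energy(lig, prot, torch.full((2,), 1.9), torch.full((3,), 1.9))
energy.backward()
print(energy.item(), lig.grad)
```

After `backward()`, `lig.grad` holds the derivative of the energy with respect to every ligand coordinate, which is the quantity a gradient-based optimizer needs.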

4. End-to-End Pose Optimization Procedure

A ligand pose is parameterized by a vector

$v = (t_x, t_y, t_z, \phi, \theta, \psi, \tau_1, \dots, \tau_m)$

encoding translation, rotation (Euler angles), and all rotatable bond torsions. Given the pose vector $v$, the forward kinematics module $F$ computes Cartesian coordinates $X = F(v)$. The optimization loss is $S_{\text{total}}(F(v))$. Gradient-based updates are performed as:

$v \leftarrow v - \eta\, \nabla_v S_{\text{total}}(F(v))$

using learning rate $\eta$ (e.g., 0.01), for a maximum of 70 iterations with early stopping if $S_{\text{total}}$ plateaus. The initial pose is set to the output of AutoDock Vina. A clamping rule is applied: the optimized pose is accepted only if […], to prevent regression from near-native solutions.
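The optimization loop can be sketched as plain gradient descent on the pose vector. The version below is a reduced illustration under several assumptions: it handles only rigid-body degrees of freedom (translation plus Euler angles, with an assumed z-y-x rotation order), omits torsions and the acceptance check, and uses a hypothetical quadratic `score_fn` in place of the combined score.

```python
import torch


def rotation_matrix(angles: torch.Tensor) -> torch.Tensor:
    """R = Rz(c) @ Ry(b) @ Rx(a), built from differentiable operations."""
    a, b, c = angles
    one, zero = torch.ones_like(a), torch.zeros_like(a)
    rx = torch.stack([one, zero, zero, zero, a.cos(), -a.sin(), zero, a.sin(), a.cos()]).reshape(3, 3)
    ry = torch.stack([b.cos(), zero, b.sin(), zero, one, zero, -b.sin(), zero, b.cos()]).reshape(3, 3)
    rz = torch.stack([c.cos(), -c.sin(), zero, c.sin(), c.cos(), zero, zero, zero, one]).reshape(3, 3)
    return rz @ ry @ rx


def forward_kinematics(pose: torch.Tensor, ref_xyz: torch.Tensor) -> torch.Tensor:
    """Rotate the reference pose about its centroid, then translate."""
    centroid = ref_xyz.mean(dim=0)
    return (ref_xyz - centroid) @ rotation_matrix(pose[3:6]).T + centroid + pose[:3]


def optimize_pose(pose0, ref_xyz, score_fn, lr=0.01, max_iter=70, tol=1e-6):
    pose = pose0.clone().requires_grad_(True)
    previous = float("inf")
    for _ in range(max_iter):
        loss = score_fn(forward_kinematics(pose, ref_xyz))
        (grad,) = torch.autograd.grad(loss, pose)
        with torch.no_grad():
            pose -= lr * grad                      # gradient descent step
        if abs(previous - loss.item()) < tol:      # early stop once the score plateaus
            break
        previous = loss.item()
    return pose.detach()


# Toy problem: the "native" pose is the starting pose shifted by 1 Å along x.
ref = torch.tensor([[0.0, 0.0, 0.0], [1.5, 0.0, 0.0], [1.5, 1.2, 0.0], [0.0, 1.2, 0.8]])
target = ref + torch.tensor([1.0, 0.0, 0.0])
score_fn = lambda xyz: ((xyz - target) ** 2).sum()  # hypothetical stand-in score
pose0 = torch.zeros(6)
final_pose = optimize_pose(pose0, ref, score_fn)
print(final_pose)
```

In an actual pipeline the starting pose would come from a docking run and `score_fn` would be the differentiable combined score; the loop itself would stay the same.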

5. Benchmark Performance: CASF-2016 and Cross-Docking

Docking power is quantified as the Top 1 success rate, i.e., the fraction of systems where the highest-scoring pose is within 2 Å RMSD of the native pose. On the CASF-2016 core set:

AutoDock Vina alone: 90.2%
DeepRMSD alone: ~78%
DeepRMSD+Vina (equal weights): 95.4%

This 5.2-percentage-point gain over Vina alone sets a new benchmark for the dataset. For redocking and cross-docking (3D-DISCO set), DeepRMSD+Vina with end-to-end optimization achieves a ~15% improvement in Top 1 success relative to vanilla Vina. Improvements are especially pronounced for initial poses in the 1–3 Å RMSD range, approximately 70% of which are optimized further toward the native conformation.
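The Top 1 success rate defined above reduces to a short computation. The sketch below assumes that each system is a list of `(score, rmsd_to_native)` pairs, that the best-ranked pose is the one with the lowest score, and that "within 2 Å" includes the boundary; the three example systems are invented.

```python
def top1_success_rate(systems, threshold=2.0):
    """Fraction of systems whose best-scored pose lies within `threshold` Å of native.

    systems: list of pose lists; each pose is (score, rmsd_to_native),
             with lower score meaning a better-ranked pose.
    """
    hits = 0
    for poses in systems:
        _, best_rmsd = min(poses, key=lambda pose: pose[0])
        if best_rmsd <= threshold:
            hits += 1
    return hits / len(systems)


# Invented example: three protein-ligand systems.
systems = [
    [(-3.2, 1.1), (-2.0, 4.0)],   # top pose at 1.1 Å: success
    [(-1.0, 0.8), (-2.5, 5.2)],   # top pose at 5.2 Å: failure
    [(-4.0, 1.9)],                # top pose at 1.9 Å: success
]
rate = top1_success_rate(systems)
print(f"{rate:.1%}")
```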

6. Advantages, Limitations, and Prospects

DeepRMSD+Vina’s primary advantage is the unification of deep learning–based pose discrimination with physically motivated energy scoring, facilitated by full differentiability for gradient-based pose refinement. This enables immediate integration into end-to-end learning and optimization pipelines. Demonstrated gains in docking and cross-docking tasks mark a significant advance for computational drug design.

Limitations include the locality of gradient-based optimization, which may fail to escape local minima; this suggests that global search or stochastic restart strategies could further improve outcomes. Computational cost exceeds that of standalone Vina, though GPU acceleration mitigates the overhead. The generalization of DeepRMSD relies on the diversity of structures represented in the training set; systems with uncommon chemotypes may necessitate fine-tuning.

A plausible implication is that the design of hybrid, fully differentiable scoring functions may underpin future advances in learnable docking workflows, supporting both accuracy and extensibility.

7. Context within Virtual Screening and Drug Discovery

The DeepRMSD+Vina framework demonstrates that hybrid scoring functions, fully differentiable and integrating deep learning with established energy terms, can surpass both approaches used in isolation for practical docking tasks. This architecture supports refinement “in situ,” with immediate relevance for virtual screening campaigns and structure-based drug discovery (2206.13345). Its methodologically principled approach and easily extensible architecture are poised to influence subsequent developments in computational chemistry and machine learning–guided molecular modeling.

References (1)

A fully differentiable ligand pose optimization framework guided by deep learning and traditional scoring functions (2022), arXiv:2206.13345
