
PocketVina: Hybrid Molecular Docking Framework

Updated 1 July 2025

PocketVina is a hybrid molecular docking framework that combines machine learning for pocket prediction with GPU-accelerated systematic search for ligand pose sampling.
Its core multi-pocket approach ensures sampling physically valid ligand poses, addressing challenges with diverse or previously unseen protein targets in drug discovery.
Benchmarks demonstrate PocketVina achieves high physical validity and robust generalization without task-specific training, offering efficient high-throughput virtual screening suitable for large-scale campaigns.

PocketVina is a hybrid, search-based molecular docking framework that combines machine learning–based pocket prediction with GPU-accelerated, systematic multi-pocket exploration for ligand pose sampling. Designed to address the challenges of physically valid ligand binding, particularly on structurally diverse or previously unseen targets, PocketVina achieves robust accuracy and efficiency without the need for task-specific training. Its architecture supports large-scale, high-throughput virtual screening (HTVS) in structure-based drug discovery, where both pose reliability and resource efficiency are paramount.

1. Core Methodology: Multi-Pocket Conditioning

PocketVina’s workflow consists of three key stages: (a) pocket detection, (b) parallel search-based docking into top-ranked pockets, and (c) result aggregation.

Pocket Detection: PocketVina employs P2Rank to analyze the protein’s solvent-accessible surface (SAS), assigning a ligandability score to each candidate surface point based on its atomic environment. Points are clustered into potential binding pockets, and each pocket is scored by the sum of the squared ligandability scores of its points (Equation 2):

$S_{\mathcal{K}} = \sum_{p \in \mathcal{K}} s_p^2$

where $\mathcal{K}$ is the set of surface points clustered into a pocket and $s_p$ is the ligandability score of point $p$.
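As an illustration of Equation 2, the following minimal sketch ranks pockets by the sum of squared per-point scores. The pocket names and score values are made up for the example, and the helper names `pocket_score` and `rank_pockets` are hypothetical rather than part of P2Rank or PocketVina.

```python
from typing import Dict, List


def pocket_score(point_scores: List[float]) -> float:
    """Sum of squared per-point ligandability scores, S_K = sum(s_p^2)."""
    return sum(s * s for s in point_scores)


def rank_pockets(pockets: Dict[str, List[float]]) -> List[str]:
    """Return pocket names ordered from highest to lowest S_K."""
    return sorted(pockets, key=lambda name: pocket_score(pockets[name]), reverse=True)


# Hypothetical clusters of surface-point scores (illustrative values only).
pockets = {
    "pocket_a": [0.9, 0.8, 0.7],   # few points, high confidence
    "pocket_b": [0.3] * 10,        # many points, low confidence
}

ranking = rank_pockets(pockets)
print(ranking)
```

Squaring favors a small cluster of confident points over a large cluster of weak ones: here `pocket_b` has the larger plain sum (3.0 versus 2.4), yet `pocket_a` ranks first under the squared score.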

Parallel Docking: The top $N$ pockets per protein structure are each docked independently using QuickVina 2-GPU 2.1, which explores the ligand's conformational space efficiently on GPU hardware. Each pose is represented as a tuple of position, orientation (quaternion), and internal torsions (Equation 3):

$C_i = \{x^i, y^i, z^i, a^i, b^i, c^i, d^i, \psi_1^i, \ldots, \psi^i_{N_\mathrm{rot}}\}$

The search optimizes the classical Vina interaction and internal energy (Equation 4):

$\mathrm{SF}_{C_i'} = f(C_i') = e_\mathrm{inter} + e_\mathrm{intra}$
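The pose tuple of Equation 3 and the additive score of Equation 4 can be sketched as a small data structure. This is an illustrative layout only, not QuickVina 2-GPU's internal representation; the class `Pose`, its methods, and `total_score` are hypothetical names, and the energy terms are passed in as plain numbers rather than computed.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Pose:
    position: Tuple[float, float, float]            # (x, y, z)
    orientation: Tuple[float, float, float, float]  # quaternion (a, b, c, d)
    torsions: List[float]                           # one angle per rotatable bond

    def normalized(self) -> "Pose":
        """Return a copy whose quaternion has unit length, so it encodes a pure rotation."""
        norm = math.sqrt(sum(q * q for q in self.orientation))
        if norm == 0.0:
            raise ValueError("a zero quaternion does not define a rotation")
        a, b, c, d = (q / norm for q in self.orientation)
        return Pose(self.position, (a, b, c, d), list(self.torsions))

    def as_vector(self) -> List[float]:
        """Flatten to [x, y, z, a, b, c, d, psi_1, ..., psi_Nrot]."""
        return [*self.position, *self.orientation, *self.torsions]


def total_score(e_inter: float, e_intra: float) -> float:
    """Equation 4: intermolecular plus intramolecular energy (values supplied by the caller)."""
    return e_inter + e_intra


# A ligand with two rotatable bonds has 3 + 4 + 2 = 9 degrees of freedom in this encoding.
pose = Pose((1.0, 2.0, 3.0), (2.0, 0.0, 0.0, 0.0), [0.5, -1.2]).normalized()
vector = pose.as_vector()
```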

Result Aggregation: Poses sampled across all pockets are pooled and re-ranked by predicted binding affinity or, when a reference structure is available, by RMSD; the pose that is most plausible geometrically and physically is then selected for each protein–ligand pair.

This multi-pocket conditioning paradigm allows PocketVina to avoid reliance on a single pre-specified site or naive blind search, systematically increasing the likelihood of sampling accurate, physically valid poses.
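The three stages described above can be sketched as a short control flow. This is a schematic under stated assumptions, not PocketVina's actual code: `dock_fn` is a hypothetical stand-in for a docking engine call, pocket scores are assumed to be precomputed, and scores follow the Vina convention that lower energy is better.

```python
from typing import Callable, Dict, List, NamedTuple


class DockedPose(NamedTuple):
    pocket: str
    score: float  # Vina-style energy; lower is better (assumption)


def dock_multi_pocket(
    pocket_scores: Dict[str, float],
    dock_fn: Callable[[str], List[float]],
    top_n: int = 3,
) -> List[DockedPose]:
    # (a) Keep the top_n pockets by pocket score (higher is better).
    top = sorted(pocket_scores, key=pocket_scores.get, reverse=True)[:top_n]
    # (b) Dock into each selected pocket independently.
    poses = [DockedPose(pocket, s) for pocket in top for s in dock_fn(pocket)]
    # (c) Pool all poses and rank them by docking score.
    return sorted(poses, key=lambda pose: pose.score)


# Toy data: made-up pocket scores and made-up docking energies per pocket.
pocket_scores = {"p1": 5.2, "p2": 3.1, "p3": 0.4}
fake_energies = {"p1": [-6.1, -5.8], "p2": [-8.3, -7.0], "p3": [-9.9]}

ranked = dock_multi_pocket(pocket_scores, lambda p: fake_energies[p], top_n=2)
best = ranked[0]
```

In this toy run the best pose comes from the second-ranked pocket, which is the situation multi-pocket docking is meant to cover; a pocket outside the top `top_n` (here `p3`) is never docked.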

2. Benchmarking and Performance

PocketVina is evaluated on established docking benchmarks covering a range of target diversity and ligand complexity. Metrics include ligand root-mean-square deviation (RMSD) and the proportion of physically valid conformations as defined by PB-valid (PoseBusters validity: RMSD < 2Å, plus chemical correctness).
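To make the metric concrete, the sketch below computes a plain coordinate RMSD and a combined success rate. It assumes matched atom ordering and no symmetry correction, and it takes the chemical-validity flag as a precomputed boolean rather than running PoseBusters; the function names and example values are illustrative.

```python
import math
from typing import List, Sequence, Tuple

Coord = Tuple[float, float, float]


def rmsd(pred: Sequence[Coord], ref: Sequence[Coord]) -> float:
    """Root-mean-square deviation between two equally ordered atom lists."""
    if len(pred) != len(ref) or not pred:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    squared = sum((p - r) ** 2 for a, b in zip(pred, ref) for p, r in zip(a, b))
    return math.sqrt(squared / len(pred))


def pb_valid_success_rate(results: List[Tuple[float, bool]], cutoff: float = 2.0) -> float:
    """Percentage of complexes with RMSD below the cutoff AND a passing validity flag."""
    if not results:
        raise ValueError("no results to evaluate")
    hits = sum(1 for value, is_valid in results if value < cutoff and is_valid)
    return 100.0 * hits / len(results)


# Two atoms, each displaced by 1 Å along z.
example_rmsd = rmsd([(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
                    [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)])

# (RMSD in Å, passed validity checks) for four made-up complexes.
results = [(0.8, True), (1.5, False), (3.2, True), (1.9, True)]
rate = pb_valid_success_rate(results)
```

The second complex shows why the combined metric is stricter than RMSD alone: its pose is within 2 Å but fails the validity flag, so it does not count as a success.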

Benchmark Highlights

PDBbind2020 (Time-split & Unseen Proteins):
- PB-valid (<2Å) success rate: 50.96% overall; 52.08% for unseen targets.
- Maintains state-of-the-art performance for physical validity, on par with or surpassing deep learning baselines for RMSD alone.
DockGen:
- PB-valid (<2Å) success rate: 39.68% (highest among all methods evaluated).
- Assesses generalization to novel pockets without train/test overlap.
Astex Diverse Set:
- PB-valid success rate: 90.58%, i.e., crystallographic poses and geometry are reproduced for roughly nine in ten complexes.
PoseBusters:
- PB-valid (<2Å) for 65.65% of cases, with sustained accuracy on low sequence-identity targets (<30%), in contrast to steep drop-offs in deep learning approaches.

Flexibility and Ligand Size

PocketVina maintains high PB-valid rates across ligands of varying flexibility:

Rigid ligands (<5 rotatable bonds): PB-valid ≈ 80%
Larger, flexible ligands: Outperforms alternatives, with performance degrading gracefully rather than precipitously.

Speed and Resource Utilization

Processes large protein–ligand sets in days (e.g., ~3 days on 7 × 6 GB GPUs for >500,000 complexes).
Resource use for equivalent deep learning models (e.g., CompassDock): 20 × 15 GB GPUs for 1.5 months.
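The two resource figures can be put on a common scale as GPU-days. The arithmetic below assumes that 1.5 months is about 45 days and that both figures refer to comparable workloads, as the comparison above implies; it ignores differences in GPU model and utilization.

```python
def gpu_days(num_gpus: int, days: float) -> float:
    """Total GPU-days consumed by a run."""
    return num_gpus * days


pocketvina = gpu_days(7, 3)      # 7 GPUs for ~3 days
compassdock = gpu_days(20, 45)   # 20 GPUs for ~1.5 months (assumed 45 days)
ratio = compassdock / pocketvina

print(f"PocketVina:  {pocketvina:.0f} GPU-days")
print(f"CompassDock: {compassdock:.0f} GPU-days")
print(f"Ratio:       {ratio:.1f}x")
```

Under these assumptions the gap is roughly 21 versus 900 GPU-days, a factor of about 43, before accounting for the larger per-GPU memory (15 GB versus 6 GB).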

3. TargetDock-AI Dataset and Biological Relevance

PocketVina introduces TargetDock-AI, a dataset comprising 563,251 protein–ligand pairs, including over 16,000 activity-annotated complexes pertinent to neuroblastoma targets and approved drugs. This resource benchmarks methods not solely on pose accuracy but on biological discrimination:

Bioactivity Discrimination: PocketVina’s scoring function enables segregation of actives from inactives (p-value $1.34 \times 10^{-82}$), surpassing both pose-generation and post-processing–enriched deep learning baselines.
Deep Learning Comparison: CompassDock, even with supplementary energetic and chemical filters, cannot reliably separate actives from decoys, often scoring actives worse than inactives.

This suggests that physically grounded search- and scoring-based methods, as implemented in PocketVina, offer greater reliability on uncharacterized targets compared to current neural generative models.
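One simple way to quantify how well a scoring function separates actives from inactives is the probability that a randomly chosen active scores better than a randomly chosen inactive (equivalent to ROC AUC). The text above does not say which statistical test produced the reported p-value, so the sketch below is a stand-in illustration, not the authors' analysis. It assumes lower scores mean stronger predicted binding, and the score lists are made up.

```python
from typing import Sequence


def separation_auc(actives: Sequence[float], inactives: Sequence[float]) -> float:
    """Fraction of (active, inactive) pairs in which the active has the lower
    (better) score; ties count as half. 0.5 means no separation, 1.0 perfect."""
    if not actives or not inactives:
        raise ValueError("both groups must be non-empty")
    wins = 0.0
    for a in actives:
        for i in inactives:
            if a < i:
                wins += 1.0
            elif a == i:
                wins += 0.5
    return wins / (len(actives) * len(inactives))


# Illustrative docking scores (kcal/mol-like units, made up for the example).
actives = [-9.1, -8.4, -7.9]
inactives = [-7.9, -6.5, -5.0, -8.0]

auc = separation_auc(actives, inactives)
```

A value below 0.5 would correspond to the failure mode described for the deep learning baseline, where actives tend to score worse than inactives.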

4. Comparison to Deep Learning Docking Methods

A principal distinction between PocketVina and contemporary deep learning-based docking methods lies in physical plausibility and generalization behavior:

Physical Validity: Analyses with PoseBusters demonstrate that deep learning models often output geometrically plausible but chemically or physically invalid conformations (steric clashes, bond anomalies), especially for unseen proteins or novel scaffolds.
Generalization and Training: PocketVina requires no training or fine-tuning on specific tasks or target classes, maintaining performance when applied to out-of-distribution proteins or ligands.
Resource Constraints: Deep learning methods typically require an order of magnitude more GPU memory and compute, making large-scale deployment nontrivial for standard research environments.

A plausible implication is that, barring major advances in neural scoring and constraint satisfaction, search-based approaches like PocketVina remain superior for computationally intensive, real-world virtual screening campaigns.

5. Implications for Structure-Based Drug Discovery

PocketVina delivers several practical benefits for structure-based drug discovery pipelines:

HTVS Suitability: Its speed and robustness position it as a viable engine for primary screens and rescoring steps in chemical library evaluation.
Novel and Uncharacterized Targets: The lack of dependence on prior data ensures applicability to de novo targets, emerging pathogen proteins, and repurposing workflows.
Public Availability: With both code and datasets released, PocketVina is accessible for benchmarking and deployment in academic and industrial contexts.

Enhancing reliability by focusing on both geometric and physical criteria increases the likelihood that in silico hits will translate into experimentally relevant actives.

6. Algorithmic Details and Implementation

For implementation, PocketVina leverages:

P2Rank (Pocket Prediction)
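- Ligandability scoring builds on a per-point feature vector: for a surface point $p$, the atomic features $\phi(a)$ of the neighboring atoms $a \in N_p$ are summed with a weight that falls linearly to zero at a distance of 6.0 (presumably in Å):
$f_p^\phi = \sum_{a \in N_p} \max\left(0,\,1 - \frac{\mathrm{dist}(a,p)}{6.0}\right)\,\phi(a)$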
QuickVina 2-GPU 2.1 (Docking Engine)
- Uses the standard Vina search with added GPU parallelism to sample poses within each predicted pocket.
- Conformation search space includes global translation, rigid rotation, and flexible torsions.
Result Aggregation
- Merges all poses, ranks by empirical scoring function, and applies PB-valid post-processing.
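The P2Rank formula listed above can be sketched directly: each neighboring atom contributes its feature vector scaled by a weight that decays linearly with distance and reaches zero at the 6.0 cutoff. This is an illustrative re-implementation of the formula only, not P2Rank's code; the function name `project_features`, the two-component feature vectors, and the coordinates are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Coord = Tuple[float, float, float]
Atom = Tuple[Coord, Sequence[float]]  # (position, feature vector phi(a))


def project_features(point: Coord, atoms: List[Atom], radius: float = 6.0) -> List[float]:
    """Distance-weighted sum of atomic features around one surface point."""
    if not atoms:
        return []
    projected = [0.0] * len(atoms[0][1])
    for coord, features in atoms:
        # Linear decay: weight 1 at distance 0, weight 0 at or beyond the radius.
        weight = max(0.0, 1.0 - math.dist(point, coord) / radius)
        for k, value in enumerate(features):
            projected[k] += weight * value
    return projected


# Three made-up atoms at distances 3, 6, and 0 from the surface point.
atoms = [
    ((3.0, 0.0, 0.0), [1.0, 2.0]),  # weight 0.5
    ((0.0, 6.0, 0.0), [4.0, 4.0]),  # weight 0.0 (at the cutoff)
    ((0.0, 0.0, 0.0), [1.0, 0.0]),  # weight 1.0
]
f_p = project_features((0.0, 0.0, 0.0), atoms)
```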

Released source code and data repositories are:

7. Summary Table: Key Characteristics and Outcomes

Feature	Description	Performance/Implication
Pocket Discovery	ML-based (P2Rank), multi-pocket, surface-driven	High physical validity, robustness
Docking Strategy	Search-based, GPU-accelerated (QuickVina 2-GPU 2.1)	Fast, scales to millions of docks
Physical Validity	PB-valid (PoseBusters) assessment included	Outperforms in chemical correctness
Resource Requirements	Efficient, runs on modest GPUs	Accessible for typical research use
Biological Discrimination	Validates actives vs. inactives on TargetDock-AI	Success where DL models fail

PocketVina establishes a new benchmark in docking by integrating systematic multi-pocket conditioning, GPU acceleration, and robust pose post-processing, setting a standard of scalable, physically valid docking readily applicable to modern drug discovery efforts.
