
HelixVS: Deep Learning Virtual Screening

Updated 15 August 2025
  • HelixVS is a deep learning–enhanced, structure-based virtual screening platform that integrates classical docking with AI re-scoring for accelerated hit discovery.
  • It employs a multi-stage pipeline that refines ligand poses using both empirical scoring and a deep learning affinity model, significantly improving enrichment factors and screening speed.
  • The platform demonstrates superior performance over traditional tools, validated across diverse targets including protein–protein interactions, with options for public and private deployment.

HelixVS is a deep learning–enhanced, structure-based virtual screening (VS) platform designed to accelerate hit discovery in drug development workflows. It combines classical molecular docking with deep learning–based pose scoring and screening modules in a multi-stage pipeline, and it reports higher enrichment factors (EF), higher throughput, and better wet-lab hit rates than classical tools such as Vina. HelixVS has been validated across diverse protein targets, including both traditional binding pockets and challenging protein–protein interaction (PPI) interfaces.

1. Platform Architecture and Deep Learning Integration

HelixVS implements a multi-stage virtual screening pipeline. The initial stage uses conventional docking (e.g., AutoDock QuickVina 2) to rapidly generate ligand–protein binding poses, ranked by classical scoring functions (empirical or force-field–driven estimates of the binding free energy, ΔG). These poses are then re-scored by a deep learning affinity model derived from RTMScore, whose training data has been substantially augmented with co-crystal structures from the Protein Data Bank (PDB) to broaden the model’s coverage of diverse spatial configurations. The model evaluates both the binding affinity and the quality of the pose, which allows a finer separation of potential actives from the rest of a very large chemical search space.

The deep learning backbone scores poses in batches, including multiple isomers and conformers of each ligand, so the conformational diversity that many traditional VS strategies discard is retained. Augmenting the training data with additional PDB co-crystals helps the model recognize rarely encountered interaction geometries and non-canonical pocket environments. This two-stage design, physical docking followed by machine-learned affinity scoring, is intended to reduce the false positives typical of empirical scoring functions and to filter out decoy molecules more reliably.
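One way to picture how several isomer and conformer poses per ligand could feed a single ranking is the minimal sketch below. The source does not say how HelixVS aggregates poses, so the max-over-poses rule, the `ScoredPose` type, and the convention that a higher score is better are all assumptions made for illustration.

```python
from typing import Dict, Iterable, List, NamedTuple


class ScoredPose(NamedTuple):
    ligand_id: str   # parent compound the pose belongs to
    variant: str     # hypothetical label for the isomer/conformer
    score: float     # hypothetical model score; higher is assumed better


def best_pose_per_ligand(poses: Iterable[ScoredPose]) -> Dict[str, ScoredPose]:
    """Keep the highest-scoring pose for each ligand (assumed aggregation rule)."""
    best: Dict[str, ScoredPose] = {}
    for pose in poses:
        current = best.get(pose.ligand_id)
        if current is None or pose.score > current.score:
            best[pose.ligand_id] = pose
    return best


def rank_ligands(poses: Iterable[ScoredPose]) -> List[ScoredPose]:
    """Rank ligands by their best pose, best first."""
    return sorted(best_pose_per_ligand(poses).values(),
                  key=lambda p: p.score, reverse=True)
```

Under this rule a ligand is ranked by its most favorable variant, so a compound is not penalized when only one of its conformers fits the pocket well.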

2. Pose-Scoring and Screening Workflow

Following initial pose generation and affinity re-scoring, HelixVS implements an optional pose-screening or conformation filtering module. This module leverages both learned and rule-based criteria to filter for poses exhibiting desirable interaction patterns, such as hydrogen-bonding to specific residues or occupation of sub-pockets relevant for a given binding mode or functional mechanism. The pipeline thus operates as:

  1. Pose Generation: Docking engine provides candidate ligand–protein poses (Stage 1).
  2. Deep Learning Scoring: The deep learning affinity model assigns a score to each pose, highlighting the poses most likely to correspond to high binding affinity (Stage 2).
  3. Interaction Filtering: Poses are optionally filtered based on spatial constraints or user-specified interaction patterns, enhancing the proportion of true positives with the targeted binding profile (Stage 3).

This modularity allows for tailored virtual screens (e.g., including custom binding mode hypotheses, reaction-driven libraries, or focus on allosteric or PPI interfaces).
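The three stages listed above can be sketched as a small function. This is an illustrative outline only, not the HelixVS implementation: the `Pose` fields, the `dl_scorer` callable, and the residue-contact representation are hypothetical, and Stage 1 is assumed to have already produced the input poses.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet, Iterable, List, Optional


@dataclass
class Pose:
    ligand_id: str
    docking_score: float                     # Stage 1 output (hypothetical field)
    contacts: FrozenSet[str] = frozenset()   # residues the pose interacts with
    dl_score: Optional[float] = None         # filled in by Stage 2


def run_pipeline(
    poses: Iterable[Pose],
    dl_scorer: Callable[[Pose], float],
    required_contacts: FrozenSet[str] = frozenset(),
    keep_top: Optional[int] = None,
) -> List[Pose]:
    """Re-score docked poses, optionally filter by interactions, then rank."""
    # Stage 2: assign a learned affinity score to every docked pose.
    scored = []
    for pose in poses:
        pose.dl_score = dl_scorer(pose)
        scored.append(pose)

    # Stage 3 (optional): keep poses that make all required contacts.
    if required_contacts:
        scored = [p for p in scored if required_contacts <= p.contacts]

    # Rank by learned score; higher is assumed better in this sketch.
    scored.sort(key=lambda p: p.dl_score, reverse=True)
    return scored[:keep_top]  # keep_top=None returns the full ranking
```

Keeping the filter as a separate, optional step mirrors the modularity described above: a screen with no binding-mode hypothesis simply leaves `required_contacts` empty.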

3. Screening Performance and Benchmarking

HelixVS shows marked improvements in both screening power and computational efficiency when benchmarked against classical VS tools. On the DUD-E dataset, which comprises 102 protein targets, HelixVS demonstrated:

  • A mean enrichment factor (EF) improvement of 2.6× over Vina across all targets
  • At 0.1% screening (top-ranking hits), EF₀.₁% = 44.205 for HelixVS versus EF₀.₁% = 17.065 for Vina
  • At 1% screening, HelixVS similarly outperforms with EF₁% = 26.968 versus Vina’s EF₁% = 10.022

Method     EF₀.₁%    EF₁%      Speed (molecules/day, per core)
Vina       17.065    10.022    ~300
HelixVS    44.205    26.968    ~4000
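For readers unfamiliar with the metric, the sketch below computes an enrichment factor using the standard definition: the fraction of actives among the top-ranked x% of the library divided by the fraction of actives in the whole library. The function name and the convention that a higher score is better are assumptions; the source does not describe how its benchmark scripts were written.

```python
from typing import Sequence


def enrichment_factor(scores: Sequence[float],
                      is_active: Sequence[bool],
                      fraction: float) -> float:
    """EF at the given top fraction (e.g. 0.01 for EF at 1%)."""
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    if len(scores) != len(is_active):
        raise ValueError("scores and labels must have the same length")
    n_total = len(scores)
    n_actives = sum(is_active)
    if n_total == 0 or n_actives == 0:
        raise ValueError("need at least one compound and one active")

    # Size of the top slice; at least one compound is always inspected.
    n_top = max(1, round(n_total * fraction))

    # Sort by score, best first (higher score assumed better).
    ranked = sorted(zip(scores, is_active), key=lambda t: t[0], reverse=True)
    hits_in_top = sum(active for _, active in ranked[:n_top])

    return (hits_in_top / n_top) / (n_actives / n_total)
```

An EF of 1 corresponds to random selection. Dividing the tabulated values gives ratios of about 2.59 at 0.1% (44.205 / 17.065) and about 2.69 at 1% (26.968 / 10.022), in line with the reported 2.6× mean improvement.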

Screening throughput improves roughly 13-fold, about an order of magnitude: HelixVS processes approximately 4000 molecules per day per CPU core, compared with ~300 for Vina. This efficiency results from distributed sorting algorithms and rapid deep learning pose evaluation, which makes screening multi-million–compound libraries practical in routine campaigns.
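To make the throughput figures concrete, the short calculation below converts per-core rates into compute budgets. The 7.8-million-compound library size is taken from the CDK4/6 campaign described in this article; the 1000-core cluster and the assumption of perfect parallel scaling are hypothetical simplifications.

```python
def core_days(n_molecules: int, molecules_per_core_day: float) -> float:
    """Total CPU core-days needed to screen a library at a given per-core rate."""
    return n_molecules / molecules_per_core_day


def wall_clock_days(n_molecules: int,
                    molecules_per_core_day: float,
                    n_cores: int) -> float:
    """Elapsed days on n_cores, assuming ideal (linear) parallel scaling."""
    return core_days(n_molecules, molecules_per_core_day) / n_cores


library_size = 7_800_000
helixvs_cost = core_days(library_size, 4000)   # 1950.0 core-days
vina_cost = core_days(library_size, 300)       # 26000.0 core-days
```

On the hypothetical 1000-core cluster these budgets correspond to about 1.95 days versus 26 days of wall-clock time, before any scheduling or I/O overhead.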

4. Applications in Drug Discovery Pipelines

HelixVS’s efficacy has been demonstrated in multiple experimental screening pipelines:

  • CDK4/6 Dual-Target Inhibitors: Screening a 7.8-million–compound library for the CDK4/6–CCND1 interface led to six compounds (from the top 100) showing >20% inhibition in a BiFC assay.
  • TLR4/MD-2 Antagonists: Out of 200,000 screened molecules, over 100 candidates progressed to experimental validation, with two exhibiting nanomolar inhibitory activity in the SEAP assay.
  • cGAS Inhibitors: Screening 30,000 molecules against the ATP-binding cGAS pocket identified 17 actives (with many <10 μM potency, one in the nM range) via luciferase-based cell assays.
  • NIK Inhibitors: Screening ~10 million compounds using HelixVS identified novel NIK scaffolds, including new hits with μM IC₅₀ in enzymatic assays.

These results correspond to robust experimental hit rates (e.g., >10% active compound rate in wet-lab validation), highlighting HelixVS’s real-world impact for both established and novel druggable targets.

5. Public Availability and User Interface

A free public version of HelixVS is available as an online service (https://paddlehelix.baidu.com/app/drug/helixvs/forecast). The web interface supports automated protein preparation, pocket definition, and parameterization steps. Built-in and user-uploadable compound libraries are accepted, and advanced options—such as interaction constraints and explicit pose filtering—facilitate precise definition of the screening protocol. While the public interface is limited in computational throughput relative to private deployment, it offers accessible VS capabilities for academic and industrial scientists. Private deployment options exist for users requiring high-throughput and secure data integration in proprietary pipelines.

6. Implications, Limitations, and Future Directions

The systematic improvements achieved by HelixVS, both in computational metrics and practical hit discovery, suggest that deep learning–augmented VS can substantially reduce the inherent cost and complexity of early drug discovery. By coupling physics-based docking with pose-aware, data-driven affinity estimation and customizable interaction filtering, HelixVS provides a modular, scalable solution compatible with diverse screening paradigms—including allosteric modulators and PPI disruptors. Key limitations involve the modest computing power available on the public platform and the dependence of deep learning scoring on high-quality structural data for model training. Future directions likely include further model refinement, expansion of binding mode libraries (including covalent and macrocycle binders), and deeper integration with downstream medicinal chemistry and synthetic planning algorithms.