ResAtom-Score: Deep Learning in Affinity Prediction
- ResAtom-Score is a deep learning protein-ligand affinity predictor that integrates a 3D ResNet with CBAM attention to capture detailed spatial–chemical interactions.
- It processes voxelized protein–ligand grids with 18 physicochemical channels and uses random rotation and translation augmentation during training to make predictions robust to orientation.
- The model reports an ensemble Pearson's R of 0.833 on CASF-2016 and supports virtual screening without crystal binding poses when paired with docking and pose reranking.
ResAtom-Score is a deep learning-based protein–ligand affinity prediction function developed to address the need for scalable, structure-informed binding estimation in structure-based drug design. The model integrates a three-dimensional ResNet architecture with a convolutional block attention module (CBAM), ingests voxelized protein–ligand grids with 18 physicochemical channels, and is trained on extensive PDBbind data for small molecule complexes. The network achieves high scoring power on standard binding affinity benchmarks and demonstrates robust performance even in the absence of experimentally determined binding poses, especially when combined with advanced docking/reranking strategies such as ΔVinaRF20. The ResAtom-Score implementation and accompanying ResAtom system are publicly available at https://github.com/wyji001/ResAtom (Wang et al., 2021).
1. Input Representation and Feature Voxelization
ResAtom-Score operates on a fixed cubic grid of 35 Å per side, centered at the ligand’s centroid and discretized to 1 Å voxel resolution, yielding an input tensor of shape 18 × 35 × 35 × 35 (channels × three spatial axes). Voxelization incorporates 18 feature channels:
- Protein Atom-Type Maps (channels 1–8): hydrophobic, aromatic, H-bond acceptor, H-bond donor, positive ionizable, negative ionizable, metallic, excluded volume.
- Ligand Atom-Type Maps (channels 9–16): identical set as above, applied to ligand atoms.
- Promolecular Electron Densities (channels 17–18): estimated for both protein and ligand using the Multiwfn “promolecular” method.
Atoms are projected to their respective channels using a Gaussian-like kernel (HTMD formalism), with occupancy thresholded into the 1 Å grid; electron densities are similarly discretized. This comprehensive spatial–chemical encoding enables efficient and detailed representation of interaction environments.
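The projection step can be illustrated with a minimal NumPy sketch. It assumes the pair-correlation occupancy commonly associated with HTMD-style descriptors, 1 − exp(−(r_vdW / r)¹²), and takes the per-voxel maximum within each channel; the function name `voxelize`, its arguments, and the channel assignment are hypothetical and do not reproduce the HTMD API or the authors' code.

```python
import numpy as np

def voxelize(coords, radii, channels, center, n_channels=18, box=35):
    """Map atoms onto an (n_channels, box, box, box) grid with 1 Å voxels.

    coords:   (N, 3) atom positions in Å
    radii:    (N,) van der Waals radii in Å
    channels: (N,) integer channel index per atom (hypothetical assignment)
    center:   (3,) grid centre in Å, e.g. the ligand centroid
    """
    coords = np.asarray(coords, dtype=float)
    center = np.asarray(center, dtype=float)
    # Voxel centres along one axis, symmetric around zero: -17 ... 17 for box=35.
    axis = np.arange(box) - (box - 1) / 2.0
    gx, gy, gz = np.meshgrid(axis, axis, axis, indexing="ij")
    centres = np.stack([gx, gy, gz], axis=-1) + center  # (box, box, box, 3)

    grid = np.zeros((n_channels, box, box, box), dtype=np.float32)
    for xyz, r_vdw, ch in zip(coords, radii, channels):
        dist = np.linalg.norm(centres - xyz, axis=-1)
        dist = np.maximum(dist, 1e-6)  # avoid division by zero at the atom centre
        # Assumed occupancy: close to 1 inside the vdW radius, decaying fast outside.
        occupancy = 1.0 - np.exp(-((r_vdw / dist) ** 12))
        # Each voxel keeps the strongest contribution within its channel.
        grid[ch] = np.maximum(grid[ch], occupancy)
    return grid

# Toy usage: one atom at the grid centre, written to channel 0.
grid = voxelize([[0.0, 0.0, 0.0]], [1.5], [0], center=[0.0, 0.0, 0.0])
print(grid.shape)
```

The electron-density channels would be filled from precomputed Multiwfn output rather than from this occupancy function.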
2. Network Architecture
The ResAtom-Score backbone comprises a 3D ResNet-34, preceded by a convolutional “stem” and augmented with CBAM attention. The key architectural details are:
- Stem Layer: 3D convolution with kernels, 64 filters, stride 2, followed by batch normalization and ReLU.
- CBAM Attention:
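  - Channel Attention: Global average and max pooling over the spatial dimensions of the stem output, each passed through a shared two-layer MLP, summed, and passed through a sigmoid to yield the channel attention map M_c.
  - Spatial Attention: The channel-averaged and channel-max-pooled stem outputs are concatenated, convolved with a kernel, and sigmoid-activated to yield the spatial attention map M_s.
  - The combined attention-refined feature is obtained by multiplying the stem output elementwise (with broadcasting) by these attention maps.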
- Residual Blocks: Four groups (3, 4, 6, 3 blocks) at increasing channel widths (64, 128, 256, 512) with 3D convolutions, batch normalization, ReLU, and identity/projection skip connections.
- Global Average Pooling: Flattens the final feature map to a 512-dimensional vector, input to the regression head.
- Regression Output: A single fully connected layer produces ŷ, the predicted binding affinity on a logarithmic scale (−log Kd or −log Ki).
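The CBAM step in the list above can be sketched in NumPy as follows. This is an illustrative sketch, not the published implementation: it assumes the sequential ordering of the original CBAM design (channel attention first, then spatial attention on the channel-refined feature), a ReLU inside the shared MLP, and an odd spatial kernel size chosen by the caller. All function names and weight shapes are hypothetical.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def channel_attention(feat, w1, w2):
    """feat: (C, D, H, W); w1: (C // r, C); w2: (C, C // r). Returns (C,) weights."""
    def shared_mlp(v):
        return w2 @ np.maximum(w1 @ v, 0.0)  # two layers with a ReLU in between
    avg = feat.mean(axis=(1, 2, 3))  # global average pooling per channel
    mx = feat.max(axis=(1, 2, 3))    # global max pooling per channel
    return sigmoid(shared_mlp(avg) + shared_mlp(mx))

def spatial_attention(feat, kernel):
    """feat: (C, D, H, W); kernel: (2, k, k, k) with odd k. Returns (D, H, W) weights."""
    pooled = np.stack([feat.mean(axis=0), feat.max(axis=0)])  # (2, D, H, W)
    k = kernel.shape[1]
    pad = k // 2
    padded = np.pad(pooled, ((0, 0), (pad, pad), (pad, pad), (pad, pad)))
    d, h, w = feat.shape[1:]
    out = np.zeros((d, h, w))
    # "Same"-padded 3D cross-correlation written as a sum over kernel offsets.
    for i in range(k):
        for j in range(k):
            for m in range(k):
                window = padded[:, i:i + d, j:j + h, m:m + w]
                out += np.tensordot(kernel[:, i, j, m], window, axes=(0, 0))
    return sigmoid(out)

def cbam(feat, w1, w2, kernel):
    # Assumed ordering: refine by channel first, then by spatial position.
    refined = feat * channel_attention(feat, w1, w2)[:, None, None, None]
    return refined * spatial_attention(refined, kernel)[None]

rng = np.random.default_rng(0)
feat = rng.normal(size=(8, 5, 5, 5))
out = cbam(feat, rng.normal(size=(2, 8)), rng.normal(size=(8, 2)),
           rng.normal(size=(2, 3, 3, 3)))
print(out.shape)  # same shape as the input feature map
```

Because both attention maps lie in (0, 1), the refined feature is a voxel-wise and channel-wise down-weighting of the stem output, never an amplification.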
3. Training, Data Preparation, and Optimization
ResAtom-Score is trained on PDBbind v2017 (general + refined sets) after quality filtering (excluding peptide, covalent, and incomplete ligands, as well as oversized proteins), which leaves 15,038 complexes. Complexes that also appear in CASF-2016 or CSAR-HiQ are removed before the 80:20 train/validation split, which is stratified for a uniform affinity distribution.
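One simple way to realize an affinity-stratified 80:20 split is to bin complexes by affinity and split each bin separately. The sketch below uses only the standard library; the bin width of one log unit, the seed, and the function name `stratified_split` are assumptions for illustration, since the source does not specify how the stratification was carried out.

```python
import random
from collections import defaultdict

def stratified_split(affinities, val_fraction=0.2, bin_width=1.0, seed=0):
    """Return (train_idx, val_idx) with each affinity bin split roughly 80:20."""
    bins = defaultdict(list)
    for idx, value in enumerate(affinities):
        bins[int(value // bin_width)].append(idx)
    rng = random.Random(seed)
    train_idx, val_idx = [], []
    for key in sorted(bins):
        members = bins[key]
        rng.shuffle(members)
        n_val = round(len(members) * val_fraction)
        val_idx.extend(members[:n_val])
        train_idx.extend(members[n_val:])
    return sorted(train_idx), sorted(val_idx)

# Toy usage: 10 affinity bins with 10 complexes each.
toy_affinities = [b + 0.5 for b in range(2, 12) for _ in range(10)]
train_idx, val_idx = stratified_split(toy_affinities)
print(len(train_idx), len(val_idx))  # 80 20
```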
Molecular processing uses RDKit and OpenBabel for protonation, HTMD for grid voxelization, and Multiwfn for electron density calculations. Training applies on-the-fly random rotations and translations via TorchIO to encourage orientation invariance. The Adam optimizer (β₁=0.9, β₂=0.999) is used with a learning rate of 0.001, CosineAnnealingLR scheduling, and a batch size of 256. No explicit dropout is applied; regularization comes from batch normalization.
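For reference, the closed-form cosine annealing schedule (without restarts) can be written in a few lines. The base learning rate of 0.001 comes from the text; the period `t_max` and the floor `eta_min` are not stated in the source and are placeholders here.

```python
import math

def cosine_annealing_lr(step, base_lr=1e-3, t_max=100, eta_min=0.0):
    """Learning rate at `step`, decaying from base_lr to eta_min over t_max steps."""
    cosine = math.cos(math.pi * step / t_max)
    return eta_min + 0.5 * (base_lr - eta_min) * (1.0 + cosine)

for step in (0, 25, 50, 75, 100):
    print(step, cosine_annealing_lr(step))
```

The rate starts at the base value, passes through half of it at the midpoint, and reaches the floor at `t_max`.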
Hyperparameter selection is performed via sweeps over learning rates, optimizers (SGD, Adam), learning rate schedules (StepLR, ExponentialLR, CosineAnnealingLR), and ResNet depths (18, 34). The best validation Pearson’s R (≈0.786) is realized with ResNet-34, Adam(0.001), and CosineAnnealingLR.
4. Scoring Procedure and Evaluation Strategy
At inference, the model output ŷ is interpreted as the predicted log Ka (equivalently −log₁₀ Kd), which is directly compared to experimental values without further calibration. Model evaluation utilizes several standard metrics:
- Pearson’s Correlation Coefficient (R)
- Root-Mean-Square Error (RMSE)
- Mean Absolute Error (MAE)
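The three metrics listed above have standard definitions, shown here in a dependency-free sketch. The function names are illustrative, and the toy values are not taken from any benchmark.

```python
import math

def pearson_r(pred, true):
    n = len(pred)
    mean_p, mean_t = sum(pred) / n, sum(true) / n
    cov = sum((p - mean_p) * (t - mean_t) for p, t in zip(pred, true))
    var_p = sum((p - mean_p) ** 2 for p in pred)
    var_t = sum((t - mean_t) ** 2 for t in true)
    return cov / math.sqrt(var_p * var_t)

def rmse(pred, true):
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(pred, true)) / len(pred))

def mae(pred, true):
    return sum(abs(p - t) for p, t in zip(pred, true)) / len(pred)

# Toy values only: predictions that are perfectly correlated but biased.
pred, true = [1.0, 2.0, 3.0], [2.0, 4.0, 6.0]
print(pearson_r(pred, true), rmse(pred, true), mae(pred, true))
```

The toy case shows why R is reported alongside RMSE and MAE: a perfect correlation can coexist with a large absolute error.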
For cases lacking crystal structures, the standard workflow is: (1) generate ~10 ligand poses using AutoDock Vina, (2) rerank using a scoring function (preferably ΔVinaRF20), (3) select the top 3 poses, (4) apply ResAtom-Score (with augmentation via five random rotations per pose), and (5) average the predictions across poses and augmentations.
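Steps (2) to (5) of this workflow amount to a rerank, select, augment, and average loop. The sketch below expresses that logic with placeholder callables: `rerank_score`, `predict_affinity`, and `random_rotation` are hypothetical stand-ins for ΔVinaRF20, a trained ResAtom-Score model, and a rotation augmentation, none of whose real interfaces are reproduced here. It also assumes that a higher rerank score means a better pose.

```python
from statistics import mean

def score_without_crystal_pose(poses, rerank_score, predict_affinity,
                               random_rotation, top_k=3, n_rotations=5):
    """Average predicted affinity over the top-k reranked poses and their rotations.

    poses:            docked poses, e.g. about 10 from AutoDock Vina
    rerank_score:     callable(pose) -> float, higher is better (assumed convention)
    predict_affinity: callable(pose) -> float, stand-in for the trained model
    random_rotation:  callable(pose) -> randomly rotated copy of the pose
    """
    best = sorted(poses, key=rerank_score, reverse=True)[:top_k]
    predictions = [
        predict_affinity(random_rotation(pose))
        for pose in best
        for _ in range(n_rotations)
    ]
    return mean(predictions)

# Toy usage with integers standing in for poses.
toy = score_without_crystal_pose(
    poses=list(range(10)),
    rerank_score=lambda pose: pose,
    predict_affinity=lambda pose: float(pose),
    random_rotation=lambda pose: pose,
)
print(toy)  # 8.0, the mean over the three best toy poses (9, 8, 7)
```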
5. Benchmarking and Comparative Performance
ResAtom-Score demonstrates top-tier affinity prediction in the CASF-2016 benchmark (285 complexes), using an ensemble of 9 independently trained models:
| Model/Scoring Function | Ensemble R | Single R |
|---|---|---|
| ResAtom-Score (9-model) | 0.833 ± 0.018 | |
| AK-Score (20-model) | 0.827 | 0.760 |
| ΔVinaRF20 (single) | 0.816 | |
| X-Score (empirical) | 0.631 | |
The CSAR-HiQ external test (75 complexes that do not overlap with the training set) yields, for ResAtom-Score, R = […], RMSE = 1.73, and MAE = 1.42 (outperforming Cyscore and RF-Score).
When predicting affinities without experimental structures, combining ΔVinaRF20 pose selection with ResAtom-Score achieves R = […] and RMSE = 1.35, which nearly matches the performance achieved using crystal poses (R = […]).
6. Implementation, Throughput, and Practical Recommendations
The canonical inference workflow follows the docking, reranking, and scoring steps described in Section 4. Inference costs approximately 10 ms per 18 × 35³ example on a Tesla V100 GPU (≈150 ms per complex with 3 poses × 5 augmentations). CPU inference is 5–10× slower but feasible for moderate-throughput applications. Model parameter size is ≈30 MB.
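The timing figures above translate directly into a rough screening budget. The helper below is a back-of-the-envelope sketch that counts only GPU scoring time; docking, reranking, and voxelization are excluded, and the function name is hypothetical.

```python
def screening_time_hours(n_ligands, ms_per_grid=10.0, poses=3, rotations=5):
    """GPU scoring time in hours for a library, ignoring docking and voxelization."""
    ms_per_complex = ms_per_grid * poses * rotations  # 10 ms x 3 x 5 = 150 ms
    return n_ligands * ms_per_complex / 1000.0 / 3600.0

print(screening_time_hours(24_000))     # 1.0 hour
print(screening_time_hours(1_000_000))  # about 41.7 hours on a single GPU
```

On the quoted 5–10× CPU slowdown, the same million-ligand library would take roughly 200 to 400 CPU hours for scoring alone.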
Critical factors influencing prediction quality include docking pose accuracy and the rigid-receptor assumption; ResAtom-Score does not model large protein flexibility or non-small-molecule complexes without retraining. For large libraries, 2D pruning or precomputation of ligand grids is recommended; further augmentation with diverse decoys, explicit water/metal channels, or graph-based encoders may enhance applicability to novel chemotypes.
7. Limitations and Future Directions
ResAtom-Score’s generalization is currently limited to small-molecule/protein complexes as contained in PDBbind; application to protein–protein or nucleic acid complexes requires retraining on suitable datasets. The rigid-receptor model limits accuracy for cases involving substantive protein conformational changes. Pose sampling using physics-based docking methods remains a limiting factor. Potential future improvements include enlarging the training corpus with decoys, integrating additional chemically relevant features, or developing hybrid representations (e.g., with graph-based ligand encoders) to broaden applicability.
In summary, ResAtom-Score is an empirically validated and computationally efficient approach to 3D structure-based affinity prediction that reports state-of-the-art performance on widely adopted benchmarks and supports virtual screening when paired with advanced docking and pose selection strategies (Wang et al., 2021).