PIGNet: Physics-Informed Graph Neural Network

Updated 9 April 2026

PIGNet is a physics-informed deep learning model that combines explicit atomic interaction modeling with a graph neural network to predict drug-target interactions.
It leverages dual adjacency matrices and layers like Gated Graph Attention and Interaction Networks to generate accurate binding free energy predictions.
Validated on CASF-2016, PIGNet outperforms traditional methods in docking and screening metrics, offering enhanced accuracy and molecular interpretability.

PIGNet is a physics-informed deep learning model that advances the prediction of drug-target interactions (DTIs) by integrating explicit physical modeling of atom-atom interactions within a graph neural network (GNN) architecture. The model directly sums neural-parameterized, physics-based pairwise interactions to predict the binding affinity of protein–ligand complexes, achieving both state-of-the-art accuracy and interpretability. PIGNet was validated on the CASF-2016 benchmark, demonstrating superior docking and screening performance relative to established methods (Moon et al., 2020).

1. Architecture and Structural Representation

PIGNet operates on static 3D protein–ligand complex structures, typically obtained from the Protein Data Bank (PDB) or molecular docking procedures. The model constructs a graph representation as follows:

Nodes: All heavy-atom centers within the protein pocket (within 5 Å of any ligand atom) and all ligand atoms.
Node features: One-hot encoding for atom type (C, N, O, S, etc.), formal charge, hybridization, and aromaticity.
Edges: Two parallel adjacency matrices are defined:
- $A^{\mathrm{cov}}$: Intramolecular covalent bonds.
- $A^{\mathrm{inter}}$: Intermolecular “neighbor” connections between atoms across the protein–ligand interface, restricted to pairs with $d_{ij} < 8$ Å.
Backbone: The architecture alternates Gated Graph Attention (GAT) layers, which propagate information via $A^{\mathrm{cov}}$, with Interaction Network layers that incorporate intermolecular context via $A^{\mathrm{inter}}$. Typically, $L = 3$ to $5$ such blocks are stacked, yielding final node embeddings $h_i \in \mathbb{R}^d$ (with $d \approx 128$).
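
The dual-adjacency construction described above can be illustrated with a minimal sketch. It assumes atom coordinates in ångströms, a flag marking ligand atoms, and a list of covalent bond index pairs are already available; the function name `build_adjacency` and the toy coordinates are hypothetical and not taken from the PIGNet code.

```python
import math

def build_adjacency(coords, is_ligand, bonds, cutoff=8.0):
    """Return (a_cov, a_inter) as nested lists of 0/1.

    coords    : list of (x, y, z) tuples in angstroms
    is_ligand : list of bools, True for ligand atoms, False for pocket atoms
    bonds     : list of (i, j) index pairs for covalent bonds
    cutoff    : intermolecular distance cutoff in angstroms (8 in the text)
    """
    n = len(coords)
    a_cov = [[0] * n for _ in range(n)]
    a_inter = [[0] * n for _ in range(n)]

    # Covalent adjacency: symmetric entries for every bonded pair.
    for i, j in bonds:
        a_cov[i][j] = a_cov[j][i] = 1

    # Intermolecular adjacency: only ligand-protein pairs closer than the cutoff.
    for i in range(n):
        for j in range(i + 1, n):
            if is_ligand[i] == is_ligand[j]:
                continue  # same molecule, so not an interface pair
            if math.dist(coords[i], coords[j]) < cutoff:
                a_inter[i][j] = a_inter[j][i] = 1
    return a_cov, a_inter

# Toy complex (assumed values): two bonded ligand atoms,
# one nearby pocket atom and one distant pocket atom.
coords = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (4.0, 0.0, 0.0), (20.0, 0.0, 0.0)]
is_ligand = [True, True, False, False]
bonds = [(0, 1)]
a_cov, a_inter = build_adjacency(coords, is_ligand, bonds)
```

Keeping the two matrices separate lets covalent and intermolecular message passing use different layers, which is the role the text assigns to the GAT and Interaction Network blocks.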

Atom-atom energy terms are then computed for every protein-ligand atom pair, with outputs from learned MLPs parameterizing physical interaction equations.

2. Physics-Informed Energy Decomposition

PIGNet explicitly models the total predicted binding free energy as: $E^{\mathrm{total}} = \frac{1}{T^{\mathrm{rotor}}} \left[ E^{\mathrm{vdw}} + E^{\mathrm{hbond}} + E^{\mathrm{metal}} + E^{\mathrm{hydrophobic}} \right]$

Rotational Entropy Loss (Rotor Penalty):

$T^{\mathrm{rotor}} = 1 + C_{\mathrm{rotor}} \times N_{\mathrm{rotor}}$

where $N_{\mathrm{rotor}}$ is the number of rotatable ligand bonds and $C_{\mathrm{rotor}}$ is a learnable parameter.
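
As a small worked example of the two formulas above, the sketch below divides the summed energy components by the rotor term. The numeric inputs are made-up illustrative values, not results from the paper, and `total_energy` is a hypothetical helper name.

```python
def total_energy(e_vdw, e_hbond, e_metal, e_hydrophobic, n_rotor, c_rotor):
    """E_total = (E_vdw + E_hbond + E_metal + E_hydrophobic) / T_rotor."""
    # T_rotor = 1 + C_rotor * N_rotor grows with ligand flexibility.
    t_rotor = 1.0 + c_rotor * n_rotor
    return (e_vdw + e_hbond + e_metal + e_hydrophobic) / t_rotor

# Illustrative (assumed) component energies in kcal/mol for a ligand
# with 5 rotatable bonds and a rotor coefficient of 0.1.
example = total_energy(-6.0, -2.0, 0.0, -1.0, n_rotor=5, c_rotor=0.1)
# Components sum to -9.0 and T_rotor = 1.5, so the result is -6.0.
```

With a positive rotor coefficient, a more flexible ligand has its favorable interaction energy scaled toward zero, which is how the penalty for lost rotational freedom enters the prediction.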

Interaction Potentials:
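- Van der Waals: Parametrized 12–6 Lennard–Jones:

$e^{\mathrm{vdw}}_{ij} = c_{ij} \left[ \left( \frac{d'_{ij}}{d_{ij}} \right)^{12} - 2 \left( \frac{d'_{ij}}{d_{ij}} \right)^{6} \right]$

with $d'_{ij} = r_i + r_j + b_{ij}$, using van der Waals radii $r_i$, $r_j$ and an MLP-learned correction $b_{ij}$; $d_{ij}$ is the interatomic distance and $c_{ij}$ is a learned coefficient.
- Hydrogen bond, metal-ligand, and hydrophobic contributions: Each is evaluated using a learnable “square-well” formulation:

$e_{ij} = \begin{cases} w, & d_{ij} - d'_{ij} < c_1 \\ w \, \dfrac{d_{ij} - d'_{ij} - c_2}{c_1 - c_2}, & c_1 \le d_{ij} - d'_{ij} < c_2 \\ 0, & d_{ij} - d'_{ij} \ge c_2 \end{cases}$

Parameters ($w$, $c_1$, $c_2$) and term type (hydrogen bond, hydrophobic, metal) are determined via MLPs applied to the node embeddings $(h_i, h_j)$.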

All four energy components are summed over all protein–ligand atomic pairs.
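
The pairwise terms can be sketched in a few lines. The functional forms below are assumptions made for illustration: the Lennard–Jones term is written in its minimum-distance form (minimum of `-depth` at `d == d_min`), and the "square well" is softened with a linear ramp between two offsets `c1 < c2`. The function names and all numeric values are hypothetical.

```python
def lj_12_6(d, d_min, depth):
    """12-6 Lennard-Jones energy with minimum -depth at d == d_min."""
    ratio = d_min / d
    return depth * (ratio ** 12 - 2.0 * ratio ** 6)

def linear_well(d, d_ref, w, c1, c2):
    """Softened square well in the shifted distance x = d - d_ref.

    Returns w for x <= c1, 0 for x >= c2, and a linear ramp in between.
    Requires c1 < c2.
    """
    x = d - d_ref
    if x <= c1:
        return w
    if x >= c2:
        return 0.0
    return w * (c2 - x) / (c2 - c1)

def sum_over_pairs(distances, energy_fn):
    """Sum a pairwise energy function over protein-ligand distances."""
    return sum(energy_fn(d) for d in distances)

# Assumed toy values: three atom pairs sharing the same reference distance.
pair_distances = [3.5, 4.0, 6.0]
e_vdw = sum_over_pairs(pair_distances, lambda d: lj_12_6(d, d_min=3.5, depth=0.2))
e_well = sum_over_pairs(pair_distances, lambda d: linear_well(d, 3.5, -1.0, -0.7, 0.0))
```

In the model described here, quantities such as `d_min`, `depth`, and `w` would come from MLP heads rather than being fixed constants as in this sketch.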

3. Deep Architecture and Learning Workflow

Gated Graph Attention (GAT) blocks: Use self-attention mechanisms over bonded neighbors. Each block incorporates a highway-style gate (small GRU) to control information flow.
Interaction Network blocks: Combine linear embeddings for sender/receiver atoms, aggregate via max-pooling over neighbors, and return messages to nodes through a gating mechanism.
Physics MLP Heads: Each energy term’s coefficient is independently predicted by dedicated MLP heads, typically 2–3 layers, using ReLU activation, and width commensurate with embedding dimensionality.
Final prediction: The total affinity for the complex is computed by summing the physics-informed pairwise terms and adjusting for rotational entropy.
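
The aggregation and gating steps listed above can be illustrated with a deliberately simplified sketch. It omits the learned linear embeddings and the GRU, takes the gate logits as given inputs instead of computing them from features, and uses a plain highway-style blend; all names and values are hypothetical.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def max_pool_messages(h, a_inter):
    """Element-wise max over each node's A_inter neighbours.

    Nodes without neighbours receive a zero message.
    """
    dim = len(h[0])
    messages = []
    for row in a_inter:
        neigh = [h[j] for j, flag in enumerate(row) if flag]
        if neigh:
            messages.append([max(v[k] for v in neigh) for k in range(dim)])
        else:
            messages.append([0.0] * dim)
    return messages

def gated_update(h, messages, gate_logits):
    """Highway-style blend: h' = z * message + (1 - z) * h, z = sigmoid(logit)."""
    out = []
    for h_i, m_i, g_i in zip(h, messages, gate_logits):
        z = sigmoid(g_i)
        out.append([z * m + (1.0 - z) * x for x, m in zip(h_i, m_i)])
    return out

# Toy embeddings (assumed): node 0 is a ligand atom, nodes 1 and 2 are pocket atoms.
h = [[1.0, 0.0], [0.0, 2.0], [3.0, -1.0]]
a_inter = [[0, 1, 1], [1, 0, 0], [1, 0, 0]]
messages = max_pool_messages(h, a_inter)
h_new = gated_update(h, messages, gate_logits=[0.0, 0.0, 0.0])
```

A gate of 0.5 (logit 0) averages the old embedding with the pooled message; a trained gate would instead decide per node how much intermolecular context to admit.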

4. Training Objectives and Data Augmentation

PIGNet employs a composite loss constructed to encourage both absolute binding affinity accuracy and robust ranking across native, decoy, and random complexes. Three key data augmentation strategies are used, with ~1.65 million complexes in total:

Docking augmentation: For each of the ≈2,600 experimental complexes in the PDBbind refined set, ~100 decoy poses are generated and trained with a pairwise margin loss ensuring that each decoy is scored less favorably (higher energy) than the corresponding native complex.
Random-screening augmentation: ~0.8 million random small molecules from the IBS library are docked to the targets; constraints enforce that most non-binders have binding free energy $> -6.8$ kcal/mol.
Cross-screening augmentation: Ligands are docked with random proteins to create non-binding pairs and trained with similar loss constraints.

Native complexes are trained by regression to experimental affinities, while decoys/non-binders are enforced to score less favorably via margin and hinge losses.
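
A composite objective of this kind can be sketched as one regression term plus two hinge terms. The equal weighting, the use of mean squared error, the zero default margin, and the example threshold are all assumptions for illustration; `composite_loss` is a hypothetical name. Energies are in kcal/mol, with lower values meaning stronger predicted binding.

```python
def composite_loss(native, decoys, non_binders, threshold, margin=0.0):
    """Regression on natives plus hinge penalties on decoys and non-binders.

    native      : list of (predicted, experimental) energies
    decoys      : list of (predicted_decoy, experimental_native) energies
    non_binders : list of predicted energies for random or cross-docked pairs
    threshold   : energy that non-binders should stay above
    margin      : extra gap required between a decoy and its native reference
    """
    # Natives: mean squared error against experimental affinities.
    regression = sum((p - y) ** 2 for p, y in native) / max(len(native), 1)
    # Decoys: penalize only when a decoy scores below native + margin.
    docking = sum(max(y + margin - p, 0.0) for p, y in decoys) / max(len(decoys), 1)
    # Non-binders: penalize only when predicted energy falls below the threshold.
    screening = sum(max(threshold - p, 0.0) for p in non_binders) / max(len(non_binders), 1)
    return regression + docking + screening

# Assumed toy batch.
loss = composite_loss(
    native=[(-8.0, -9.0)],
    decoys=[(-10.0, -9.0), (-5.0, -9.0)],
    non_binders=[-8.0, -3.0],
    threshold=-7.0,
)
# regression = 1.0, docking = 0.5, screening = 0.5, so loss = 2.0
```

The hinge terms are one-sided: a decoy or non-binder that already scores unfavorably enough contributes nothing, so only violations of the ranking are corrected.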

5. Benchmark Performance and Comparative Assessment

PIGNet’s effectiveness was assessed on the CASF-2016 benchmark with two principal tasks:

Docking power: Top-1 accuracy for correctly identifying the native binding pose among ~200 decoys.
Screening power: Enrichment factor (EF) at the top 1% of scored ligands, and hit rate for identifying true binders among ~5,000 decoys.
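
The two metrics can be computed as in the sketch below, which follows the descriptions given here: top-1 success counts complexes whose best-scored pose is the native one, and the enrichment factor divides the binder rate in the top-ranked fraction by the binder rate in the whole library. Function names and toy data are hypothetical, and the official CASF-2016 protocol may define success differently (for example via an RMSD cutoff).

```python
def top1_success(complexes):
    """Fraction of complexes whose lowest-energy pose is the native one.

    complexes : list of pose lists, each pose a (score, is_native) tuple.
    """
    hits = sum(1 for poses in complexes if min(poses, key=lambda p: p[0])[1])
    return hits / len(complexes)

def enrichment_factor(scored, fraction=0.01):
    """EF = (binder rate in the top fraction) / (binder rate overall).

    scored : list of (score, is_binder) tuples; lower score ranks first.
    """
    ranked = sorted(scored, key=lambda s: s[0])
    n_top = max(1, int(round(len(ranked) * fraction)))
    top_hits = sum(1 for _, is_binder in ranked[:n_top] if is_binder)
    total_hits = sum(1 for _, is_binder in ranked if is_binder)
    if total_hits == 0:
        return 0.0
    return (top_hits / n_top) / (total_hits / len(ranked))

# Assumed toy data: two complexes, and a 200-compound library with 4 binders,
# two of which are ranked first and second.
docking_sets = [
    [(-9.0, True), (-7.0, False)],
    [(-6.0, True), (-8.0, False)],
]
library = [(float(i), i in {0, 1, 50, 120}) for i in range(200)]
sr = top1_success(docking_sets)
ef = enrichment_factor(library, fraction=0.01)
```

An EF of 1 corresponds to random ranking, so the value in this toy case (both top-1% slots filled by binders from a library that is 2% binders) is 50.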

Key outcomes from Table 1 (Moon et al., 2020):

Model	Docking SR (Top 1)	Screening EF (1%)	Screening SR (1%)
AutoDock Vina	84.6%	7.7	29.8%
3D-GNN baseline	66.6%	10.2	28.5%
PIGNet (single model)	85.8%	18.5	50.0%
PIGNet (ensemble, MCDO×30)	87.0%	19.6	55.4%

The PIGNet ensemble achieves 87.0% top-1 docking success, ahead of AutoDock Vina (84.6%), while its screening success rate (55.4%) and enrichment factor (19.6) are roughly double those of the baselines listed (at best 29.8% and 10.2).

6. Interpretability and Insight via Atom-Pair Decomposition

A defining feature of PIGNet is explicit atom-pair attribution. The net energy contribution of any ligand substructure $S$ is given by: $E_S = \sum_{i \in S} \sum_{j \in \mathrm{protein}} e_{ij}$, where $e_{ij}$ is the sum of the pairwise energy terms between ligand atom $i$ and protein atom $j$. This permits direct visualization (e.g., heat maps) of which moieties are energetically favorable or unfavorable within the protein context. The model has demonstrated instance-specific interpretability, capturing energetically meaningful responses to substructural modifications, such as an R-group switch between cyclohexyl and tetramethylcyclohexyl in PTPN1 and PAF-AH case studies.
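
Because the prediction is a sum over atom pairs, substructure attribution reduces to a filtered sum. The sketch below assumes the pairwise energies are already available as a dictionary keyed by (ligand atom, protein atom); the helper name, the atom indices, and the energies are hypothetical, and any rotor scaling is left out.

```python
def substructure_energy(pair_energy, substructure):
    """Sum pairwise energies over the ligand atoms in `substructure`.

    pair_energy  : dict mapping (ligand_atom, protein_atom) -> energy (kcal/mol)
    substructure : iterable of ligand atom indices
    """
    members = set(substructure)
    return sum(e for (i, _), e in pair_energy.items() if i in members)

# Assumed toy pairwise energies for a three-atom ligand.
pair_energy = {
    (0, 10): -0.5,
    (0, 11): -0.25,
    (1, 10): 0.125,
    (2, 12): -1.0,
}
ring = substructure_energy(pair_energy, [0, 1])   # -0.625
tail = substructure_energy(pair_energy, [2])      # -1.0
whole = substructure_energy(pair_energy, [0, 1, 2])
```

Disjoint substructures partition the total, so per-fragment values can be compared directly or mapped onto the ligand as a heat map.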

The learned MLPs also adjust the pairwise van der Waals radius correction $b_{ij}$ in a chemically realistic manner, distinguishing between, for example, sp²–sp² and sp³–sp³ C–C pairs.

7. Context, Impact, and Relevance

PIGNet establishes a methodological framework bridging graph neural networks and physics-based modeling for drug discovery, combining interpretable additive energy decomposition with high-throughput computational scalability. The model’s explicit pairwise structure links predictions to the underlying chemical interactions, giving rationalizations that can guide medicinal chemistry optimization. By fitting data-driven parameters within a physically motivated functional form, it directly addresses the generalization failures of black-box DNN scoring functions and reports state-of-the-art CASF-2016 results among the methods compared (Moon et al., 2020).


References

Moon et al. (2020). PIGNet: A physics-informed deep learning model toward generalized drug-target interaction predictions.
