
AttentiveFP: Attentive GNN for Molecular Modeling

Updated 6 February 2026
  • AttentiveFP is a graph neural network architecture designed for molecular property prediction using dual-level (atom and graph) attention.
  • The model applies attention-based message passing and pooling to aggregate chemical features into robust molecular fingerprints.
  • Comparative evaluations show competitive performance with increased training overhead due to its complex two-stage attention mechanism.

AttentiveFP is a graph neural network (GNN) architecture designed for molecular property prediction, employing attention-based message passing at both the atom (node) and molecular (graph) levels. In computational chemistry and machine learning applications to QSAR (quantitative structure-activity relationship) and ADME (absorption, distribution, metabolism, excretion) modeling, AttentiveFP represents each molecule as an undirected graph, leveraging rich chemical descriptors and domain-specific neural attention mechanisms. While conceptually related to Graph Attention Networks (GAT) and Message Passing Neural Networks (MPNN), AttentiveFP is distinct in its two-stage attentive pooling strategy and is typically implemented using computational frameworks such as DGL-LifeSci. In comparative evaluations, AttentiveFP demonstrates performance marginally below GAT and MPNN, with increased training and tuning overhead due to its model complexity (Broccatelli et al., 2021).

1. Molecular Graph Construction and Feature Initialization

AttentiveFP operates on molecular graphs G = (V, E), where nodes v ∈ V correspond to atoms and edges (u, v) ∈ E to chemical bonds. Atom-level descriptors, initialized as node features h_v^0, are extracted using RDKit and include a wide set of properties: atom degree, atom type (e.g., B, C, N), formal charge, hybridization, implicit valence, aromaticity, chirality, and atomic mass. These features may be categorical or real-valued, encoded in one-hot or continuous formats.

Bond (edge) features e_{uv} are also extracted via RDKit, encompassing bond type (single/double/triple/aromatic), ring membership, conjugation, and stereochemistry. This rich featurization is essential for accurately capturing local and global molecular structure.
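As a concrete illustration, the categorical descriptors above can be one-hot encoded and concatenated with continuous values to form the initial vector h_v^0. The sketch below is a minimal, framework-free version of such a featurizer; the feature subset and scaling are illustrative, not the exact DGL-LifeSci encoding, and the atom properties are assumed to have been precomputed (e.g., by RDKit):

```python
def one_hot(value, choices):
    """One-hot encode `value` over `choices`, with a trailing 'other' slot."""
    vec = [0.0] * (len(choices) + 1)
    idx = choices.index(value) if value in choices else len(choices)
    vec[idx] = 1.0
    return vec

def atom_features(symbol, degree, formal_charge, aromatic, mass):
    """Initial node feature vector h_v^0 for one atom (illustrative subset)."""
    return (
        one_hot(symbol, ["B", "C", "N", "O", "F", "P", "S", "Cl", "Br", "I"])
        + one_hot(degree, [0, 1, 2, 3, 4, 5])
        + one_hot(formal_charge, [-2, -1, 0, 1, 2])
        + [1.0 if aromatic else 0.0]      # aromaticity flag
        + [mass / 100.0]                  # scaled continuous atomic mass
    )

h_v0 = atom_features("C", 4, 0, False, 12.011)  # a saturated carbon atom
```

In practice, DGL-LifeSci ships canonical atom and bond featurizers that build such vectors directly from an RDKit molecule object.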

2. Attention-based Message Passing and Node Update Mechanisms

The AttentiveFP model executes T rounds of attentional message passing at the atom level. At each layer t, the hidden state h_v^t of each atom v is iteratively updated by aggregating messages from its neighbors u ∈ N(v). For each neighbor:

  • The message m_{u→v} is computed as m_{u→v} = MLP_msg([h_u^t ∥ e_{uv}]).
  • The attention score is calculated: e′_{u→v} = MLP_attn([h_u^t ∥ e_{uv} ∥ h_v^t]).
  • Scalar attention weights α_{u→v} are obtained via a softmax over neighbors: α_{u→v} = softmax_{u∈N(v)}(e′_{u→v}).
  • Node update occurs through a gating mechanism (GRU): h_v^{t+1} = GRU(h_v^t, Σ_{u∈N(v)} α_{u→v} · m_{u→v}).

This scheme enables differentiated weighting of neighbor contributions, allowing the network to learn chemical context sensitivity.
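The per-atom update above can be sketched in NumPy. To keep the sketch short, both MLPs are collapsed to single linear maps and the GRU cell is written out explicitly; all weight names (W_msg, w_att, the gate matrices) are illustrative random parameters, not trained values:

```python
import numpy as np

rng = np.random.default_rng(0)
D, E = 8, 4  # hidden and edge feature dimensions (illustrative)
W_msg = 0.1 * rng.normal(size=(D, D + E))    # "MLP_msg" as one linear map
w_att = 0.1 * rng.normal(size=(2 * D + E,))  # "MLP_attn" as one linear map
Wz, Uz, Wr, Ur, Wn, Un = (0.1 * rng.normal(size=(D, D)) for _ in range(6))

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def attentive_update(h_v, neigh_h, neigh_e):
    """One attention-weighted message-passing + GRU update for a single atom v."""
    msgs = [W_msg @ np.concatenate([h_u, e_uv])                  # m_{u->v}
            for h_u, e_uv in zip(neigh_h, neigh_e)]
    scores = np.array([w_att @ np.concatenate([h_u, e_uv, h_v])  # e'_{u->v}
                       for h_u, e_uv in zip(neigh_h, neigh_e)])
    alpha = np.exp(scores - scores.max())
    alpha /= alpha.sum()                            # softmax over N(v)
    ctx = sum(a * m for a, m in zip(alpha, msgs))   # sum of alpha * m
    z = sigmoid(Wz @ ctx + Uz @ h_v)                # GRU update gate
    r = sigmoid(Wr @ ctx + Ur @ h_v)                # GRU reset gate
    n = np.tanh(Wn @ ctx + Un @ (r * h_v))          # candidate state
    return (1 - z) * h_v + z * n                    # h_v^{t+1}

h_next = attentive_update(np.zeros(D),
                          [np.ones(D), -np.ones(D)],  # two neighbors
                          [np.ones(E), np.ones(E)])
```

Stacking this update for all atoms over T rounds, with learned parameters per layer, yields the atom-level stage of AttentiveFP.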

3. Graph-Level Attentive Pooling and Molecular Fingerprint Computation

After T rounds of atom-level message passing, AttentiveFP performs graph-level attentive pooling. Each atom is assigned an attention score β_v by applying a pooling MLP followed by a softmax: β_v = softmax_v(MLP_pool(h_v^T)). The molecular fingerprint is then computed as r = Σ_{v∈V} β_v h_v^T. This fingerprint vector r serves as the learned molecular representation.

Final predictions for molecular properties are generated by feeding r into one or more multi-layer perceptrons (MLPs) tailored to the target endpoint.
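The readout fits in a few lines of NumPy; here MLP_pool is collapsed to a single linear map w_pool as an illustrative stand-in for the trained network:

```python
import numpy as np

def attentive_readout(H, w_pool):
    """Attentive pooling: beta_v = softmax(MLP_pool(h_v^T)), r = sum_v beta_v h_v^T.

    H holds one row per atom (shape [n_atoms, D]); w_pool plays the role of
    MLP_pool collapsed to one linear map.
    """
    scores = H @ w_pool
    beta = np.exp(scores - scores.max())
    beta /= beta.sum()            # softmax over atoms
    return beta @ H               # fingerprint r, shape (D,)

# With zero pooling weights, every atom gets equal weight 1/n_atoms:
r = attentive_readout(np.eye(3), np.zeros(3))  # -> [1/3, 1/3, 1/3]
```

With learned weights, beta concentrates on the atoms most relevant to the endpoint, which is also what makes the readout interpretable.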

4. Training Protocols, Hyperparameter Selection, and Implementation

The implementation is based on DGL and DGL-LifeSci, inheriting architectural details and normalization schemes from prior work (e.g., Xiong et al., 2020). The training objective is the Smooth L1 loss. The optimizer is Adam with weight decay. Early stopping is triggered by monitoring the validation set's average r^2 across tasks.
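For reference, the element-wise Smooth L1 loss is quadratic below a threshold β and linear above it (shown here with β = 1, matching the common PyTorch default; the benchmark's exact settings are not stated):

```python
def smooth_l1(pred, target, beta=1.0):
    """Smooth L1: 0.5 * d**2 / beta for |d| < beta, else |d| - 0.5 * beta."""
    d = abs(pred - target)
    return 0.5 * d * d / beta if d < beta else d - 0.5 * beta

# Quadratic near zero (robust gradients), linear in the tails (outlier-tolerant):
small = smooth_l1(0.5, 0.0)  # -> 0.125
large = smooth_l1(2.0, 0.0)  # -> 1.5
```

The linear tail makes the objective less sensitive to assay outliers than plain mean squared error.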

Hyperparameters are selected via Hyperopt; the search includes:

  • Number of atom-attention layers T ∈ {2, 3, 4}
  • Hidden dimension D ∈ {64, 128, 256}
  • Attention heads H ∈ {1, 2, 4}
  • Learning rate ∈ [10^{-4}, 10^{-2}]
  • Weight decay ∈ [0, 10^{-4}]
  • Dropout ∈ [0, 0.5]

Typical runs featured 20 trials per model, with 40 for LogD7.4-ST. Exact final hyperparameters are not enumerated in the referenced benchmark (Broccatelli et al., 2021).
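The search space above can be written down directly. The sketch below draws configurations by plain random sampling from the stated ranges; the benchmark itself uses Hyperopt's TPE sampler, and the log-uniform learning-rate prior is an assumption consistent with common practice:

```python
import random

random.seed(0)

def sample_config():
    """Draw one hyperparameter configuration from the stated search space."""
    return {
        "num_layers": random.choice([2, 3, 4]),       # atom-attention rounds T
        "hidden_dim": random.choice([64, 128, 256]),  # D
        "num_heads": random.choice([1, 2, 4]),        # H
        "lr": 10 ** random.uniform(-4, -2),           # log-uniform on [1e-4, 1e-2]
        "weight_decay": random.uniform(0.0, 1e-4),
        "dropout": random.uniform(0.0, 0.5),
    }

trials = [sample_config() for _ in range(20)]  # 20 trials per model
```

Each sampled configuration would then be trained to early stopping and scored on validation r^2 to pick the final model.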

5. Comparative Performance on Large ADME Datasets

Across four in vitro ADME endpoints (LogD7.4, HLM CL_int, KinSol, HH CL_int) and two split strategies (internal time-split, external Roche set), AttentiveFP achieved non-dominant but close performance compared to GCN, GAT, and MPNN. The following table summarizes results (each cell: average error / r^2 / fraction within 1 log unit error):

Assay                GCN                  GAT                  MPNN                 AttentiveFP
LogD7.4 (time)       0.58 / 0.65 / 0.87   0.66 / 0.63 / 0.82   0.61 / 0.62 / 0.84   0.58 / 0.56 / 0.87
LogD7.4 (Roche)      0.62 / 0.65 / 0.81   0.60 / 0.63 / 0.83   0.67 / 0.60 / 0.79   0.68 / 0.61 / 0.77
HLM CL_int (time)    0.39 / 0.44 / 0.94   0.38 / 0.48 / 0.95   0.40 / 0.41 / 0.93   0.47 / 0.53 / 0.89
HLM CL_int (Roche)   0.45 / 0.19 / 0.93   0.42 / 0.24 / 0.95   0.49 / 0.16 / 0.91   0.46 / 0.19 / 0.92
KinSol (time)        0.56 / 0.22 / 0.86   0.49 / 0.26 / 0.89   0.53 / 0.24 / 0.86   0.56 / 0.25 / 0.85
HH CL_int (time)     0.42 / 0.39 / 0.93   0.39 / 0.45 / 0.98   0.37 / 0.41 / 0.96   0.39 / 0.37 / 0.95

On average, AttentiveFP's performance marginally lags GAT and is closely matched by MPNN and ExtraTrees(All), a non-deep-learning baseline, with differences in average error typically ≤ 0.03 units.

6. Model Complexity, Robustness, and Practical Considerations

AttentiveFP's two-stage attention mechanism (atom-level and graph-level) increases the number of trainable parameters and computational cost as compared to GAT, which only incorporates attention at the neighbor aggregation stage. This greater complexity can result in slower training times and sensitivity to hyperparameter selection. The referenced benchmark observed that AttentiveFP was "less robust in hyperparameter search" than GAT and recommended GAT as the optimal trade-off for large single-task ADME datasets.

The small performance differences across all high-end models (including AttentiveFP) may be attributable to the predictive accuracy of these models nearing the experimental error inherent in the underlying assay data. A plausible implication is that further performance gains from architectural modifications may be limited by data fidelity rather than model expressiveness. The AttentiveFP architecture remains illustrative of advanced attentive message passing and readout in molecular GNNs, though in large-scale ADME benchmarks, GAT and MPNN offer slightly better efficiency and accuracy (Broccatelli et al., 2021).
