
KERMT: Kinetic GROVER Multi-Task Model

Updated 16 October 2025
  • The paper introduces a graph neural network that uses low-cost SAAO features and multi-head AO attention to accurately predict molecular properties and electronic structures.
  • It employs a multi-task learning strategy to jointly predict chemical endpoints and auxiliary targets, enhancing model generalizability on large molecular datasets.
  • KERMT delivers state-of-the-art performance with significant computational efficiency, enabling rapid geometry optimization and scalable drug discovery workflows.

The Kinetic GROVER Multi-Task (KERMT) model is an advanced graph neural network framework designed for chemical property prediction and electronic structure learning. Emerging from enhancements to the GROVER and OrbNet families, KERMT integrates chemical domain-specific knowledge, multi-task learning, and efficient computational strategies. It is notable for its strong performance in predicting small-molecule properties and electronic energies, and for enabling large-scale drug discovery workflows.

1. Graph Neural Network Architecture and Chemical Representation

KERMT employs a graph neural network (GNN) to model molecules, with atoms as nodes and bonds as edges. The architecture uses input features derived from a low-cost tight-binding calculation in the symmetry-adapted atomic orbital (SAAO) basis, incorporating operators such as the Fock ($F$), density ($P$), centroid-distance ($D$), core Hamiltonian ($H$), and overlap ($S$) matrices (Qiao et al., 2020). Energy prediction follows the formula

$$E_{\text{out}} = E_{\text{TB}} + \sum_{A} \left[ \text{Dec}(\mathbf{f}_A^L) + E_A^c \right],$$

where $\mathbf{f}_A^L$ denotes the atom-specific embedding after $L$ rounds of message passing. The GNN performs multi-head "AO–AO attention" message passing and "AO–atom attention" pooling to generate atom-level and molecule-level representations. Residual-block decoders separately output total property corrections and auxiliary targets.
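To make the readout concrete, the following is a minimal PyTorch sketch of the atom-wise energy decoder implied by the formula above. The `ResidualDecoder` class, its layer sizes, and the `total_energy` helper are illustrative assumptions, not the published architecture.

```python
import torch
import torch.nn as nn

class ResidualDecoder(nn.Module):
    """Residual-block decoder mapping atom embeddings to scalar energy
    corrections. Layer sizes are illustrative, not the published ones."""
    def __init__(self, dim: int):
        super().__init__()
        self.block = nn.Sequential(nn.Linear(dim, dim), nn.SiLU(), nn.Linear(dim, dim))
        self.out = nn.Linear(dim, 1)

    def forward(self, f_A: torch.Tensor) -> torch.Tensor:
        # Residual connection, then project each atom embedding to a scalar.
        return self.out(f_A + self.block(f_A)).squeeze(-1)

def total_energy(e_tb: torch.Tensor,    # tight-binding reference energy E_TB, shape ()
                 f_A: torch.Tensor,     # atom embeddings f_A^L after message passing, (n_atoms, dim)
                 e_core: torch.Tensor,  # per-atom core corrections E_A^c, (n_atoms,)
                 decoder: ResidualDecoder) -> torch.Tensor:
    # E_out = E_TB + sum_A [ Dec(f_A^L) + E_A^c ]
    return e_tb + (decoder(f_A) + e_core).sum()
```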

In drug property prediction contexts, KERMT incorporates k-hop subgraphs to effectively capture local chemical environments during pretraining on extensive molecular corpora (11 million compounds) (Adrian et al., 14 Oct 2025).
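The k-hop extraction itself amounts to a depth-bounded breadth-first search over the molecular graph. The sketch below is plain Python over an adjacency list chosen for illustration; KERMT's own subgraph machinery is not reproduced here.

```python
from collections import deque

def k_hop_subgraph(adjacency: dict[int, list[int]], center: int, k: int) -> set[int]:
    """Collect all atoms within k bonds of a center atom via BFS.
    `adjacency` maps an atom index to its bonded neighbor indices."""
    visited = {center}
    frontier = deque([(center, 0)])
    while frontier:
        node, depth = frontier.popleft()
        if depth == k:
            continue  # do not expand beyond k hops
        for nbr in adjacency[node]:
            if nbr not in visited:
                visited.add(nbr)
                frontier.append((nbr, depth + 1))
    return visited

# Example: propane's heavy-atom graph CH3-CH2-CH3; 1 hop around the middle carbon
print(k_hop_subgraph({0: [1], 1: [0, 2], 2: [1]}, center=1, k=1))  # {0, 1, 2}
```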

2. Multi-Task Learning Strategy

A central feature of KERMT is its multi-task learning approach. The model is trained to jointly predict multiple chemical endpoints (such as various ADMET properties) and atom-specific auxiliary targets. Multi-task finetuning is conducted by replacing the single-task output layer with a vector-valued feed-forward network (FFN) head, enabling simultaneous prediction for $n$ endpoints or assays:

$$L = \frac{1}{N} \sum_k I_k \,(y_k - \hat{y}_k)^2,$$

where $I_k$ is an indicator for label availability (Adrian et al., 14 Oct 2025). In the electronic structure setting, auxiliary targets $d_A$ are derived from density matrix projections onto a basis. The loss combines standard prediction and auxiliary terms (with adaptive weighting via GradNorm (Qiao et al., 2020)), encouraging physically meaningful representations. This induces inductive transfer and regularization, which is especially beneficial when endpoints are correlated or training data is abundant.
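A minimal PyTorch sketch of this masked multi-task objective follows. Taking $N$ as the count of observed labels is an assumption (the normalization in the paper may differ), and the tensor layout is illustrative.

```python
import torch

def masked_multitask_mse(y_pred: torch.Tensor,  # (batch, n_tasks) model outputs
                         y_true: torch.Tensor,  # (batch, n_tasks) labels; arbitrary where missing
                         mask: torch.Tensor     # (batch, n_tasks) indicator I_k: 1 if labeled
                         ) -> torch.Tensor:
    """L = (1/N) * sum_k I_k * (y_k - y_hat_k)^2.
    N is taken here as the number of observed labels (an assumption)."""
    sq_err = mask * (y_true - y_pred) ** 2
    # clamp avoids division by zero for batches with no labels at all
    return sq_err.sum() / mask.sum().clamp(min=1)
```

Because unlabeled entries contribute exactly zero to the sum, sparse assay matrices can be batched without imputation.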

3. Analytical Gradients and End-to-End Differentiability

KERMT is fully end-to-end differentiable due to analytic derivations of gradients for molecular energies and forces:

$$\frac{dE_{\text{out}}}{dx} = \frac{dE_{\text{TB}}}{dx} + \sum_{f \in \{F, D, P, S, H\}} \frac{\partial E_{\text{NN}}}{\partial f}\,\frac{\partial f}{\partial x} + \operatorname{Tr}\!\left[W \frac{\partial S^{(\text{AO})}}{\partial x}\right] + \operatorname{Tr}\!\left[z \frac{\partial F^{(\text{AO})}}{\partial x}\right]$$

Automatic differentiation is applied to the neural modules and explicit analytic gradients to the quantum mechanical operators, maintaining differentiability for geometry optimization and force evaluation tasks (Qiao et al., 2020). This permits structure relaxation and optimization workflows using KERMT energies and forces.
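In practice, this means nuclear coordinates can be relaxed by backpropagating through the energy model, since the forces are $-dE/dx$. Below is a hedged PyTorch sketch with a placeholder `energy_fn`; the L-BFGS settings and the interface are illustrative, not KERMT's released optimizer.

```python
import torch

def relax_geometry(energy_fn, coords: torch.Tensor, steps: int = 200) -> torch.Tensor:
    """Geometry relaxation via autograd forces. `energy_fn` maps (n_atoms, 3)
    coordinates to a scalar energy; it stands in for a differentiable
    KERMT-style energy model (hypothetical interface)."""
    x = coords.clone().requires_grad_(True)
    opt = torch.optim.LBFGS([x], max_iter=steps)

    def closure():
        opt.zero_grad()
        e = energy_fn(x)
        e.backward()  # gradients dE/dx flow through the whole model; forces are -dE/dx
        return e

    opt.step(closure)
    return x.detach()
```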

4. Domain-Specific Feature Engineering

KERMT’s generalizability arises from domain-specific features rooted in quantum chemistry. SAAO-based inputs encapsulate orbital energies and densities, and are constructed to be invariant to system size and symmetry. Pretraining on broad compound libraries enables transfer learning across chemical space, and feature extraction is accelerated using the cuik-molmaker package for on-the-fly batched computation of atom-, bond-, and molecule-level descriptors (Adrian et al., 14 Oct 2025).
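As a rough illustration of what per-atom descriptor extraction looks like, here is a minimal sketch using RDKit rather than cuik-molmaker (whose API is not reproduced here); the chosen features are a tiny illustrative subset, not the model's actual input set.

```python
from rdkit import Chem

def atom_features(mol: Chem.Mol) -> list[list[float]]:
    """Tiny per-atom descriptor table: atomic number, degree, formal
    charge, aromaticity. Illustrative only; cuik-molmaker computes a far
    richer set in batch, on the fly."""
    return [
        [
            float(atom.GetAtomicNum()),
            float(atom.GetDegree()),
            float(atom.GetFormalCharge()),
            float(atom.GetIsAromatic()),
        ]
        for atom in mol.GetAtoms()
    ]

mol = Chem.MolFromSmiles("CCO")  # ethanol
print(atom_features(mol))
```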

5. Performance Benchmarks and Data Splitting

KERMT achieves state-of-the-art results in both electronic structure calculation and property prediction. For molecular formation energies on QM9, the multi-task OrbNet backbone (KERMT) reduces mean absolute error (MAE) from 5.01 meV (previous OrbNet) to 3.87 meV at 110K training examples (Qiao et al., 2020). For ADMET endpoints and potency, multitask-finetuned KERMT outperforms Chemprop and Knowledge-guided Pre-training of Graph Transformer (KPGT), particularly for tasks with more than 10K samples, e.g., improving Pearson’s $r^2$ from 0.641 to 0.712 on large assays (Adrian et al., 14 Oct 2025).

Benchmarking employs temporal splits (80/20, with the most recent molecules held out for test) on internal Merck data, as well as cluster-based stratification of PCA-reduced features to prevent chemical overlap in public datasets. Two multitask ADMET splits are published for benchmarking methodological advances (Adrian et al., 14 Oct 2025).
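A temporal split of this kind can be sketched in a few lines of pandas; the `registration_date` column name is a hypothetical placeholder, while the 80/20 ratio and latest-molecules-for-test convention match the description above.

```python
import pandas as pd

def temporal_split(df: pd.DataFrame,
                   date_col: str = "registration_date",  # hypothetical column name
                   test_frac: float = 0.2) -> tuple[pd.DataFrame, pd.DataFrame]:
    """80/20 temporal split: the most recently registered molecules form
    the test set, mimicking prospective use of the model."""
    df_sorted = df.sort_values(date_col)
    cut = int(len(df_sorted) * (1 - test_frac))
    return df_sorted.iloc[:cut], df_sorted.iloc[cut:]
```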

6. Computational Efficiency and Industrial Acceleration

KERMT’s computational cost is dramatically lower than that of conventional quantum chemistry methods: geometry optimization steps require less than 1 second per evaluation on a CPU core, versus tens to thousands of seconds for density functional theory (DFT), a difference of more than three orders of magnitude (Qiao et al., 2020). Industrial-scale handling of large datasets is enabled by distributed pretraining via PyTorch Distributed Data Parallel (DDP), which achieves up to 86% scaling efficiency with near-linear speedup on 8 NVIDIA A100 GPUs, and by featurization acceleration with cuik-molmaker, which doubles finetuning speed and nearly triples inference throughput while lowering CPU memory usage by ~34% (Adrian et al., 14 Oct 2025).
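A minimal DDP training skeleton of the kind described follows, assuming a single 8-GPU node launched with `torchrun --nproc_per_node=8 train.py`; the model, dataset, and loss are placeholders, not the released training code.

```python
import torch
import torch.distributed as dist
import torch.nn.functional as F
from torch.nn.parallel import DistributedDataParallel as DDP
from torch.utils.data import DataLoader, DistributedSampler

def ddp_pretrain(model: torch.nn.Module, dataset, epochs: int = 1) -> None:
    """Minimal single-node DDP loop; torchrun sets the rank/world-size env vars."""
    dist.init_process_group("nccl")
    rank = dist.get_rank()  # equals the local GPU index on a single node
    torch.cuda.set_device(rank)
    model = DDP(model.cuda(rank), device_ids=[rank])
    sampler = DistributedSampler(dataset)  # shards the data across ranks
    loader = DataLoader(dataset, batch_size=64, sampler=sampler)
    opt = torch.optim.Adam(model.parameters(), lr=1e-4)
    for epoch in range(epochs):
        sampler.set_epoch(epoch)  # reshuffle shards each epoch
        for x, y in loader:
            opt.zero_grad()
            loss = F.mse_loss(model(x.cuda(rank)), y.cuda(rank))
            loss.backward()  # gradients are all-reduced across the 8 ranks
            opt.step()
    dist.destroy_process_group()
```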

7. Impact, Applications, and Availability

KERMT is used for accurate prediction of molecular energies, forces, and drug-relevant endpoints, and for geometry optimization across chemical space. Its multitask capabilities and pretraining make it suitable for high-throughput industrial drug discovery pipelines, where frequent model retraining and inference on large corpora are required. The model, along with its acceleration and distributed-training framework, is publicly available on GitHub (https://github.com/NVIDIA-Digital-Bio/KERMT) and supported by the cuik-molmaker featurization engine (Adrian et al., 14 Oct 2025). This availability enables reproducible performance benchmarking and continued improvement of chemical deep learning methodologies.
