TorchMD-NET: Quantum ML Potentials
- TorchMD-NET is an open-source framework for learning-based molecular potentials that integrates equivariant GNN architectures with physics-based priors.
- The framework employs advanced attention and message-passing techniques, achieving state-of-the-art accuracy on benchmarks such as QM9, MD17, and ANI datasets.
- TorchMD-NET enhances computational efficiency with PyTorch optimizations and CUDA Graphs, delivering 2–10× speedups for scalable molecular dynamics simulations.
TorchMD-NET is an open-source framework for learning-based molecular potential energy surfaces, designed to deliver quantum-accurate energy and force predictions with high computational efficiency. The framework centers on the use of equivariant and invariant graph neural networks (GNNs), notably including the Equivariant Transformer (ET) and, in its 2.0 iteration, the O(3)-equivariant TensorNet, for tasks ranging from small-molecule property prediction to large-scale molecular dynamics (MD) simulations. TorchMD-NET integrates tightly with PyTorch, supports GPU-accelerated computations, and incorporates both learned and physics-based priors, yielding a modular, extensible toolkit for the development of machine learning potentials in computational chemistry and materials science (Thölke et al., 2022, Pelaez et al., 27 Feb 2024).
1. Model Architectures and Equivariance
TorchMD-NET models molecular systems as fully connected graphs whose nodes represent atoms and whose edges encode geometric relationships. The original Equivariant Transformer (ET) architecture maintains per-atom scalar node features and 3-vector node features. The scalar features are initialized through two learned embeddings per atom: an intrinsic embedding of the atomic number and a neighborhood aggregation modulated by a filter of radial basis functions (RBFs) on interatomic distances. Edge features incorporate continuous cosine cutoffs and distance-dependent RBF projections.
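The edge featurization described above can be sketched in a few lines. This is a minimal NumPy illustration, assuming the exponential-normal RBF form used in the ET paper and a Behler-style cosine cutoff; the parameter choices (`num_rbf`, `r_cut`) are illustrative, not the library defaults.

```python
import numpy as np

def cosine_cutoff(d, r_cut=5.0):
    """Smoothly zero out contributions beyond r_cut (Behler-style cosine cutoff)."""
    return np.where(d < r_cut, 0.5 * (np.cos(np.pi * d / r_cut) + 1.0), 0.0)

def expnorm_rbf(d, num_rbf=16, r_cut=5.0):
    """Exponential-normal radial basis: Gaussians placed in exp(-d) space."""
    means = np.linspace(np.exp(-r_cut), 1.0, num_rbf)      # centers in exp(-d) space
    beta = (2.0 / num_rbf * (1.0 - np.exp(-r_cut))) ** -2  # shared width
    return np.exp(-beta * (np.exp(-d[..., None]) - means) ** 2)

d = np.array([1.0, 3.0, 6.0])                              # interatomic distances
feat = expnorm_rbf(d) * cosine_cutoff(d)[..., None]        # (3, 16) edge features
```

Note how the third edge (distance 6.0, beyond the cutoff) is zeroed entirely, which is what makes the learned potential smooth as atoms enter or leave each other's neighborhood.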
Attention-based message passing is central to the ET, involving multi-head attention where each head computes distance-weighted dot products of query, key, and value components, further filtered by nonlinear transformations. Output layers update both scalar and vector node features with mechanisms specifically designed to preserve rotational equivariance: all coupling between scalars and vectors occurs via scalar (inner) products or through directionally weighted sums of vector features and normalized displacement vectors. All updates are residual, repeated through layers.
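The distance-filtered attention mechanism can be sketched schematically. This is a single-head NumPy toy, an assumption-laden simplification: the actual ET applies SiLU-activated attention weights (not a softmax) and splits value vectors into scalar and vector update channels, neither of which is reproduced here.

```python
import numpy as np

rng = np.random.default_rng(0)
n_atoms, d_model = 5, 8

x = rng.normal(size=(n_atoms, d_model))                  # scalar node features
Wq, Wk, Wv = (rng.normal(size=(d_model, d_model)) for _ in range(3))
dist = rng.uniform(1.0, 4.0, size=(n_atoms, n_atoms))    # pairwise distances (toy)

def cutoff(d, r_cut=5.0):
    return np.where(d < r_cut, 0.5 * (np.cos(np.pi * d / r_cut) + 1.0), 0.0)

q, k, v = x @ Wq, x @ Wk, x @ Wv
# Query-key dot products are damped by the cosine cutoff, so distant pairs
# contribute nothing; a plain softmax replaces the ET's SiLU weighting.
logits = (q @ k.T) / np.sqrt(d_model) * cutoff(dist)
att = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
out = att @ v                                            # updated features (5, 8)
```

The key design point survives the simplification: geometry enters the attention weights only through invariant distances, so the scalar channel remains rotation-invariant by construction.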
The O(3)-equivariant TensorNet model, introduced in TorchMD-Net 2.0, extends the message-passing paradigm to rank-2 Cartesian tensor features. Each edge encodes a 3×3 Cartesian tensor, decomposed into a scalar (identity), a vector (skew-symmetric), and a symmetric traceless component; message updates exploit learnable multilayer perceptrons (MLPs) and matrix products to propagate information equivariantly under the full O(3) group. No spherical harmonics or Clebsch–Gordan decompositions are required, which keeps the equivariant operations computationally cheap.
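The decomposition of a 3×3 tensor into its three O(3)-irreducible parts is simple linear algebra, sketched below (a generic identity, not TorchMD-NET code):

```python
import numpy as np

def decompose(X):
    """Split a 3x3 tensor into O(3)-irreducible parts:
    I (scalar * identity), A (skew-symmetric / vector), S (symmetric traceless)."""
    I = np.trace(X) / 3.0 * np.eye(3)   # scalar part
    A = 0.5 * (X - X.T)                 # skew-symmetric (vector) part
    S = 0.5 * (X + X.T) - I             # symmetric traceless part
    return I, A, S

X = np.arange(9.0).reshape(3, 3)
I, A, S = decompose(X)
assert np.allclose(I + A + S, X)        # the decomposition is exact
```

Because each part transforms within its own irreducible subspace under rotations and reflections, matrix products of such tensors mix information while preserving O(3) equivariance, which is the core trick TensorNet exploits.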
A comparison of core architectures is provided below:
| Model | Type | Features Used | Equivariance |
|---|---|---|---|
| Equivariant Transformer (ET) | Attention-based | Scalars, 3-vectors | SO(3) equivariant |
| TensorNet | Message Passing | Scalars, 3-vectors, 3×3 tensors | O(3) equivariant |
| Invariant GNN | Convolutional | Scalar (distance only) | Invariant |
2. Training Protocols and Reference Datasets
TorchMD-NET has been evaluated on standard molecular datasets:
- QM9: ≈134k equilibrium geometries of small organics with multiple target properties, including atomization energy U₀, zero-point vibrational energies, dipole moments, and frontier molecular orbital (HOMO/LUMO) energies. Standard train/validation/test splits are applied.
- MD17: Trajectories of 8 small molecules from ab initio MD, with both energies and atomic forces as targets. Models are trained on 1,000 conformers (50 for validation) and tested on the remainder.
- ANI-1/ANI-2x: Datasets of ≈22M off-equilibrium conformers generated by normal-mode sampling, with DFT-calculated single-point energies.
Optimization typically uses the Adam optimizer with learning-rate schedules featuring linear warmup followed by decay, and batch sizes ranging from 8 to 2,048. Loss functions depend on the task; for combined energy and force fitting (e.g., MD17), the total loss is a weighted sum of the mean absolute errors (MAEs) in energies and forces (Thölke et al., 2022).
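The combined objective can be written as a one-line function. This is a hedged sketch: the weighting convention (`rho` interpolating between the two terms) is a common choice in force-field training, not necessarily the exact parameterization used in the papers.

```python
import numpy as np

def energy_force_loss(E_pred, E_ref, F_pred, F_ref, rho=0.95):
    """Weighted sum of energy and force MAEs. rho balances the two terms;
    MD-oriented training typically weights the force term heavily."""
    mae_E = np.abs(np.asarray(E_pred) - np.asarray(E_ref)).mean()
    mae_F = np.abs(np.asarray(F_pred) - np.asarray(F_ref)).mean()
    return (1.0 - rho) * mae_E + rho * mae_F

# Example: 1 kcal/mol energy error, perfect forces -> loss = 0.05 * 1.0
loss = energy_force_loss(1.0, 0.0, np.zeros((3, 3)), np.zeros((3, 3)))
```

In practice the force targets come from the analytic gradient of the predicted energy with respect to positions, so training on forces regularizes the entire potential energy surface, not just its values at the sampled conformers.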
3. Computational Efficiency and Software Design
TorchMD-NET 2.0 incorporates major advancements in computational throughput. It leverages PyTorch's torch.compile for fused kernel execution, yielding 2–3× speedups, and CUDA Graphs to encapsulate the entire potential and force computation into a static graph, resulting in 2–10× acceleration for typical molecular systems. Optimized neighbor-search routines (brute-force and cell-list hash-and-sort), implemented fully in CUDA with support for periodic boundary conditions, enable rapid construction of molecular graphs.
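The logic of the brute-force neighbor search (the CUDA kernels implement the same idea in parallel) reduces to an all-pairs distance test under the minimum-image convention. A NumPy sketch for an orthorhombic box, assuming positions and box lengths in the same units:

```python
import numpy as np

def brute_force_neighbors(pos, box, r_cut):
    """All-pairs neighbor search with minimum-image PBC in an orthorhombic box.
    Returns index pairs (i, j), i < j, whose distance is below r_cut."""
    delta = pos[:, None, :] - pos[None, :, :]
    delta -= box * np.round(delta / box)           # minimum-image convention
    d = np.linalg.norm(delta, axis=-1)
    upper = np.triu(np.ones_like(d, dtype=bool), k=1)
    i, j = np.where((d < r_cut) & upper)
    return np.stack([i, j], axis=1)

box = np.array([10.0, 10.0, 10.0])
pos = np.array([[0.5, 0.0, 0.0],
                [9.8, 0.0, 0.0],   # neighbor of atom 0 across the boundary
                [5.0, 5.0, 5.0]])
pairs = brute_force_neighbors(pos, box, r_cut=2.0)   # finds the (0, 1) pair
```

The O(N²) cost of this approach is what the cell-list hash-and-sort variant avoids for large systems, at the price of more bookkeeping.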
The core modeling interface (torchmdnet.models.model.TorchMD_Net) modularizes the representation, output, and prior components. Physical priors—including atomwise reference energies, Coulomb interactions with switching functions, D2 dispersion, and ZBL nuclear repulsion—can be injected, and their derivatives contribute analytically to force computation. The package supports command-line (YAML), Python API, and OpenMM integration via the OpenMM-Torch plugin to enable production-scale MD simulations using learned potentials (Pelaez et al., 27 Feb 2024).
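The simplest of the physical priors, atomwise reference energies, can be illustrated in a few lines. The numbers below are hypothetical placeholders, not fitted values, and the function is a schematic of the idea rather than the library's BasePrior API:

```python
import numpy as np

# Hypothetical per-element reference energies (e.g. isolated-atom energies, in Ha).
ATOM_REF = {1: -0.5, 6: -37.8, 8: -75.0}

def atomref_prior(z):
    """Atomwise additive prior: subtracting it during training means the network
    only has to learn the residual (interaction) energy on top of this baseline."""
    return sum(ATOM_REF[zi] for zi in z)

# Water (O, H, H): the prior contributes the sum of isolated-atom energies.
E_prior = atomref_prior([8, 1, 1])   # -76.0
```

Because the prior is a fixed analytic function of composition (or, for Coulomb/D2/ZBL terms, of positions), its gradient contributes exactly to the forces with no extra learning required.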
4. Benchmarks: Accuracy and MD Stability
TorchMD-NET models, both ET and TensorNet, match or surpass state-of-the-art accuracy on standard benchmarks:
| Dataset | Model | Energy MAE | Forces MAE (if available) |
|---|---|---|---|
| QM9 (U₀, meV) | TensorNet 3L | 3.8 ± 0.2 | — |
| QM9 (U₀, meV) | ET (2024) | 5.7 | — |
| MD17 (aspirin, kcal/mol) | ET (2024) | 0.139 | 0.232 (kcal/mol/Å) |
| MD17 (aspirin, 2022) | ET (2022) | 0.12 | 0.25 |
| ANI-1 (eV) | ET (2022) | 0.022 | — |
For MD stability, a TensorNet 2-layer model trained on ANI-2x ran 200 ns vacuum NVT simulations on four out-of-sample druglike molecules (41–47 atoms) with stable root mean square deviations (RMSD) less than 3 Å. Throughput is substantially higher than GFN2-xTB (≈0.05 ns/day) and orders of magnitude faster than ab initio MD with PySCF B3LYP. ET-small (≈1.3 M parameters) processes 50 QM9 molecules in 9.4 ± 3.4 ms (≈0.19 ms/molecule), significantly outperforming PaiNN and DimeNet++ (Pelaez et al., 27 Feb 2024, Thölke et al., 2022).
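The RMSD stability metric used above is straightforward to compute. A minimal sketch (unaligned RMSD for brevity; trajectory analyses normally superimpose each frame on the reference structure first):

```python
import numpy as np

def rmsd(a, b):
    """Root mean square deviation between two conformations of the same
    molecule, atoms in matching order; no rigid-body alignment applied."""
    return np.sqrt(((a - b) ** 2).sum(axis=1).mean())

ref = np.zeros((4, 3))                           # 4-atom reference geometry
conf = ref + np.array([1.0, 0.0, 0.0])           # every atom shifted by 1 unit
val = rmsd(conf, ref)                            # -> 1.0
```

A trajectory whose per-frame RMSD plateaus below a few Å, as reported for the TensorNet/ANI-2x simulations, indicates the learned potential keeps the molecule near its reference conformational basin rather than drifting into unphysical geometries.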
5. Learned Representations, Attention Analysis, and Physical Insight
Attention weight analysis reveals that the ET architecture learns chemically meaningful interaction patterns. Element-pair attention heatmaps show that networks trained on equilibrium-only data (QM9) focus on C–C and C–O interactions but largely ignore H. Models trained on off-equilibrium data (ANI-1, MD17) assign increased attention to H–C and H–O, reflecting the relevance of hydrogen in vibrational dynamics and non-covalent interactions.
A displaced-atom experiment demonstrates dataset-dependent sensitivity: for ANI-1/MD17-trained models, perturbing an atom's position causes a localized increase in attention to that atom, especially for H and C, but this effect is absent for the equilibrium-trained QM9 model. Visualizations of attention edges in molecules confirm that the representation adapts to the configurational diversity of the training data, assigning strong edges to chemically relevant interactions (Thölke et al., 2022).
6. Extensibility, Customization, and Integration
TorchMD-NET is designed for extensibility at both the model and workflow levels. New representations can be implemented by subclassing nn.Module with customized forward(z,pos,batch) methods. Physical priors are extendable by implementing BasePrior subclasses. Data ingestion supports YAML, NumPy, HDF5, and custom formats. The full training lifecycle—including data preparation, model instantiation, and training via PyTorch Lightning—can be scripted in Python or handled via CLI.
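The shape of the representation interface can be illustrated with a toy invariant model. This is a framework-agnostic NumPy schematic of the forward(z, pos, batch) contract only; a real TorchMD-NET representation subclasses torch.nn.Module and uses learned parameters:

```python
import numpy as np

class InvariantRepresentation:
    """Toy stand-in for a representation module: maps atomic numbers z,
    positions pos, and molecule indices batch to per-atom scalar features."""
    def __init__(self, num_rbf=8, r_cut=5.0):
        self.num_rbf, self.r_cut = num_rbf, r_cut

    def forward(self, z, pos, batch):
        d = np.linalg.norm(pos[:, None] - pos[None, :], axis=-1)   # (N, N)
        means = np.linspace(0.0, self.r_cut, self.num_rbf)
        rbf = np.exp(-((d[..., None] - means) ** 2))               # (N, N, num_rbf)
        mask = (d > 0) & (d < self.r_cut)     # drop self-edges, apply cutoff
        return (rbf * mask[..., None]).sum(axis=1)                 # (N, num_rbf)

rep = InvariantRepresentation()
pos = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.5, 0.0]])
feat = rep.forward(z=np.array([8, 1, 1]), batch=np.zeros(3, int), pos=pos)
```

Because the features depend only on interatomic distances, the output is invariant to rotations and translations, which is the minimal requirement for a drop-in scalar representation; equivariant models like ET additionally carry vector or tensor channels.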
Integration with OpenMM enables direct use of trained TorchMD-NET models in classical and hybrid MD simulations. Forces and energies predicted by TorchMD-NET can be substituted directly for force-field energies, accelerating the transition from ML prototyping to production molecular simulation (Pelaez et al., 27 Feb 2024).
7. Significance and Outlook
TorchMD-NET demonstrates that equivariant neural networks, when coupled with scalable software design and careful data curation (including off-equilibrium conformers), can reach state-of-the-art accuracy and computational efficiency for quantum property prediction and long-timescale molecular simulations. The introduction of TensorNet and the modular extensibility of 2.0 broaden the applicability from small-molecule property prediction to diverse domains including biophysics and materials science. The ability to infuse physical priors and the tight integration with the MD software ecosystem position TorchMD-NET as a foundational tool for next-generation machine learning potentials (Thölke et al., 2022, Pelaez et al., 27 Feb 2024).