ML-PES (CASPT2/aug-cc-pVTZ) for H₂COO Dynamics

Updated 24 January 2026
  • The paper presents a high-fidelity machine-learned potential energy surface built on CASPT2/aug-cc-pVTZ data that accurately captures H₂COO reaction pathways.
  • The model uses the PhysNet neural network and extensive adaptive sampling to build a dataset covering minima, transition states, and dissociation channels, with errors within 1–2 kcal/mol.
  • Molecular dynamics simulations on the ML-PES yield product branching ratios and reveal nonstatistical behavior relevant to computational and atmospheric chemistry.

A machine-learned potential energy surface (ML-PES) trained on CASPT2/aug-cc-pVTZ reference data is a global, full-dimensional neural network (NN) fit of electronic energies and forces, with all training and validation carried out against “gold standard” multireference quantum chemistry. For the H₂COO (Criegee intermediate) reaction system, it allows molecular dynamics (MD) simulations to quantify all relevant reaction channels, branching ratios, and nonstatistical behavior under atmospheric conditions on a computationally tractable surface that reproduces the reference data to within 1–2 kcal/mol (Yin et al., 25 Jul 2025, Song et al., 2024, Yin et al., 17 Jan 2026).

1. Quantum Chemical Reference: CASPT2/aug-cc-pVTZ

The reference electronic structure data underlying these ML-PESs are computed using complete active space second-order perturbation theory (CASPT2) in conjunction with the augmented correlation-consistent triple-zeta basis set (aug-cc-pVTZ) as implemented in MOLPRO, with a CASSCF(12,11) active space. This protocol captures strong nondynamical correlation and provides robust energetics across minima, transition states (TS), and dissociation channels (Yin et al., 25 Jul 2025, Yin et al., 17 Jan 2026). All stationary points, reaction paths, and asymptotic product energies—covering >150 kcal/mol—are computed at this level, which avoids both single-reference bias and significant basis set incompleteness.

2. Data Generation and Coverage

ML-PES construction begins from large-scale sampling of relevant molecular configurations. For H₂COO, the dataset has evolved as follows:

  • Initial points: 5 162 geometries covering the parent–cyclic intermediate (cyc-H₂CO₂) region, harvested via adaptive sampling and normal-mode displacements (Yin et al., 25 Jul 2025).
  • Extended sampling: MD at the semiempirical GFN2-xTB level (via ASE), systematic perturbations along IRCs, and active learning to target product and TS regions, adding ~9 200 points, with coverage up to ≈280 kcal/mol above the reactant (Yin et al., 25 Jul 2025).
  • Final reference set: After outlier rejection ($|E_\mathrm{PES} - E_\mathrm{ref}| > 10$ kcal/mol), 13 877 geometries for PES2025 (Yin et al., 25 Jul 2025), expanded by an additional 1 053 product-region points in subsequent work to reach a total of 14 930 distinct CASPT2/aug-cc-pVTZ structures (Yin et al., 17 Jan 2026).
  • Data splits: 80% training, 10% validation, 10% test (≈12 000/1 500/1 500 for the largest set) (Yin et al., 17 Jan 2026).

This dataset spans all accessible minima, TSs, intermediate structures, and dissociation limits across all known decomposition channels, ensuring full-dimensional global coverage (Yin et al., 17 Jan 2026).
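The outlier rejection and 80/10/10 split described above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the function names, the random seed, and the use of a simple random shuffle are assumptions; only the 10 kcal/mol threshold, the split fractions, and the total of 14 930 structures come from the text.

```python
import random


def filter_outliers(e_pes, e_ref, threshold=10.0):
    """Return indices of structures with |E_PES - E_ref| <= threshold (kcal/mol)."""
    return [
        i for i, (pes, ref) in enumerate(zip(e_pes, e_ref))
        if abs(pes - ref) <= threshold
    ]


def split_indices(n, fractions=(0.8, 0.1, 0.1), seed=0):
    """Shuffle range(n) and cut it into train/validation/test index lists.

    The seed and the plain random shuffle are illustrative assumptions.
    """
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_train = round(fractions[0] * n)
    n_valid = round(fractions[1] * n)
    return idx[:n_train], idx[n_train:n_train + n_valid], idx[n_train + n_valid:]


# Split sizes for the largest reference set quoted in the text.
train, valid, test = split_indices(14930)
split_sizes = (len(train), len(valid), len(test))
```

For 14 930 structures this gives split sizes consistent with the approximate 12 000/1 500/1 500 figures quoted above.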

3. Machine Learning Architecture: PhysNet Representation

PhysNet, a continuous-filter convolutional neural network (NN), forms the basis of all recent CASPT2/aug-cc-pVTZ H₂COO ML-PES implementations. The model utilizes only interatomic distances as input, expanded into Gaussian radial-basis descriptors:

$$g_k(r_{ij}) = \exp\left[-(r_{ij} - \mu_k)^2/\sigma^2\right],$$

with centers $\mu_k$ spaced every 0.1 Å and width $\sigma = 0.5$ Å (Song et al., 2024). Atom-centered learnable embeddings are updated through 3–5 message-passing blocks, yielding atomic energies $E_i$ that sum to the molecular total energy (Yin et al., 25 Jul 2025, Song et al., 2024, Yin et al., 17 Jan 2026). Locality is enforced via distance cutoffs (5–10 Å), and permutational invariance is achieved by design. Forces are obtained by analytic differentiation of the energy; in some variants the network also predicts partial charges $q_i$, from which dipole moments are computed.
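The Gaussian radial-basis expansion above can be illustrated with a short sketch. The 0.1 Å spacing and 0.5 Å width are taken from the text; the function name, the placement of the first center at 0 Å, and the 5 Å upper limit (the lower end of the quoted cutoff range) are assumptions made for illustration.

```python
import math


def gaussian_rbf(r, r_max=5.0, spacing=0.1, sigma=0.5):
    """Expand a distance r (Å) into g_k(r) = exp(-(r - mu_k)^2 / sigma^2).

    Centers mu_k run from 0 to r_max in steps of `spacing`; starting at 0
    and stopping at r_max are illustrative assumptions.
    """
    n_centers = round(r_max / spacing) + 1
    centers = [k * spacing for k in range(n_centers)]
    return [math.exp(-((r - mu) ** 2) / sigma ** 2) for mu in centers]


# Descriptor vector for a hypothetical 1.2 Å interatomic distance.
features = gaussian_rbf(1.2)
```

Each distance maps to a smooth vector that peaks at the nearest center; because the width is larger than the spacing, neighboring basis functions overlap strongly.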

4. Model Training, Validation, and Metrics

Training involves minimization of a composite loss function including energy, force, and (optionally) charge and dipole errors, with application-specific weighting:

$$L = w_E \left|E - E^\mathrm{ref}\right| + \frac{w_F}{3N} \sum_{i,\alpha} \left|-\frac{\partial E}{\partial r_{i,\alpha}} - F_{i,\alpha}^\mathrm{ref}\right| + w_Q \left|\sum_i q_i - Q^\mathrm{ref}\right| + \frac{w_p}{3}\sum_\alpha \left|\sum_i q_i r_{i,\alpha} - p_\alpha^\mathrm{ref}\right| + L_\mathrm{nh}$$

(Yin et al., 25 Jul 2025). Optimization typically uses Adam or AMSGrad, with early stopping when the validation RMSE plateaus. Regularization enters through the nonhierarchical penalty term $L_\mathrm{nh}$; no explicit L₂ penalty is applied (Yin et al., 25 Jul 2025, Song et al., 2024).
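As a rough sketch, the energy, force, charge, and dipole terms of the composite loss can be evaluated for a single structure as shown here. The function name, argument layout, and unit default weights are assumptions; the predicted forces are passed in directly (standing in for the negative energy gradient), and the $L_\mathrm{nh}$ term is omitted because it depends on network internals not described in the text.

```python
def composite_loss(E, E_ref, F, F_ref, q, Q_ref, pos, p_ref,
                   w_E=1.0, w_F=1.0, w_Q=1.0, w_p=1.0):
    """Weighted absolute-error loss for one structure (L_nh omitted).

    F, F_ref, pos: lists of N [x, y, z] entries; q: list of N partial charges;
    p_ref: reference dipole as [x, y, z]. Weights default to 1.0 (assumption).
    """
    n = len(F)
    loss_E = abs(E - E_ref)
    # Mean absolute error over all 3N Cartesian force components.
    loss_F = sum(abs(F[i][a] - F_ref[i][a])
                 for i in range(n) for a in range(3)) / (3 * n)
    # Total charge should match the reference molecular charge.
    loss_Q = abs(sum(q) - Q_ref)
    # Dipole from point charges, p_a = sum_i q_i * r_{i,a}.
    loss_p = sum(abs(sum(q[i] * pos[i][a] for i in range(n)) - p_ref[a])
                 for a in range(3)) / 3
    return w_E * loss_E + w_F * loss_F + w_Q * loss_Q + w_p * loss_p


# Toy two-atom example with made-up numbers.
loss = composite_loss(
    E=1.0, E_ref=0.5,
    F=[[0.0, 0.0, 0.3], [0.0, 0.0, -0.3]],
    F_ref=[[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]],
    q=[0.2, -0.1], Q_ref=0.0,
    pos=[[0.0, 0.0, 0.0], [0.0, 0.0, 1.0]],
    p_ref=[0.0, 0.0, 0.0],
)
```

In practice the weights would be chosen per application, as the text notes, and the loss would be averaged over a training batch.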

Model performance is quantified as follows:

| Data split (structures) | MAE(E), kcal/mol | RMSE(E), kcal/mol | MAE(F), kcal/mol/Å | RMSE(F), kcal/mol/Å | R² |
|---|---|---|---|---|---|
| Training (11,100) | 1.37 | 2.06 | 0.51 | 2.71 | 0.9982 |
| Test (1,400–1,500) | 1.23–1.33 | 2.00–2.14 | 1.01–1.11 | 3.08–3.80 | 0.9983–0.9984 |
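For reference, the error measures reported above follow the standard definitions sketched below. The helper names are illustrative; the coefficient of determination is included as a commonly reported companion metric.

```python
import math


def mae(pred, ref):
    """Mean absolute error."""
    return sum(abs(p - r) for p, r in zip(pred, ref)) / len(ref)


def rmse(pred, ref):
    """Root-mean-square error."""
    return math.sqrt(sum((p - r) ** 2 for p, r in zip(pred, ref)) / len(ref))


def r_squared(pred, ref):
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    mean_ref = sum(ref) / len(ref)
    ss_res = sum((r - p) ** 2 for p, r in zip(pred, ref))
    ss_tot = sum((r - mean_ref) ** 2 for r in ref)
    return 1.0 - ss_res / ss_tot


# Made-up predicted and reference energies (kcal/mol) for illustration.
pred = [1.0, 2.0, 4.0]
ref = [1.0, 3.0, 2.0]
metrics = (mae(pred, ref), rmse(pred, ref), r_squared(pred, ref))
```

Because RMSE weights large deviations more heavily than MAE, a force RMSE several times larger than the force MAE points to a small number of structures with large force errors.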

No spurious “holes” or unphysical artifacts are detected in the asymptotic regions, as verified by large-scale Diffusion Monte Carlo sampling (2.1×10⁹ geometries) (Yin et al., 25 Jul 2025, Yin et al., 17 Jan 2026). All minimum energy paths (MEPs), IRCs, and stationary points are found to lie within 1–2 kcal/mol of the underlying CASPT2 reference surface (Yin et al., 25 Jul 2025, Yin et al., 17 Jan 2026).

5. Dynamical Applications: MD and Branching Analysis

The ML-PES enables large-scale direct classical MD, interfaced via CHARMM/pyCHARMM and evaluated on the fly with automatic differentiation (Yin et al., 25 Jul 2025, Song et al., 2024). Simulations run in the NVE ensemble with a 0.1 fs time step and flexible bonds, starting either from ensembles equilibrated at 300 K or from initial conditions in which specific vibrational modes are excited (Yin et al., 25 Jul 2025, Song et al., 2024).
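To illustrate what NVE propagation with a 0.1 fs time step involves, the sketch below integrates a one-dimensional harmonic oscillator with the velocity Verlet scheme. It is a toy stand-in only: the actual simulations run through CHARMM/pyCHARMM on the ML-PES, the text does not state which integrator they use, and the force constant, equilibrium distance, mass, and starting point here are hypothetical.

```python
# Converts (kcal/mol/Å)/amu to Å/fs^2.
ACC = 4.184e-4

K = 700.0   # hypothetical force constant, kcal/mol/Å^2
X0 = 1.1    # hypothetical equilibrium distance, Å


def force(x):
    """Harmonic force in kcal/mol/Å (stand-in for the ML-PES gradient)."""
    return -K * (x - X0)


def total_energy(x, v, mass):
    """Potential plus kinetic energy in kcal/mol (x in Å, v in Å/fs, mass in amu)."""
    return 0.5 * K * (x - X0) ** 2 + 0.5 * mass * v * v / ACC


def velocity_verlet(x, v, mass, force_fn, dt=0.1, n_steps=1000):
    """Propagate one coordinate at constant energy; dt is in fs."""
    f = force_fn(x)
    traj = []
    for _ in range(n_steps):
        v += 0.5 * dt * ACC * f / mass   # first half kick
        x += dt * v                      # drift
        f = force_fn(x)                  # force at the new position
        v += 0.5 * dt * ACC * f / mass   # second half kick
        traj.append((x, v))
    return traj


trajectory = velocity_verlet(x=1.2, v=0.0, mass=1.0, force_fn=force)
e0 = total_energy(1.2, 0.0, 1.0)
max_drift = max(abs(total_energy(x, v, 1.0) - e0) for x, v in trajectory)
```

Monitoring `max_drift` relative to `e0` is a simple check that the time step is small enough for the stiffest motion in the system; a short step such as 0.1 fs is a conservative choice for fast X–H vibrations.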

  • Product branching ratios (1 ns, 5 000 runs): CO₂+H₂: 32.3%; H₂O+CO: 19.3%; HCO+OH: 1.6%; unreacted H₂COO: 43.7% (Yin et al., 25 Jul 2025).
  • Channel-dependent dynamic bifurcation is observed for CO₂+H₂: both “direct” (via OCH₂O) and “indirect” (via formic acid) routes are accessible and quantitatively reproduced by the PES, with energy partitioning and vibrational state populations that differ between the two pathways (Yin et al., 25 Jul 2025, Yin et al., 17 Jan 2026).
  • Lifetimes in highly excited formic acid intermediates exhibit non-RRKM stretched-exponential statistics ($\beta$ = 1.1–1.7), directly attributable to nonstatistical dynamics and confirmed by the robustness of the ML-PES over long trajectory propagation (Yin et al., 17 Jan 2026).
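One way to extract a stretching exponent from trajectory lifetimes is sketched below. It assumes a survival function of the form $S(t) = \exp[-(t/\tau)^\beta]$, which is not spelled out in the text, and uses the linearization $\ln(-\ln S) = \beta \ln t - \beta \ln \tau$ with an ordinary least-squares fit. The function name and the synthetic data are illustrative.

```python
import math


def fit_stretched_exponential(times, survival):
    """Fit S(t) = exp(-(t / tau)**beta) by linear regression.

    Uses ln(-ln S) = beta * ln t - beta * ln tau; requires 0 < S < 1 and t > 0.
    Returns (beta, tau).
    """
    xs = [math.log(t) for t in times]
    ys = [math.log(-math.log(s)) for s in survival]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    beta = slope
    tau = math.exp(-intercept / beta)
    return beta, tau


# Synthetic survival data generated from known parameters (not from the papers).
true_beta, true_tau = 1.4, 80.0
times = [10.0, 20.0, 50.0, 100.0, 200.0]
survival = [math.exp(-((t / true_tau) ** true_beta)) for t in times]
beta_fit, tau_fit = fit_stretched_exponential(times, survival)
```

With real trajectory data, `survival` would be the fraction of trajectories still in the intermediate at each time, and a direct nonlinear fit may be preferable when the tail is noisy. A single-exponential (RRKM-like) decay corresponds to $\beta = 1$.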

Mode-selective excitation along normal coordinates enables controlled access to specific regions of the surface: pure $\nu_\mathrm{CH}$ excitation drives H-transfer and formic acid formation, while excitation of CH and COO-bend modes accesses dioxirane and subsequent OH-elimination; dynamical outcomes are correctly gated as a function of internal energy and mode composition (Song et al., 2024).

6. Assessment of PES Robustness, Limitations, and Transferability

Rigorous validation demonstrates the ML-PES:

  • Reproduces CASPT2/aug-cc-pVTZ stationary points, transition-state energies, and dissociation curves along all minimum-energy and multidimensional cuts to within 1–2 kcal/mol (Yin et al., 25 Jul 2025, Yin et al., 17 Jan 2026).
  • Yields smooth one- and two-dimensional potential slices and no spurious artifacts at large atom separations (up to ≈5 Å), with correct asymptotic approaches to isolated product fragments (Yin et al., 17 Jan 2026).
  • Supports both near-equilibrium and strongly dissociative dynamics up to ~150 kcal/mol above ground, avoiding extrapolation failures by ensuring adequate high-energy data and targeted product-channel sampling (Yin et al., 17 Jan 2026).
  • Correctly encodes quantum-chemical symmetry and permutational invariance through architecture and training (e.g., via the PhysNet design and Gaussian distance descriptors) (Song et al., 2024).
  • Extrapolates robustly to longer timescales and higher excitation energies within the domain of the sampled reference data; out-of-domain generalization (e.g., exotic photofragmentation) is not guaranteed without substantial new sampling.

7. Significance and Impact in Computational and Atmospheric Chemistry

The CASPT2/aug-cc-pVTZ ML-PES approach for species such as H₂COO provides an accurate tool of near-ab initio quality for simulating reaction dynamics and energy partitioning in atmospherically relevant intermediates. Quantitative agreement with both experiment and high-level theory is achieved for branching ratios, barrier heights, and product distributions, which allows full mechanistic characterization of photoactivated decomposition, including nonstatistical effects (Yin et al., 25 Jul 2025, Yin et al., 17 Jan 2026). These PESs enable investigations that direct quantum chemistry cannot reach because of system size and trajectory demands, and they offer a transferable framework for future atmospheric and physical chemistry studies that require global reactive surfaces accurate to 1–2 kcal/mol.
