Materials-HAM-SOC: SOC Hamiltonian Benchmark

Updated 26 September 2025

Materials-HAM-SOC is a benchmark dataset of 17,000 crystal structures with explicit spin–orbit coupling corrections for accurate ML Hamiltonian prediction.
NextHAM employs an E(3)-equivariant neural transformer with residual ΔH learning and dual-space loss to refine zeroth-step Hamiltonians efficiently.
The integration of these tools achieves high-fidelity electronic band structure predictions while reducing computational time by over 97% compared to conventional DFT.

Materials-HAM-SOC is a high-quality, broad-coverage benchmark dataset for machine-learning (ML) Hamiltonian prediction in materials science, with particular emphasis on explicit spin–orbit coupling (SOC) effects. The name also covers the methodology developed alongside it, exemplified by the NextHAM universal deep-learning framework. Together, the dataset and the method aim to systematize and accelerate the prediction of electronic-structure Hamiltonians and band structures for complex materials, enabling large-scale, high-throughput, and physically accurate electronic-structure modeling.

1. Dataset Definition and Scope

Materials-HAM-SOC is a curated dataset consisting of 17,000 crystal structures, with each entry pairing an atomic-level structural model with an associated self-consistent Hamiltonian including SOC corrections. The dataset encompasses 68 chemical elements spanning the first six rows of the periodic table, with each material represented by a high-quality atomic orbital basis set (up to 4s2p2d1f orbitals per element). The dataset is partitioned into 12,000 training, 2,000 validation, and 3,000 test structures to support benchmarking and algorithm development.
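The basis label translates into matrix sizes through simple 2l + 1 counting. The sketch below assumes that "4s2p2d1f" means four s, two p, two d, and one f radial shells (the usual reading for numerical atomic-orbital basis sets); the helper `count_orbitals` is hypothetical. With SOC, each spatial orbital carries two spin components, so a single atom-pair Hamiltonian block has up to 54 × 54 complex entries.

```python
import re

# Number of orbitals (2l + 1) contributed by one shell of each angular momentum.
ORBITALS_PER_SHELL = {"s": 1, "p": 3, "d": 5, "f": 7}

def count_orbitals(basis: str) -> int:
    """Count spatial orbitals in a basis label such as '4s2p2d1f' (hypothetical helper).

    Each token '<n><l>' is read as n radial shells of angular momentum l.
    """
    tokens = re.findall(r"(\d+)([spdf])", basis)
    if "".join(n + l for n, l in tokens) != basis:
        raise ValueError(f"unrecognized basis label: {basis!r}")
    return sum(int(n) * ORBITALS_PER_SHELL[l] for n, l in tokens)

spatial = count_orbitals("4s2p2d1f")   # 4*1 + 2*3 + 2*5 + 1*7 = 27
spinor = 2 * spatial                   # spin-up and spin-down copy of every orbital
print(spatial, spinor)                 # 27 54

# Partition sizes quoted for Materials-HAM-SOC.
splits = {"train": 12_000, "val": 2_000, "test": 3_000}
print(sum(splits.values()))            # 17000
```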

The explicit inclusion of SOC (i.e., nontrivial spin-off-diagonal matrix elements) is a key distinguishing feature. The underlying electronic-structure Hamiltonians are obtained from density functional theory (DFT) calculations that accurately treat both scalar relativistic and SOC effects, enabling fine-grained learning and assessment of ML models aimed at universal transferability across chemical, structural, and relativistic diversity.
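To make "spin-off-diagonal" concrete, the sketch below builds the textbook atomic SOC term λ L·S for a single p shell in a spinor (spin-up/spin-down) basis. It is a toy illustration, not how the DFT reference data were generated, and λ is an arbitrary illustrative value. Without SOC, a collinear spinor Hamiltonian is block-diagonal in spin; the L·S term fills the up-down block and splits the six p states into a j = 3/2 quartet and a j = 1/2 doublet.

```python
import numpy as np

def l1_operators():
    """Lx, Ly, Lz for l = 1 in the basis |m=+1>, |m=0>, |m=-1> (hbar = 1)."""
    l_plus = np.zeros((3, 3), dtype=complex)
    l_plus[0, 1] = l_plus[1, 2] = np.sqrt(2.0)   # L+ raises m by one
    l_minus = l_plus.conj().T                     # L- is the adjoint of L+
    lx = (l_plus + l_minus) / 2
    ly = (l_plus - l_minus) / 2j
    lz = np.diag([1.0, 0.0, -1.0]).astype(complex)
    return lx, ly, lz

# Spin-1/2 operators S = sigma / 2.
sx = np.array([[0, 1], [1, 0]], dtype=complex) / 2
sy = np.array([[0, -1j], [1j, 0]], dtype=complex) / 2
sz = np.array([[1, 0], [0, -1]], dtype=complex) / 2

lam = 0.1   # hypothetical SOC strength in eV
lx, ly, lz = l1_operators()

# Spin-major ordering: indices 0-2 are spin-up p states, 3-5 are spin-down.
h_soc = lam * (np.kron(sx, lx) + np.kron(sy, ly) + np.kron(sz, lz))

h_up_down = h_soc[:3, 3:]                 # spin-off-diagonal block, equals (lam/2) * L-
print(np.allclose(h_up_down, 0.0))        # False: SOC couples spin-up and spin-down
print(np.linalg.eigvalsh(h_soc) / lam)    # j=1/2 doublet at -1, j=3/2 quartet at +1/2
```

The resulting splitting of 3λ/2 between the two multiplets is the kind of SOC-induced band splitting that an ML model must reproduce through the spin-off-diagonal elements.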

2. The NextHAM Framework: Physics-Informed Deep Learning Hamiltonian Prediction

NextHAM is a neural network architecture purpose-built for universal Hamiltonian regression in the presence of SOC. The framework is characterized by the following core components:

Zeroth-step Hamiltonian, $H^{(0)}$: Constructed rapidly from the initial charge density $\rho^{(0)}(r)$ (a sum of neutral-atom densities), including electron–ion and approximate electron–electron interactions. $H^{(0)}$ serves both as a physics-informed input descriptor and as a prior in the output layer.
Residual Learning of Corrections, $\Delta H$: Instead of directly regressing the target (self-consistent) Hamiltonian $H^{(T)}$, NextHAM learns the correction $\Delta H = H^{(T)} - H^{(0)}$, which dramatically simplifies the learning problem and narrows the dynamic range of the regression target.
E(3)-Equivariant Neural Transformer Backbone: The architecture is equivariant under the full Euclidean group E(3) and updates both node (atomic) features and edge (orbital-pair) features, taking interatomic geometry and orbital types into account. It uses specially designed geometric attention and distance-embedding modules.
TraceGrad Module: This module constructs SO(3)-invariant trace quantities from the equivariant features, e.g., $T = \mathrm{tr}(\Delta H \cdot \Delta H^\dagger)$, as non-linear observables. The gradients of these invariants are then used to enhance learning stability and expressivity without breaking symmetry.
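The residual output and the trace invariant can be checked numerically on a toy block. This is a sketch under stated assumptions, not NextHAM code: ΔH is a random real-valued p-d block (SOC blocks would be complex), random orthogonal matrices stand in for (and are more general than) the Wigner-D matrices of a rotation, and all helper names are hypothetical. It shows the two properties the list relies on: $T = \mathrm{tr}(\Delta H \cdot \Delta H^\dagger)$ is unchanged by the rotation, and its gradient with respect to ΔH rotates exactly like ΔH, so gradient-derived features can be fed back without breaking equivariance.

```python
import numpy as np

rng = np.random.default_rng(0)

def random_orthogonal(n, rng):
    """Random real orthogonal matrix via QR; stands in for a Wigner-D block."""
    q, r = np.linalg.qr(rng.normal(size=(n, n)))
    return q * np.sign(np.diag(r))   # scale columns by +/-1 for a well-defined q

def trace_invariant(dh):
    """T = tr(dH dH^dagger), i.e. the squared Frobenius norm of the block."""
    return np.trace(dh @ dh.conj().T).real

def numerical_grad(f, x, eps=1e-6):
    """Central finite-difference gradient of a scalar function f at array x."""
    g = np.zeros_like(x)
    for idx in np.ndindex(x.shape):
        step = np.zeros_like(x)
        step[idx] = eps
        g[idx] = (f(x + step) - f(x - step)) / (2 * eps)
    return g

# Hypothetical p-d block (3 x 5) for one atom pair.
h0 = rng.normal(size=(3, 5))          # zeroth-step block H^(0)
dh = 0.05 * rng.normal(size=(3, 5))   # stand-in for the network's predicted Delta H
h_pred = h0 + dh                      # residual output: H^(T) approximated by H^(0) + Delta H

# Under a rotation, a p-d block transforms as D_p @ block @ D_d.T.
d_p, d_d = random_orthogonal(3, rng), random_orthogonal(5, rng)
dh_rot = d_p @ dh @ d_d.T

# T does not change under the rotation (invariant) ...
print(np.isclose(trace_invariant(dh), trace_invariant(dh_rot)))   # True
# ... while its gradient with respect to dH rotates exactly like dH (equivariant).
g, g_rot = numerical_grad(trace_invariant, dh), numerical_grad(trace_invariant, dh_rot)
print(np.allclose(d_p @ g @ d_d.T, g_rot))                        # True
```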

3. Dual-Space Training Objective and Error Mitigation

To ensure the accuracy of both direct Hamiltonian prediction and the resultant physical observables (e.g., band structure), NextHAM's loss function supervises prediction in both real (R) and reciprocal (k) space:

R-space loss ($\mathrm{loss}(R)$) combines the mean-squared error on the Hamiltonian matrix elements with the mean-absolute error on the SO(3)-invariant trace, weighted by a balancing function $\gamma$.
k-space loss ($\mathrm{loss}(k)$) projects the predicted Hamiltonian onto low-energy (P) and high-energy (Q) band subspaces and explicitly penalizes the off-diagonal PQ coupling. This penalty is essential for preventing "ghost states" (unphysical spurious bands) that can otherwise arise from ill-conditioning of the overlap matrix.
Gauge-Invariant Error Metric: The “Gauge_MAE” removes the unphysical ambiguity of adding a chemical-potential shift proportional to the overlap matrix, $H \to H + \mu S$, which shifts every eigenvalue by $\mu$ without changing the eigenvectors. This ensures meaningful comparison across models.
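The R-space loss and the gauge-invariant metric can be sketched in a few lines of NumPy. Two assumptions are labeled in the code: $\gamma$ is taken as a plain scalar weight, and the gauge shift $\mu$ is fitted by least squares, which may differ from the exact Gauge_MAE recipe. The usage example builds a prediction that is wrong only by $\mu S$, which a raw MAE penalizes but the gauge-invariant MAE does not.

```python
import numpy as np

def r_space_loss(h_pred, h_true, t_pred, t_true, gamma=1.0):
    """MSE on Hamiltonian elements plus a gamma-weighted MAE on trace invariants.

    Assumption: gamma is a plain scalar; the source only calls it a balancing function.
    """
    mse = np.mean(np.abs(h_pred - h_true) ** 2)
    mae_trace = np.mean(np.abs(np.asarray(t_pred) - np.asarray(t_true)))
    return mse + gamma * mae_trace

def gauge_mae(h_pred, h_true, s):
    """MAE after removing the best uniform shift H -> H + mu * S.

    Assumption: mu is fitted by least squares on the element-wise difference.
    """
    diff = (h_pred - h_true).ravel()
    s_flat = s.ravel()
    mu = -np.vdot(s_flat, diff).real / np.vdot(s_flat, s_flat).real
    return np.mean(np.abs(diff + mu * s_flat)), mu

rng = np.random.default_rng(1)
s = np.eye(4) + 0.1 * np.ones((4, 4))   # toy overlap matrix
a = rng.normal(size=(4, 4))
h_true = (a + a.T) / 2                  # toy symmetric Hamiltonian
h_pred = h_true + 0.3 * s               # wrong only by a gauge shift

naive_mae = np.mean(np.abs(h_pred - h_true))
err, mu = gauge_mae(h_pred, h_true, s)
print(naive_mae > 0, np.isclose(err, 0.0), np.isclose(mu, -0.3))   # True True True
```

Least squares gives $\mu$ in closed form; minimizing the MAE directly would call for a weighted median instead, but both choices remove an exact $\mu S$ offset.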

This dual supervision links the ML model’s internal error directly to downstream quantities of interest, such as eigenvalues at each k-point, thus tightly coupling representational accuracy to physically relevant electronic spectra.
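The k-space side of the objective starts from a Bloch sum $H(\mathbf{k}) = \sum_{\mathbf{R}} e^{i\mathbf{k}\cdot\mathbf{R}} H(\mathbf{R})$ (and likewise for $S(\mathbf{k})$) followed by the generalized eigenproblem $H(\mathbf{k})\,c = \varepsilon\, S(\mathbf{k})\,c$. The sketch below uses a toy two-orbital chain and one plausible reading of the P/Q construction, labeled as an assumption: rotate the predicted $H(\mathbf{k})$ into the eigenbasis of the reference $H(\mathbf{k})$, treat the lowest `n_low` states as P, and measure the PQ block, which vanishes for an exact prediction. Function names and model parameters are hypothetical.

```python
import numpy as np

def bloch_sum(blocks, k):
    """M(k) = sum_R exp(i k R) M(R) for a 1D chain with integer lattice vectors R."""
    return sum(np.exp(1j * k * r) * m for r, m in blocks.items())

def generalized_eigh(h, s):
    """Solve H c = e S c by Cholesky reduction; columns of C satisfy C^dagger S C = I."""
    l_inv = np.linalg.inv(np.linalg.cholesky(s))
    e, v = np.linalg.eigh(l_inv @ h @ l_inv.conj().T)
    return e, l_inv.conj().T @ v

def k_space_terms(h_pred_k, h_true_k, s_k, n_low):
    """Low-energy eigenvalue error and norm of the P-Q block of the prediction.

    Assumption: P is spanned by the n_low lowest reference eigenvectors, Q by the rest.
    """
    e_true, c = generalized_eigh(h_true_k, s_k)
    e_pred, _ = generalized_eigh(h_pred_k, s_k)
    h_tilde = c.conj().T @ h_pred_k @ c          # predicted H in the reference eigenbasis
    eig_err = np.mean(np.abs(e_pred[:n_low] - e_true[:n_low]))
    pq_norm = np.linalg.norm(h_tilde[:n_low, n_low:])
    return eig_err, pq_norm

# Toy two-orbital chain; M(-R) = M(R)^T keeps H(k) and S(k) Hermitian.
h1 = np.array([[-0.3, 0.1], [0.0, -0.2]])
h_true_r = {0: np.array([[0.0, 0.5], [0.5, 2.0]]), 1: h1, -1: h1.T}
s_r = {0: np.eye(2), 1: 0.05 * np.eye(2), -1: 0.05 * np.eye(2)}
h_pred_r = dict(h_true_r)
h_pred_r[0] = h_true_r[0] + np.array([[0.0, 0.02], [0.02, 0.0]])   # small on-site error

k = 0.7
s_k = bloch_sum(s_r, k)
exact = k_space_terms(bloch_sum(h_true_r, k), bloch_sum(h_true_r, k), s_k, n_low=1)
approx = k_space_terms(bloch_sum(h_pred_r, k), bloch_sum(h_true_r, k), s_k, n_low=1)
print(np.allclose(exact, 0.0), approx[1] > 0)   # True True
```

Because both terms are evaluated per k-point, errors in $H(\mathbf{R})$ are scored by their effect on the spectrum rather than element by element.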

4. Experimental Results and Benchmarking

On the test partition of Materials-HAM-SOC:

Hamiltonian MAE: NextHAM achieves a gauge-invariant mean absolute error of approximately 1.417 meV for the total Hamiltonian prediction, with sub-μeV errors on particularly challenging spin-off-diagonal and imaginary components.
Band Structure Fidelity: The calculated band structures from NextHAM-predicted Hamiltonians agree closely with fully converged DFT results, while predictions based solely on the zeroth-step Hamiltonian (without ΔH correction) only qualitatively capture the gross electronic landscape.
Ghost State Suppression: Ablation studies show that removing either the k-space loss or the PQ-block penalty causes pronounced errors in the predicted spectra at problematic k-points, confirming the necessity of full dual-space supervision.
Computational Efficiency: NextHAM reduces wall time by over 97% relative to conventional DFT, from ~2300 s (CPU) to ~60 s per structure (including both $H^{(0)}$ construction and neural inference), roughly a 38-fold speedup, with additional acceleration possible via GPU deployment.
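The ghost-state mechanism behind the ablation result can be reproduced with a 2 × 2 toy. If the overlap matrix $S$ has an eigenvalue $\varepsilon$ close to zero (nearly linearly dependent orbitals), a Hamiltonian error $\delta$ along the corresponding eigenvector shifts a generalized eigenvalue by $\delta/\varepsilon$. All numbers below are illustrative: a 1 meV error with $\varepsilon = 10^{-4}$ moves one eigenvalue by 10 eV, which illustrates why small real-space errors alone do not guarantee clean band structures.

```python
import numpy as np

def generalized_eigvals(h, s):
    """Eigenvalues of H c = e S c via Cholesky reduction (S must be positive definite)."""
    l_inv = np.linalg.inv(np.linalg.cholesky(s))
    return np.linalg.eigvalsh(l_inv @ h @ l_inv.conj().T)

eps = 1e-4                                  # smallest overlap eigenvalue (ill-conditioned S)
s = np.array([[1.0, 1.0 - eps], [1.0 - eps, 1.0]])
v = np.array([1.0, -1.0]) / np.sqrt(2.0)    # eigenvector of S with eigenvalue eps

h_true = s.copy()                           # toy choice: every true eigenvalue equals 1 eV
delta = 1e-3                                # a 1 meV error along the ill-conditioned direction
h_pred = h_true + delta * np.outer(v, v)

print(generalized_eigvals(h_true, s))       # both eigenvalues are 1 (up to rounding)
print(generalized_eigvals(h_pred, s))       # second eigenvalue is 1 + delta / eps = 11
```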

These results demonstrate both the state-of-the-art accuracy of universal ML Hamiltonian prediction and its practical viability for high-throughput quantum materials discovery.

5. Methodological Innovations: Symmetry and Expressive Correction

NextHAM addresses the central challenge of generalizing across complex materials classes by embedding physical knowledge at multiple levels:

Symmetry Incorporation: By respecting E(3) symmetry and leveraging SO(3)-invariant trace feedback, NextHAM ensures that its predictions transform correctly under translations, rotations, and inversions of the atomic structure, which is critical for physical validity in low-symmetry and multi-component phases.
Expressivity/Non-linearity: The Transformer backbone, augmented by TraceGrad, provides the capacity to learn subtle higher-order effects required for complex chemical environments, while the residual learning approach circumvents the “regression cliff” that affects direct ab initio-to-target mappings.
SOC Handling: Including SOC in both the dataset (Materials-HAM-SOC) and the model allows accurate treatment of the Hamiltonian's spin structure and associated phenomena (e.g., band splitting, topological effects), including correct learning of the spin-off-diagonal matrix elements.
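What "transform correctly" means can be checked on a hand-built, exactly equivariant map. The sketch below is not NextHAM's network: it uses the Slater-Koster two-center form for a p-p block in the real (px, py, pz) basis, with arbitrary illustrative hopping values, and hypothetical helper names. Rotating the structure by R rotates the block as R H Rᵀ, translations leave it unchanged because only the bond vector enters, and inversion leaves it unchanged because p-p blocks have even parity.

```python
import numpy as np

def pp_block(bond, v_sigma=-1.0, v_pi=0.3):
    """Two-center p-p block in the real (px, py, pz) basis, Slater-Koster form.

    V_sigma * n n^T + V_pi * (I - n n^T), with n the unit bond vector; the
    hopping values are arbitrary illustrative numbers in eV.
    """
    n = bond / np.linalg.norm(bond)
    nn = np.outer(n, n)
    return v_sigma * nn + v_pi * (np.eye(3) - nn)

def rotation(axis, angle):
    """Proper rotation matrix about an axis (Rodrigues' formula)."""
    a = np.asarray(axis, dtype=float)
    a = a / np.linalg.norm(a)
    k = np.array([[0, -a[2], a[1]], [a[2], 0, -a[0]], [-a[1], a[0], 0]])
    return np.eye(3) + np.sin(angle) * k + (1 - np.cos(angle)) * (k @ k)

r_a = np.array([0.0, 0.0, 0.0])
r_b = np.array([1.2, -0.4, 2.0])
rot = rotation([1.0, 2.0, 3.0], 0.8)
shift = np.array([5.0, -3.0, 1.0])

h = pp_block(r_b - r_a)
# Rotation: the block transforms as R H R^T (real p orbitals rotate like x, y, z).
print(np.allclose(pp_block(rot @ r_b - rot @ r_a), rot @ h @ rot.T))   # True
# Translation: only the bond vector enters, so the block is unchanged.
print(np.allclose(pp_block((r_b + shift) - (r_a + shift)), h))        # True
# Inversion: bond -> -bond; p-p blocks have even parity, so again unchanged.
print(np.allclose(pp_block(-(r_b - r_a)), h))                          # True
```

An equivariant network has to satisfy the same relations for every orbital pair (and, with SOC, for the spin components as well), but it learns the block instead of using a fixed formula.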

6. Implications, Applications, and Future Directions

The Materials-HAM-SOC and NextHAM ecosystem provides the essential foundation for universal, physically grounded, and computationally efficient machine learning Hamiltonian modeling:

Applications: These tools enable large-scale screening of electronic-structure properties, descriptor-rich materials informatics, and rapid quantum design of functional materials, with SOC explicitly accessible for studies of topological materials, valleytronics, spintronics, and other SOC-driven phenomena.
Limitations and Prospects: While NextHAM achieves DFT-level accuracy on a highly diverse dataset, accuracy could be improved further, especially for edge cases and very high-Z elements, by enlarging the dataset or refining loss functionals for specific target applications; parallelizing the $H^{(0)}$ construction would further reduce the computational cost.
Extensibility: The underlying approach—physics-informed priors, symmetry-reinforcing architectures, and dual-space supervision—may be generalized to include new quantum effects beyond SOC, such as electron correlation corrections or time-dependent fields.

Table: Key Features of Materials-HAM-SOC and NextHAM

Feature	Materials-HAM-SOC Dataset	NextHAM Framework
Scale / Core Components	17,000 structures, 68 elements	E(3)-equivariant Transformer, TraceGrad, dual-space loss
SOC Inclusion	Yes (explicit, all entries)	Yes (Hamiltonian-level, spin-off-diagonal terms)
Hamiltonian Basis	Up to 4s2p2d1f orbitals per element	Residual correction ΔH over high-quality H⁽⁰⁾
Metric / Objective	Gauge-invariant MAE, band structure fidelity	Dual loss (real and reciprocal space)
Computational Speed	DFT reference: ∼2300 s per structure (CPU)	∼60 s per structure (>97% less wall time than DFT)

The convergence of large-scale, SOC-inclusive datasets and symmetry-aware neural architectures represents a foundational advance for materials informatics and theoretical condensed matter physics, portending a new era of routinely accessible, high-fidelity Hamiltonian predictions for materials exploration and discovery (Yin et al., 24 Sep 2025).


References

Yin et al. (2025). Advancing Universal Deep Learning for Electronic-Structure Hamiltonian Prediction of Materials.
