Better Hessians Matter: Advances in Curvature

Updated 20 December 2025
  • Better Hessians, i.e., high-fidelity second-derivative approximations, improve optimization efficiency, model accuracy, and simulation reliability across various scientific and ML applications.
  • Recent advances employ scalable techniques like hierarchical, matrix-free, and sketching methods to reduce computational costs while preserving model precision.
  • Integrating better Hessians into computational workflows yields significant gains, such as up to 200-fold increases in successful transition-state searches and improved interpretability.

Better Hessians Matter

Accurately computed, efficiently approximated, and judiciously utilized Hessians—second derivatives of objective functions—are central to a wide range of problems in computational mathematics, machine learning, quantum chemistry, scientific computing, and inverse problems. Recent advances demonstrate that improved Hessians yield substantial gains in accuracy, speed, reliability, and scalability across model training, optimization, interpretability, and physical simulations. This article surveys breakthroughs enabled by "better Hessians," focusing on data-rich regimes, scalable approximation strategies, algorithmic innovations, and the growing demands of scientific and machine learning applications.

1. Fundamentals: The Role and Construction of Hessians

The Hessian matrix $\mathbf{H} = \nabla^2 f(x)$ provides a complete local curvature description of a scalar-valued function $f$ at a point $x$. In applications such as quantum chemistry, $\mathbf{H}$ encodes the curvature of the potential energy surface (PES) with respect to atomic coordinates $R_i$, critical for transition-state (TS) searches and vibrational analyses (Cui et al., 18 May 2025, Williams et al., 15 Aug 2024). In deep learning, the empirical-risk Hessian $H_\theta = \nabla^2_\theta R(\theta)$ reflects the curvature of the loss landscape, underpinning second-order optimization, sensitivity analysis, and influence function calculations (Hong et al., 27 Sep 2025, Granziol, 16 May 2025). In PDE-constrained inverse problems, the Gauss–Newton or full Hessian governs convergence rates and uncertainty quantification in large-scale parameter estimation (Hartland et al., 2023, Ambartsumyan et al., 2020).

Direct computation of full Hessians for large-scale problems ($n > 10^4$) is prohibitive. Advances include numerical finite-difference methods, analytic differentiation, automatic differentiation with sparse or structured exploitation (Bell et al., 2021, Hill et al., 29 Jan 2025), and dimension reduction via sketching (Li et al., 2021), hierarchical factorization (Hartland et al., 2023), or blockwise factorization (Hong et al., 27 Sep 2025).
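
As a concrete illustration of the matrix-free primitives underlying many of these methods, the sketch below computes Hessian–vector products by differentiating the gradient a second time (double backpropagation). This is a minimal example using PyTorch; the objective `f`, point `x0`, and direction `v` are illustrative placeholders, not taken from any cited work.

```python
# Sketch: matrix-free Hessian-vector products via double backpropagation (PyTorch).
import torch

def hvp(f, x, v):
    """Return (grad f(x), H(x) @ v) without materializing the Hessian."""
    x = x.detach().requires_grad_(True)
    g = torch.autograd.grad(f(x), x, create_graph=True)[0]   # gradient, kept in the graph
    Hv = torch.autograd.grad(g, x, grad_outputs=v)[0]        # differentiate g . v, giving H v
    return g.detach(), Hv.detach()

# Toy usage: quadratic-plus-quartic objective in 5 dimensions.
def f(x):
    return 0.5 * (x @ x) + 0.25 * (x ** 4).sum()

x0 = torch.randn(5)
v = torch.randn(5)
g, Hv = hvp(f, x0, v)
# For this f, H = I + 3 diag(x^2), so Hv should equal v + 3 * x0**2 * v.
print(torch.allclose(Hv, v + 3 * x0**2 * v, atol=1e-5))
```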

2. Data-Driven Machine Learning: Accuracy in Potential Surfaces and Force Fields

Large-Scale Hessian Databases

The HORM dataset (Cui et al., 18 May 2025) provides 1.84 million quantum-chemically computed Hessians at the $\omega$B97X/6-31G(d) level for off-equilibrium molecular geometries along diverse reaction paths, surpassing previous datasets (e.g., Hessian QM9's 41,645 equilibria). Such diversity enables supervised training of ML interatomic potentials (MLIPs) with direct Hessian supervision.

Similarly, Hessian QM9 (Williams et al., 15 Aug 2024) delivers Hessians for 41,645 small organic molecules in vacuum and implicit solvents, supporting solvent-aware MLIP development.

Hessian-Informed MLIP Training

Incorporating Hessian losses into MLIP training yields drastic reductions (59–97%) in Hessian mean absolute error (MAE) compared to models trained only on energies and forces (see the table below; Cui et al., 18 May 2025).

Model          Hessian MAE (E-F training)   Hessian MAE (E-F-H training)   Reduction (%)
AlphaNet       0.433                        0.303                          30
LEFTNet        0.366                        0.151                          59
LEFTNet-df     1.648                        0.197                          88
EquiformerV2   2.231                        0.075                          97

Training with second-derivative supervision enables up to 200-fold increases in successful transition-state searches. Vibrational spectra predictions improve by 75–80% in MAE across all solvent environments (Williams et al., 15 Aug 2024). Efficient Hessian-informed approaches use stochastic row sampling (vector–Jacobian products) to reduce computational scaling from $\mathcal{O}(N^2)$ to $\mathcal{O}(s)$ per structure.
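
A minimal sketch of this row-sampling idea follows, assuming a differentiable `energy_model`, atomic `coords`, and a reference Hessian `H_ref` (all placeholder names): each sampled Hessian row is obtained as the gradient of one gradient component, i.e., one extra vector–Jacobian product, so the per-structure cost scales with the number of sampled rows rather than with all $N^2$ entries. This is a generic recipe, not the exact implementation used in the cited work.

```python
# Sketch: Hessian-row sampling for MLIP training with second-derivative supervision.
import torch

def sampled_hessian_loss(energy_model, coords, H_ref, n_rows=4):
    """MSE between n_rows randomly sampled rows of the predicted and reference Hessians.
    Each sampled row costs one extra backward pass (a vector-Jacobian product)."""
    coords = coords.detach().requires_grad_(True)            # (n_atoms, 3)
    e = energy_model(coords)                                  # scalar energy
    g = torch.autograd.grad(e, coords, create_graph=True)[0].reshape(-1)
    n = g.numel()
    idx = torch.randperm(n)[:n_rows]                          # sampled Cartesian indices
    loss = 0.0
    for i in idx:
        # Row i of the Hessian = gradient of the i-th gradient component.
        row = torch.autograd.grad(g[i], coords, retain_graph=True, create_graph=True)[0]
        loss = loss + ((row.reshape(-1) - H_ref[i]) ** 2).mean()
    return loss / n_rows
```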

Numerical Hessians for Surface Chemistry

Numerically computed Hessians via graph neural network (GNN) potentials enable vibrational free energy and entropy calculations, critical for catalysis. After systematic offset correction, ML-predicted Hessians reach an MAE of 58 cm$^{-1}$ for vibrational frequencies and 0.042 eV for Gibbs-energy-derived entropy (Wander et al., 2 Oct 2024). ML Hessians, implemented within transition-state search schemes, raise convergence rates from 80% to over 93% and halve the number of failed optimization cases, supporting the use of ML Hessians as drop-in surrogates for DFT Hessians in routine high-throughput pipelines.
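
For context, the usual route from a Hessian (ML-predicted or DFT) to vibrational frequencies is mass-weighting followed by diagonalization; a minimal numpy sketch is below. It assumes a Hessian in eV/Å² and masses in amu, uses an approximate unit-conversion factor to cm$^{-1}$, and does not project out translational/rotational modes.

```python
# Sketch: harmonic frequencies from a (possibly ML-predicted) Cartesian Hessian.
import numpy as np

def harmonic_frequencies(H, masses):
    """H: (3N, 3N) Hessian in eV/Angstrom^2; masses: (N,) atomic masses in amu."""
    m = np.repeat(masses, 3)                       # one mass per Cartesian coordinate
    Hw = H / np.sqrt(np.outer(m, m))               # mass-weighted Hessian
    evals = np.linalg.eigvalsh(0.5 * (Hw + Hw.T))  # symmetrize before diagonalizing
    # ~521.47 converts sqrt(eV / (Angstrom^2 * amu)) to cm^-1 (approximate factor).
    freqs = np.sign(evals) * np.sqrt(np.abs(evals)) * 521.47
    return freqs                                   # cm^-1; negative values = imaginary modes
```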

3. Optimization and Machine Learning: Scalable, Structured, and Interpretable Curvature

Curvature Approximations and Influence Functions

Influence functions require inverse Hessian–vector products (IHVPs), intractable for deep networks without approximation. Structured Hessian approximations such as Generalized Gauss–Newton (GGN), Kronecker-factored (K-FAC), and block-diagonal variants are widely adopted (Hong et al., 27 Sep 2025). Rigorous empirical studies demonstrate that tighter Hessian approximations yield better attribution quality, with the error decomposition identifying the dominant losses as coming from Kronecker eigenvalue mismatch (EK-FAC→K-FAC, 50–65% of error gap) and block-diagonalization steps. Improvements in the Linear Data-modelling Score (LDS) are strongly correlated with better Hessian approximation, justifying efforts to develop and use higher-fidelity curvature models for influence-based data attribution.
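
As one concrete baseline for the IHVP step, the sketch below solves a damped curvature system by conjugate gradient using only matrix-free Hessian (or GGN) vector products; the `hvp` callable, damping value, and iteration counts are illustrative assumptions. Structured approximations such as K-FAC replace this iterative solve with cheap factored inverses.

```python
# Sketch: inverse-Hessian-vector product via conjugate gradient on a damped curvature operator.
import numpy as np

def ihvp_cg(hvp, v, damping=1e-3, iters=100, tol=1e-8):
    """Solve (H + damping*I) x = v using only matrix-free hvp calls."""
    x = np.zeros_like(v)
    r = v - (hvp(x) + damping * x)     # initial residual (equals v since x = 0)
    p = r.copy()
    rs = r @ r
    for _ in range(iters):
        Ap = hvp(p) + damping * p
        alpha = rs / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        rs_new = r @ r
        if rs_new < tol:
            break
        p = r + (rs_new / rs) * p
        rs = rs_new
    return x
```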

Global Optimization Guarantees with Approximate Hessians

Gradient-Normalized Smoothness (GNS) theory unifies local Hessian approximation error and global algorithmic convergence guarantees (Semenov et al., 16 Jun 2025). Given a pointwise error bound $\|\nabla^2 f(x) - H(x)\| \leq \Delta_0 + \Delta_1 \|\nabla f(x)\|_*^{1-\beta}$, if $\beta \leq \alpha$ (the problem's smoothness exponent), the global iteration complexity matches that of exact Newton methods, regardless of whether $H(x)$ is the Fisher, Gauss–Newton, or a related approximation. This framework encompasses convex, non-convex, and quasi-self-concordant settings, supporting efficient second-order methods in large-scale learning via low-cost curvature surrogates.

Diagonal and Sparse Approximations

Highly efficient diagonal approximations, such as HesScale (a refinement of BL89), accurately estimate layerwise Hessian diagonals, improving convergence/stability in both supervised and reinforcement learning (Elsayed et al., 5 Jun 2024). Empirical results demonstrate that HesScale achieves higher accuracy than MC-based and structured alternatives, with minimal overhead. Operator-overloading based automatic sparse differentiation (ASD) now enables computing exact sparse Hessians at scales $n \gtrsim 10^4$–$10^5$, delivering 1000×–6000× speed-ups over standard AD, and facilitating direct Newton solves, Laplace approximations, and implicit differentiation in scientific ML (Hill et al., 29 Jan 2025, Bell et al., 2021).
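
For comparison, a generic Monte-Carlo diagonal estimator built from Hessian–vector products is sketched below; it is not the deterministic HesScale backpropagation recursion, only a common baseline of the kind HesScale is reported to outperform.

```python
# Sketch: Hutchinson-style Monte-Carlo estimate of the Hessian diagonal from HVPs.
import numpy as np

def hessian_diagonal_mc(hvp, dim, n_samples=64, rng=None):
    """diag(H) ~= E[z * (H z)] for Rademacher z with independent +/-1 entries."""
    rng = np.random.default_rng(rng)
    est = np.zeros(dim)
    for _ in range(n_samples):
        z = rng.choice([-1.0, 1.0], size=dim)
        est += z * hvp(z)          # elementwise product picks out the diagonal in expectation
    return est / n_samples
```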

4. Large-Scale and Scientific Computing: Hierarchical, Low-Rank, and Matrix-Free Strategies

Foundation-Scale and Distributed Hessian Computation

For models of up to 100 billion parameters, the HessFormer package enables distributed computation of Hessian-vector products and the spectral density of the Hessian via stochastic Lanczos quadrature (Granziol, 16 May 2025). This supports robust global learning rate/step-size selection, evaluation of compression/regularization strategies, and sensitivity diagnostics in foundation models, closing the gap between theory for small models and practice for state-of-the-art LLMs.
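
The core of this approach is stochastic Lanczos quadrature: run a short Lanczos recursion driven purely by Hessian–vector products from a random probe vector, then read off Ritz values and weights as nodes of a spectral-density quadrature. The numpy sketch below shows a single probe with full reorthogonalization; it mirrors the generic SLQ recipe rather than the distributed HessFormer implementation, and averaging over many probes yields the density estimate.

```python
# Sketch: one probe of stochastic Lanczos quadrature for the Hessian spectral density.
import numpy as np

def slq_probe(hvp, dim, m=30, rng=None):
    """Run up to m Lanczos steps from a random unit vector; return Ritz values
    (spectral-density nodes) and their quadrature weights."""
    rng = np.random.default_rng(rng)
    q = rng.standard_normal(dim)
    q /= np.linalg.norm(q)
    Q = [q]
    alphas, betas = [], []
    beta, q_prev = 0.0, np.zeros(dim)
    for j in range(m):
        w = hvp(q) - beta * q_prev
        alpha = q @ w
        w -= alpha * q
        for qi in Q:                       # full reorthogonalization for numerical stability
            w -= (qi @ w) * qi
        beta = np.linalg.norm(w)
        alphas.append(alpha)
        if beta < 1e-12 or j == m - 1:
            break
        q_prev, q = q, w / beta
        Q.append(q)
        betas.append(beta)
    T = np.diag(alphas) + np.diag(betas, 1) + np.diag(betas, -1)
    ritz, vecs = np.linalg.eigh(T)
    weights = vecs[0, :] ** 2              # squared first components give quadrature weights
    return ritz, weights
```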

Hierarchical and Matrix-Free Approximations

In PDE-constrained inverse problems, hierarchical off-diagonal low-rank (HODLR) and hierarchical ($\mathcal{H}$-matrix) representations of Hessians reduce complexity from $O(N^3)$ or $O(rN^2)$ to log-linear $O(N \log^2 N)$ (Hartland et al., 2023, Ambartsumyan et al., 2020). Empirical studies show that HODLR compression outperforms global low-rank approximations once information content increases, enabling fast Newton solves and posterior sampling at full field scale (e.g., Greenland ice sheet, $N > 10^5$). Matrix-free point spread function (PSF) techniques approximate high-rank Hessians with localized kernel interpolation, reducing the required number of expensive PDE solves by 5–10× versus regularization- or low-rank preconditioning, and maintaining tight spectral clustering in the preconditioned Hessian (Alger et al., 2023).
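
For reference, the global low-rank baseline that HODLR is compared against can be built matrix-free with a randomized range finder over Hessian–vector products, as in the hedged numpy sketch below; the rank, oversampling amount, and `hvp` callable are illustrative assumptions.

```python
# Sketch: randomized global low-rank approximation of a symmetric Hessian operator.
import numpy as np

def lowrank_hessian(hvp, dim, rank=20, oversample=10, rng=None):
    rng = np.random.default_rng(rng)
    k = rank + oversample
    Omega = rng.standard_normal((dim, k))
    Y = np.column_stack([hvp(Omega[:, i]) for i in range(k)])    # sample the range of H
    Q, _ = np.linalg.qr(Y)                                       # orthonormal basis
    B = np.column_stack([hvp(Q[:, i]) for i in range(Q.shape[1])])
    T = Q.T @ B                                                  # small projected core Q^T H Q
    evals, U = np.linalg.eigh(0.5 * (T + T.T))
    idx = np.argsort(np.abs(evals))[::-1][:rank]
    V = Q @ U[:, idx]                                            # approximate eigenvectors
    return evals[idx], V                                         # H ~= V diag(evals) V^T
```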

Sketching and Learning-Augmented Approximations

Learning-augmented Hessian sketching (Li et al., 2021) employs oracles for leverage score detection and trainable sketch values to minimize distortion in the compressed subspace—yielding reductions in per-iteration cost and improved second-order accuracy in sketched Newton-type methods. Empirical error reductions of 40–80% in convergence rates compared to standard Count-Sketch are reported in LASSO and nuclear-norm regression.
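
A minimal version of the non-learned baseline is sketched below: a plain Count-Sketch compresses the data matrix of a least-squares problem so the Newton step uses the sketched curvature $(SA)^\top(SA)$ in place of $A^\top A$. All names and sizes are illustrative; the learning-augmented variant replaces the random buckets and signs with trained values guided by leverage-score oracles.

```python
# Sketch: plain Count-Sketch + sketched Newton step for least squares min 0.5*||Ax - b||^2.
import numpy as np

def count_sketch(A, m, rng=None):
    """Return S @ A for a Count-Sketch S with m rows: each row of A goes to one
    random bucket with a random sign."""
    rng = np.random.default_rng(rng)
    n = A.shape[0]
    buckets = rng.integers(0, m, size=n)
    signs = rng.choice([-1.0, 1.0], size=n)
    SA = np.zeros((m, A.shape[1]))
    np.add.at(SA, buckets, signs[:, None] * A)   # scatter-add rows into buckets
    return SA

def sketched_newton_step(A, b, x, m=256, damping=1e-8, rng=None):
    """Newton step with the Hessian A^T A replaced by the sketched (SA)^T (SA)."""
    SA = count_sketch(A, m, rng)
    g = A.T @ (A @ x - b)                                   # exact gradient
    H_sketch = SA.T @ SA + damping * np.eye(A.shape[1])     # sketched curvature
    return x - np.linalg.solve(H_sketch, g)
```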

5. Geometry, Certification, and Model Analysis

Convexity Certification

The Hessian approach to convexity certification surpasses the DCP (Disciplined Convex Programming) syntactic approach. By propagating positive semidefiniteness (PSD) through computational graphs of second derivatives, and analytically recognizing variance-type templates, this method certifies convexity for a strictly richer class of differentiable functions (Klaus et al., 2022). Complexity is linear in the DAG size, and the method subsumes all standard DCP rules while strictly extending the certifiable function set.

Hessian-Based Recovery and Structure

In finite element methods, polynomial-preserving recovery operators based on double gradient recovery (PPR-PPR) yield Hessian reconstructions with super- and ultra-convergence properties, attaining $O(h^k)$ accuracy on mildly structured meshes and $O(h^{k+1})$ on translation-invariant meshes (Guo et al., 2014). Matrix-algebraic approaches produce order-1 or order-2 accurate approximate Hessians using only function evaluations, enabling derivative-free optimization at $O(n^2)$ cost (Hare et al., 2023).
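
A standard function-values-only construction in the same spirit (though not the specific matrix-algebraic scheme of Hare et al.) is the order-2 accurate central-difference Hessian sketched below, which needs $O(n^2)$ evaluations.

```python
# Sketch: order-2 accurate Hessian from function evaluations only (central differences).
import numpy as np

def fd_hessian(f, x, h=1e-4):
    n = x.size
    H = np.empty((n, n))
    for i in range(n):
        for j in range(i, n):
            ei, ej = np.zeros(n), np.zeros(n)
            ei[i], ej[j] = h, h
            # Four-point central-difference stencil for the (i, j) mixed partial.
            H[i, j] = (f(x + ei + ej) - f(x + ei - ej)
                       - f(x - ei + ej) + f(x - ei - ej)) / (4 * h * h)
            H[j, i] = H[i, j]                      # enforce symmetry
    return H
```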

Applications in Physical Models and Geometry

In spinfoam models of quantum gravity, non-degenerate Hessians at Regge-like critical points guarantee the validity of the stationary phase expansion, ensuring correct semiclassical limits and the absence of spurious contributions (Kamiński et al., 14 Oct 2025). In computational geometry and visual SLAM, local Hessians from relative-motion problems are harnessed to weight global bundle adjustment objectives, yielding pose-graph solutions that closely match the accuracy of full point-based bundle adjustment at a fraction of the cost (Rupnik et al., 2023).

6. Challenges, Limitations, and Future Directions

Despite the clear benefits of better Hessians, several challenges persist:

  • Data scarcity and representativity: Datasets such as HORM and Hessian QM9 have begun to address the lack of diverse high-quality Hessian data in chemistry, but further coverage—especially for larger and more complex systems—is needed (Cui et al., 18 May 2025, Williams et al., 15 Aug 2024).
  • Scalability: While HODLR, $\mathcal{H}$-matrix, and PSF methods scale log-linearly, their efficiency depends on the off-diagonal compressibility of the Hessian, which may be problem-dependent (Hartland et al., 2023, Alger et al., 2023).
  • Hessian Approximation Error: For inverse or influence computation tasks, Kronecker and block-diagonal factorizations can dominate the approximation error, highlighting the need for higher-fidelity models and hybrid correction schemes (Hong et al., 27 Sep 2025).
  • Numerical Stability and Regularization: Many approximation schemes must address ill-conditioning or require systematic bias correction (e.g., via offset corrections in ML-computed Hessians (Wander et al., 2 Oct 2024)) and careful monitoring of low-rank/diagonal scales.
  • Integration and Exploitation of Structure: Further advances may exploit additional tensor/kernel structure, leverage sparsity more aggressively, or integrate learning-based sketching into higher-order or problem-adaptive methods (Hill et al., 29 Jan 2025, Li et al., 2021).

The ongoing expansion of high-quality Hessian data, distributed and scalable algorithmic primitives, improved local-global approximations, and structure- or data-adaptive methods is rapidly extending the practical frontiers of what can be achieved with “better Hessians” in modern computational science and machine learning.

