Better Hessians Matter: Advances in Curvature

Updated 20 December 2025
  • Better Hessians, i.e., high-fidelity second-derivative approximations, improve optimization efficiency, model accuracy, and simulation reliability across various scientific and ML applications.
  • Recent advances employ scalable techniques like hierarchical, matrix-free, and sketching methods to reduce computational costs while preserving model precision.
  • Integrating better Hessians into computational workflows yields significant gains, such as up to 200-fold increases in successful transition-state searches and improved interpretability.

Better Hessians Matter

Accurately computed, efficiently approximated, and judiciously utilized Hessians—second derivatives of objective functions—are central to a wide range of problems in computational mathematics, machine learning, quantum chemistry, scientific computing, and inverse problems. Recent advances demonstrate that improved Hessians yield substantial gains in accuracy, speed, reliability, and scalability across model training, optimization, interpretability, and physical simulations. This article surveys breakthroughs enabled by "better Hessians," focusing on data-rich regimes, scalable approximation strategies, algorithmic innovations, and the growing demands of scientific and machine learning applications.

1. Fundamentals: The Role and Construction of Hessians

The Hessian matrix $\mathbf{H} = \nabla^2 f(x)$ provides a complete local curvature description of a scalar-valued function $f$ at a point $x$. In applications such as quantum chemistry, $\mathbf{H}$ encodes the curvature of the potential energy surface (PES) with respect to atomic coordinates $R_i$, critical for transition-state (TS) searches and vibrational analyses (Cui et al., 18 May 2025, Williams et al., 15 Aug 2024). In deep learning, the empirical-risk Hessian $H_\theta = \nabla^2_\theta R(\theta)$ reflects the curvature of the loss landscape, underpinning second-order optimization, sensitivity analysis, and influence function calculations (Hong et al., 27 Sep 2025, Granziol, 16 May 2025). In PDE-constrained inverse problems, the Gauss–Newton or full Hessian governs convergence rates and uncertainty quantification in large-scale parameter estimation (Hartland et al., 2023, Ambartsumyan et al., 2020).

Direct computation of full Hessians for large-scale problems ($n > 10^4$) is prohibitive. Advances include numerical finite-difference methods, analytic differentiation, automatic differentiation with sparse or structured exploitation (Bell et al., 2021, Hill et al., 29 Jan 2025), and dimension reduction via sketching (Li et al., 2021), hierarchical factorization (Hartland et al., 2023), or blockwise factorization (Hong et al., 27 Sep 2025).
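
As a concrete illustration of the matrix-free primitives underlying many of these methods, the sketch below computes Hessian–vector products by differentiating the gradient a second time (double backpropagation). This is a minimal example using PyTorch; the objective `f`, point `x0`, and direction `v` are illustrative placeholders, not taken from any cited work.

```python
# Sketch: matrix-free Hessian-vector products via double backpropagation (PyTorch).
import torch

def hvp(f, x, v):
    """Return (grad f(x), H(x) @ v) without materializing the Hessian."""
    x = x.detach().requires_grad_(True)
    g = torch.autograd.grad(f(x), x, create_graph=True)[0]   # gradient, kept in the graph
    Hv = torch.autograd.grad(g, x, grad_outputs=v)[0]        # differentiate g . v, giving H v
    return g.detach(), Hv.detach()

# Toy usage: quadratic-plus-quartic objective in 5 dimensions.
def f(x):
    return 0.5 * (x @ x) + 0.25 * (x ** 4).sum()

x0 = torch.randn(5)
v = torch.randn(5)
g, Hv = hvp(f, x0, v)
# For this f, H = I + 3 diag(x^2), so Hv should equal v + 3 * x0**2 * v.
print(torch.allclose(Hv, v + 3 * x0**2 * v, atol=1e-5))
```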

2. Data-Driven Machine Learning: Accuracy in Potential Surfaces and Force Fields

Large-Scale Hessian Databases

The HORM dataset (Cui et al., 18 May 2025) provides 1.84 million quantum-chemically computed Hessians at the $\omega$B97X/6-31G(d) level for off-equilibrium molecular geometries along diverse reaction paths, surpassing previous datasets (e.g., Hessian QM9's 41,645 equilibria). Such diversity enables supervised training of ML interatomic potentials (MLIPs) with direct Hessian supervision.

Similarly, Hessian QM9 (Williams et al., 15 Aug 2024) delivers Hessians for 41,645 small organic molecules in vacuum and implicit solvents, supporting solvent-aware MLIP development.

Hessian-Informed MLIP Training

Incorporating Hessian losses into MLIP training yields drastic reductions (59–97%) in Hessian mean absolute error (MAE) compared to models trained only on energies and forces (see the table below; Cui et al., 18 May 2025).

Model          Hessian MAE (E-F training)   Hessian MAE (E-F-H training)   Reduction (%)
AlphaNet       0.433                        0.303                          30
LEFTNet        0.366                        0.151                          59
LEFTNet-df     1.648                        0.197                          88
EquiformerV2   2.231                        0.075                          97

Training with second-derivative supervision enables up to 200-fold increases in successful transition-state searches. Vibrational spectra predictions improve by 75–80% in MAE across all solvent environments (Williams et al., 15 Aug 2024). Efficient Hessian-informed approaches use stochastic row sampling (vector–Jacobian products) to reduce computational scaling from $\mathcal{O}(N^2)$ to $\mathcal{O}(s)$ per structure.
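
A minimal sketch of this row-sampling idea follows, assuming a differentiable `energy_model`, atomic `coords`, and a reference Hessian `H_ref` (all placeholder names): each sampled Hessian row is obtained as the gradient of one gradient component, i.e., one extra vector–Jacobian product, so the per-structure cost scales with the number of sampled rows rather than with all $N^2$ entries. This is a generic recipe, not the exact implementation used in the cited work.

```python
# Sketch: Hessian-row sampling for MLIP training with second-derivative supervision.
import torch

def sampled_hessian_loss(energy_model, coords, H_ref, n_rows=4):
    """MSE between n_rows randomly sampled rows of the predicted and reference Hessians.
    Each sampled row costs one extra backward pass (a vector-Jacobian product)."""
    coords = coords.detach().requires_grad_(True)            # (n_atoms, 3)
    e = energy_model(coords)                                  # scalar energy
    g = torch.autograd.grad(e, coords, create_graph=True)[0].reshape(-1)
    n = g.numel()
    idx = torch.randperm(n)[:n_rows]                          # sampled Cartesian indices
    loss = 0.0
    for i in idx:
        # Row i of the Hessian = gradient of the i-th gradient component.
        row = torch.autograd.grad(g[i], coords, retain_graph=True, create_graph=True)[0]
        loss = loss + ((row.reshape(-1) - H_ref[i]) ** 2).mean()
    return loss / n_rows
```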

Numerical Hessians for Surface Chemistry

Numerically computed Hessians via graph neural network (GNN) potentials enable vibrational free energy and entropy calculations, critical for catalysis. After systematic offset correction, ML-predicted Hessians reach an MAE of 58 cm$^{-1}$ for vibrational frequencies and 0.042 eV for Gibbs-energy-derived entropy (Wander et al., 2 Oct 2024). ML Hessians, implemented within transition-state search schemes, raise convergence rates from 80% to over 93% and halve the number of failed optimization cases, supporting the use of ML Hessians as drop-in surrogates for DFT Hessians in routine high-throughput pipelines.
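
For context, the usual route from a Hessian (ML-predicted or DFT) to vibrational frequencies is mass-weighting followed by diagonalization; a minimal numpy sketch is below. It assumes a Hessian in eV/Å² and masses in amu, uses an approximate unit-conversion factor to cm$^{-1}$, and does not project out translational/rotational modes.

```python
# Sketch: harmonic frequencies from a (possibly ML-predicted) Cartesian Hessian.
import numpy as np

def harmonic_frequencies(H, masses):
    """H: (3N, 3N) Hessian in eV/Angstrom^2; masses: (N,) atomic masses in amu."""
    m = np.repeat(masses, 3)                       # one mass per Cartesian coordinate
    Hw = H / np.sqrt(np.outer(m, m))               # mass-weighted Hessian
    evals = np.linalg.eigvalsh(0.5 * (Hw + Hw.T))  # symmetrize before diagonalizing
    # ~521.47 converts sqrt(eV / (Angstrom^2 * amu)) to cm^-1 (approximate factor).
    freqs = np.sign(evals) * np.sqrt(np.abs(evals)) * 521.47
    return freqs                                   # cm^-1; negative values = imaginary modes
```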

3. Optimization and Machine Learning: Scalable, Structured, and Interpretable Curvature

Curvature Approximations and Influence Functions

Influence functions require inverse Hessian–vector products (IHVPs), intractable for deep networks without approximation. Structured Hessian approximations such as Generalized Gauss–Newton (GGN), Kronecker-factored (K-FAC), and block-diagonal variants are widely adopted (Hong et al., 27 Sep 2025). Rigorous empirical studies demonstrate that tighter Hessian approximations yield better attribution quality, with the error decomposition identifying the dominant losses as coming from Kronecker eigenvalue mismatch (EK-FAC→K-FAC, 50–65% of error gap) and block-diagonalization steps. Improvements in the Linear Data-modelling Score (LDS) are strongly correlated with better Hessian approximation, justifying efforts to develop and use higher-fidelity curvature models for influence-based data attribution.
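
As one concrete baseline for the IHVP step, the sketch below solves a damped curvature system by conjugate gradient using only matrix-free Hessian (or GGN) vector products; the `hvp` callable, damping value, and iteration counts are illustrative assumptions. Structured approximations such as K-FAC replace this iterative solve with cheap factored inverses.

```python
# Sketch: inverse-Hessian-vector product via conjugate gradient on a damped curvature operator.
import numpy as np

def ihvp_cg(hvp, v, damping=1e-3, iters=100, tol=1e-8):
    """Solve (H + damping*I) x = v using only matrix-free hvp calls."""
    x = np.zeros_like(v)
    r = v - (hvp(x) + damping * x)     # initial residual (equals v since x = 0)
    p = r.copy()
    rs = r @ r
    for _ in range(iters):
        Ap = hvp(p) + damping * p
        alpha = rs / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        rs_new = r @ r
        if rs_new < tol:
            break
        p = r + (rs_new / rs) * p
        rs = rs_new
    return x
```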

Global Optimization Guarantees with Approximate Hessians

Gradient-Normalized Smoothness (GNS) theory unifies local Hessian approximation error and global algorithmic convergence guarantees (Semenov et al., 16 Jun 2025). Given a pointwise error bound $\|\nabla^2 f(x) - H(x)\| \leq \Delta_0 + \Delta_1 \|\nabla f(x)\|_*^{1-\beta}$, if $\beta \leq \alpha$ (the problem's smoothness exponent), the global iteration complexity matches that of exact Newton methods, regardless of whether $H(x)$ is the Fisher, Gauss–Newton, or a related approximation. This framework encompasses convex, non-convex, and quasi-self-concordant settings, supporting efficient second-order methods in large-scale learning via low-cost curvature surrogates.

Diagonal and Sparse Approximations

Highly efficient diagonal approximations, such as HesScale (a refinement of BL89), accurately estimate layerwise Hessian diagonals, improving convergence/stability in both supervised and reinforcement learning (Elsayed et al., 5 Jun 2024). Empirical results demonstrate that HesScale achieves higher accuracy than MC-based and structured alternatives, with minimal overhead. Operator-overloading based automatic sparse differentiation (ASD) now enables computing exact sparse Hessians at scales $n \gtrsim 10^4$–$10^5$, delivering 1000×–6000× speed-ups over standard AD, and facilitating direct Newton solves, Laplace approximations, and implicit differentiation in scientific ML (Hill et al., 29 Jan 2025, Bell et al., 2021).
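
For comparison, a generic Monte-Carlo diagonal estimator built from Hessian–vector products is sketched below; it is not the deterministic HesScale backpropagation recursion, only a common baseline of the kind HesScale is reported to outperform.

```python
# Sketch: Hutchinson-style Monte-Carlo estimate of the Hessian diagonal from HVPs.
import numpy as np

def hessian_diagonal_mc(hvp, dim, n_samples=64, rng=None):
    """diag(H) ~= E[z * (H z)] for Rademacher z with independent +/-1 entries."""
    rng = np.random.default_rng(rng)
    est = np.zeros(dim)
    for _ in range(n_samples):
        z = rng.choice([-1.0, 1.0], size=dim)
        est += z * hvp(z)          # elementwise product picks out the diagonal in expectation
    return est / n_samples
```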

4. Large-Scale and Scientific Computing: Hierarchical, Low-Rank, and Matrix-Free Strategies

Foundation-Scale and Distributed Hessian Computation

For models of up to 100 billion parameters, the HessFormer package enables distributed computation of Hessian-vector products and the spectral density of the Hessian via stochastic Lanczos quadrature (Granziol, 16 May 2025). This supports robust global learning rate/step-size selection, evaluation of compression/regularization strategies, and sensitivity diagnostics in foundation models, closing the gap between theory for small models and practice for state-of-the-art LLMs.
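
The core of this approach is stochastic Lanczos quadrature: run a short Lanczos recursion driven purely by Hessian–vector products from a random probe vector, then read off Ritz values and weights as nodes of a spectral-density quadrature. The numpy sketch below shows a single probe with full reorthogonalization; it mirrors the generic SLQ recipe rather than the distributed HessFormer implementation, and averaging over many probes yields the density estimate.

```python
# Sketch: one probe of stochastic Lanczos quadrature for the Hessian spectral density.
import numpy as np

def slq_probe(hvp, dim, m=30, rng=None):
    """Run up to m Lanczos steps from a random unit vector; return Ritz values
    (spectral-density nodes) and their quadrature weights."""
    rng = np.random.default_rng(rng)
    q = rng.standard_normal(dim)
    q /= np.linalg.norm(q)
    Q = [q]
    alphas, betas = [], []
    beta, q_prev = 0.0, np.zeros(dim)
    for j in range(m):
        w = hvp(q) - beta * q_prev
        alpha = q @ w
        w -= alpha * q
        for qi in Q:                       # full reorthogonalization for numerical stability
            w -= (qi @ w) * qi
        beta = np.linalg.norm(w)
        alphas.append(alpha)
        if beta < 1e-12 or j == m - 1:
            break
        q_prev, q = q, w / beta
        Q.append(q)
        betas.append(beta)
    T = np.diag(alphas) + np.diag(betas, 1) + np.diag(betas, -1)
    ritz, vecs = np.linalg.eigh(T)
    weights = vecs[0, :] ** 2              # squared first components give quadrature weights
    return ritz, weights
```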

Hierarchical and Matrix-Free Approximations

In PDE-constrained inverse problems, hierarchical off-diagonal low-rank (HODLR) and hierarchical ($\mathcal{H}$-matrix) representations of Hessians reduce complexity from $O(N^3)$ or $O(rN^2)$ to log-linear $O(N \log^2 N)$ (Hartland et al., 2023, Ambartsumyan et al., 2020). Empirical studies show that HODLR compression outperforms global low-rank approximations once information content increases, enabling fast Newton solves and posterior sampling at full field scale (e.g., Greenland ice sheet, $N > 10^5$). Matrix-free point spread function (PSF) techniques approximate high-rank Hessians with localized kernel interpolation, reducing the required number of expensive PDE solves by 5–10× versus regularization- or low-rank preconditioning, and maintaining tight spectral clustering in the preconditioned Hessian (Alger et al., 2023).
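
For reference, the global low-rank baseline that HODLR is compared against can be built matrix-free with a randomized range finder over Hessian–vector products, as in the hedged numpy sketch below; the rank, oversampling amount, and `hvp` callable are illustrative assumptions.

```python
# Sketch: randomized global low-rank approximation of a symmetric Hessian operator.
import numpy as np

def lowrank_hessian(hvp, dim, rank=20, oversample=10, rng=None):
    rng = np.random.default_rng(rng)
    k = rank + oversample
    Omega = rng.standard_normal((dim, k))
    Y = np.column_stack([hvp(Omega[:, i]) for i in range(k)])    # sample the range of H
    Q, _ = np.linalg.qr(Y)                                       # orthonormal basis
    B = np.column_stack([hvp(Q[:, i]) for i in range(Q.shape[1])])
    T = Q.T @ B                                                  # small projected core Q^T H Q
    evals, U = np.linalg.eigh(0.5 * (T + T.T))
    idx = np.argsort(np.abs(evals))[::-1][:rank]
    V = Q @ U[:, idx]                                            # approximate eigenvectors
    return evals[idx], V                                         # H ~= V diag(evals) V^T
```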

Sketching and Learning-Augmented Approximations

Learning-augmented Hessian sketching (Li et al., 2021) employs oracles for leverage score detection and trainable sketch values to minimize distortion in the compressed subspace—yielding reductions in per-iteration cost and improved second-order accuracy in sketched Newton-type methods. Empirical error reductions of 40–80% in convergence rates compared to standard Count-Sketch are reported in LASSO and nuclear-norm regression.
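
A minimal version of the non-learned baseline is sketched below: a plain Count-Sketch compresses the data matrix of a least-squares problem so the Newton step uses the sketched curvature $(SA)^\top(SA)$ in place of $A^\top A$. All names and sizes are illustrative; the learning-augmented variant replaces the random buckets and signs with trained values guided by leverage-score oracles.

```python
# Sketch: plain Count-Sketch + sketched Newton step for least squares min 0.5*||Ax - b||^2.
import numpy as np

def count_sketch(A, m, rng=None):
    """Return S @ A for a Count-Sketch S with m rows: each row of A goes to one
    random bucket with a random sign."""
    rng = np.random.default_rng(rng)
    n = A.shape[0]
    buckets = rng.integers(0, m, size=n)
    signs = rng.choice([-1.0, 1.0], size=n)
    SA = np.zeros((m, A.shape[1]))
    np.add.at(SA, buckets, signs[:, None] * A)   # scatter-add rows into buckets
    return SA

def sketched_newton_step(A, b, x, m=256, damping=1e-8, rng=None):
    """Newton step with the Hessian A^T A replaced by the sketched (SA)^T (SA)."""
    SA = count_sketch(A, m, rng)
    g = A.T @ (A @ x - b)                                   # exact gradient
    H_sketch = SA.T @ SA + damping * np.eye(A.shape[1])     # sketched curvature
    return x - np.linalg.solve(H_sketch, g)
```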

5. Geometry, Certification, and Model Analysis

Convexity Certification

The Hessian approach to convexity certification surpasses the DCP (Disciplined Convex Programming) syntactic approach. By propagating positive semidefiniteness (PSD) through computational graphs of second derivatives, and analytically recognizing variance-type templates, this method certifies convexity for a strictly richer class of differentiable functions (Klaus et al., 2022). Complexity is linear in the DAG size, and the method subsumes all standard DCP rules while strictly extending the certifiable function set.

Hessian-Based Recovery and Structure

In finite element methods, polynomial-preserving recovery operators based on double gradient recovery (PPR-PPR) yield Hessian reconstructions with super- and ultra-convergence properties, attaining $O(h^k)$ accuracy on mildly structured meshes and $O(h^{k+1})$ on translation-invariant meshes (Guo et al., 2014). Matrix-algebraic approaches produce order-1 or order-2 accurate approximate Hessians using only function evaluations, enabling derivative-free optimization at $O(n^2)$ cost (Hare et al., 2023).
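
A standard function-values-only construction in the same spirit (though not the specific matrix-algebraic scheme of Hare et al.) is the order-2 accurate central-difference Hessian sketched below, which needs $O(n^2)$ evaluations.

```python
# Sketch: order-2 accurate Hessian from function evaluations only (central differences).
import numpy as np

def fd_hessian(f, x, h=1e-4):
    n = x.size
    H = np.empty((n, n))
    for i in range(n):
        for j in range(i, n):
            ei, ej = np.zeros(n), np.zeros(n)
            ei[i], ej[j] = h, h
            # Four-point central-difference stencil for the (i, j) mixed partial.
            H[i, j] = (f(x + ei + ej) - f(x + ei - ej)
                       - f(x - ei + ej) + f(x - ei - ej)) / (4 * h * h)
            H[j, i] = H[i, j]                      # enforce symmetry
    return H
```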

Applications in Physical Models and Geometry

In spinfoam models of quantum gravity, non-degenerate Hessians at Regge-like critical points guarantee the validity of the stationary phase expansion, ensuring correct semiclassical limits and the absence of spurious contributions (Kamiński et al., 14 Oct 2025). In computational geometry and visual SLAM, local Hessians from relative-motion problems are harnessed to weight global bundle adjustment objectives, yielding pose-graph solutions that closely match the accuracy of full point-based bundle adjustment at a fraction of the cost (Rupnik et al., 2023).

6. Challenges, Limitations, and Future Directions

Despite the clear benefits of better Hessians, several challenges persist:

  • Data scarcity and representativity: Datasets such as HORM and Hessian QM9 have begun to address the lack of diverse high-quality Hessian data in chemistry, but further coverage—especially for larger and more complex systems—is needed (Cui et al., 18 May 2025, Williams et al., 15 Aug 2024).
  • Scalability: While HODLR, $\mathcal{H}$-matrix, and PSF methods scale log-linearly, their efficiency depends on the off-diagonal compressibility of the Hessian, which may be problem-dependent (Hartland et al., 2023, Alger et al., 2023).
  • Hessian Approximation Error: For inverse or influence computation tasks, Kronecker and block-diagonal factorizations can dominate the approximation error, highlighting the need for higher-fidelity models and hybrid correction schemes (Hong et al., 27 Sep 2025).
  • Numerical Stability and Regularization: Many approximation schemes must address ill-conditioning or require systematic bias correction (e.g., via offset corrections in ML-computed Hessians (Wander et al., 2 Oct 2024)) and careful monitoring of low-rank/diagonal scales.
  • Integration and Exploitation of Structure: Further advances may exploit additional tensor/kernel structure, leverage sparsity more aggressively, or integrate learning-based sketching into higher-order or problem-adaptive methods (Hill et al., 29 Jan 2025, Li et al., 2021).

The ongoing expansion of high-quality Hessian data, distributed and scalable algorithmic primitives, improved local-global approximations, and structure- or data-adaptive methods is rapidly extending the practical frontiers of what can be achieved with “better Hessians” in modern computational science and machine learning.

