
Curvature-Informed Step Optimization

Updated 28 April 2026
  • Curvature-informed step is a method that integrates local curvature information (via Hessians, Jacobians, or proxies) to dynamically adapt optimization updates.
  • The approach employs preconditioners and adaptive step sizes, yielding faster convergence rates and enhanced robustness to noise and sharp minima.
  • Applications span deep learning, manifold optimization, and scientific computing, often outperforming standard gradient descent in efficiency and stability.

A curvature-informed step is an optimization update rule or algorithmic modification in which explicit local or global curvature information, in the form of Hessians, Jacobians, or curvature proxies, directly modulates the update direction, step size, or component selection. Curvature-aware steps are designed to exploit geometric properties of the objective landscape, often yielding improved convergence rates, greater training stability, and enhanced robustness to sharp minima. They are increasingly used across deep learning, manifold optimization, model merging, and scientific computing.

1. Mathematical Formulation of Curvature-Informed Steps

Curvature-informed steps generalize standard first-order updates by preconditioning or rescaling the descent direction according to a curvature proxy. The canonical form is

$$x_{t+1} = x_t - \eta\, P_t\, g_t, \qquad P_t \succ 0,$$

where $g_t = \nabla f(x_t)$ is the (possibly stochastic) gradient, $P_t$ is a positive-definite preconditioner encoding curvature, and $\eta$ is a learning rate (Pooladzandi et al., 2024). When $P_t = I$, this reduces to classical gradient descent; $P_t = H_t^{-1}$ recovers Newton's method.

Construction of $P_t$ may use full Hessian information, low-rank or diagonal approximations, or cheap proxies (e.g., empirical Fisher, secant formulae, or second-moment accumulators). In manifold optimization, $P_t$ and step-size constraints derive from geometric smoothness constants involving sectional curvature bounds (Pareth, 26 Feb 2026).
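The canonical update above can be sketched in a few lines. The following toy example (illustrative names and problem, not from any cited paper) minimizes a quadratic whose Hessian is $A$, comparing $P_t = I$ (plain gradient descent) against $P_t = H^{-1}$ (Newton's method, which solves a quadratic in one step):

```python
import numpy as np

# Minimal sketch of the curvature-informed update x_{t+1} = x_t - eta * P_t @ g_t
# on the quadratic f(x) = 0.5 x^T A x - b^T x, whose Hessian is A.
A = np.array([[10.0, 0.0], [0.0, 1.0]])   # ill-conditioned Hessian
b = np.array([1.0, 1.0])
x_star = np.linalg.solve(A, b)            # exact minimizer

def run(P, eta, steps=50):
    x = np.zeros(2)
    for _ in range(steps):
        g = A @ x - b                     # gradient of the quadratic
        x = x - eta * P @ g               # preconditioned step
    return x

x_gd = run(np.eye(2), eta=0.1)                       # P_t = I: gradient descent
x_newton = run(np.linalg.inv(A), eta=1.0, steps=1)   # P_t = H^{-1}: Newton

print(np.linalg.norm(x_newton - x_star))  # Newton solves a quadratic in one step
print(np.linalg.norm(x_gd - x_star))      # GD still carries error after 50 steps
```

The ill-conditioned Hessian makes the contrast visible: the slow eigen-direction dominates the gradient-descent error, while the curvature-informed (Newton) step removes the conditioning entirely.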

2. Curvature-Informed Preconditioners and Update Schemes

Curvature-aware strategies span a wide range, often augmenting first-order methods:

  • Lie-group preconditioners: Choose $P_t = Q_t^\top Q_t$ with $Q_t$ in a connected Lie subgroup (e.g., diagonal, X-shape, low-rank) (Pooladzandi et al., 2024). Matrix-free or low-rank update rules fit the preconditioner online using Hessian-vector products or finite gradient differences. The update minimizes a convex noise-robust criterion without line search or explicit damping.
  • Secant or Rayleigh quotient-based modulation: Compute a curvature indicator as the secant Rayleigh quotient

$$\kappa_t = \frac{y_t^\top s_t}{s_t^\top s_t}, \qquad s_t = x_t - x_{t-1}, \quad y_t = g_t - g_{t-1},$$

where $s_t$ is the latest step and $y_t$ the corresponding gradient difference. This quantity serves as the curvature of a local quadratic model, enabling adaptive gradient "boosting" through a curvature-gated gain on the update (An et al., 16 Apr 2026). Similar techniques appear in scale-invariant Monte Carlo with discrete curvature radii (Madhavan et al., 2021) and in PINN optimization (Fonseca et al., 2023).

  • Kronecker-factored approximations (KFAC): For structured objectives like PINNs, preconditioners are constructed by blockwise Kronecker-product approximation of the Gauss–Newton or natural-gradient metric, incorporating higher derivatives (e.g., Taylor-mode AD for the Laplacian) (Dangel et al., 2024).
  • Curvature-aware sparsification/selection: Model merging frameworks reweight or prune parameter vectors using elementwise second-moment statistics as diagonal curvature proxies, building a saliency score from the optimizer's second-moment accumulator $v_t$ (Mahdavinia et al., 14 Sep 2025).
  • Manifold step-size control: On Riemannian manifolds, curvature is encoded in geometry-aware smoothness constants, so explicit upper bounds for (stochastic) gradient and Newton-type step sizes take the form $\eta \le 1/L$, where the constant $L$ combines a "geometry package" bounding parallel transport and curvature distortion with a transported Jacobian spectral bound (Pareth, 26 Feb 2026).
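The secant-based curvature proxy in the list above can be computed from two consecutive iterates. This sketch (toy quadratic, illustrative names; the gain rule of the cited methods is not reproduced here) recovers a curvature estimate that lies between the true Hessian eigenvalues and uses its inverse as a Barzilai-Borwein-style step size:

```python
import numpy as np

# Secant Rayleigh quotient kappa = (y . s) / (s . s) as a cheap curvature
# proxy along the latest step, for f(x) = 2 x0^2 + 0.25 x1^2.
def grad(x):
    return np.array([4.0 * x[0], 0.5 * x[1]])   # Hessian eigenvalues: 4 and 0.5

x_prev = np.array([1.0, 1.0])
x = np.array([0.9, 0.9])

s = x - x_prev                    # latest step s_t
y = grad(x) - grad(x_prev)        # gradient difference y_t
kappa = (y @ s) / (s @ s)         # curvature along s; lies in [0.5, 4]

eta = 1.0 / kappa                 # curvature-informed step size
x_next = x - eta * grad(x)
print(kappa, x_next)
```

Because the estimate only needs one extra gradient and two inner products, it adds negligible overhead to a first-order method.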

3. Theoretical Guarantees and Convergence Properties

Curvature-informed steps yield improved theoretical guarantees in various regimes:

  • Linear or near-quadratic convergence in convex/strongly convex regimes: With suitable spectral bounds on the preconditioner, curvature-informed PSD preconditioners $P_t$ produce linear convergence; under strong convexity bounds on the spectrum, Newton-like rates are recoverable (Pooladzandi et al., 2024).
  • Explicit sublinear or geometric rates: Local curvature descent schemes (e.g., LCD1/LCD2) admit explicit convergence rates, replacing the global Lipschitz constant $L$ with local curvature-derived constants, which immediately tightens worst-case rates (Richtárik et al., 2024).
  • Noise robustness and step-size normalization: Online preconditioner fitting or step-size modulation naturally damps stochastic noise, removing the need for additional line-search, clipping, or hand-tuned damping (Pooladzandi et al., 2024, Richtárik et al., 2024, Madhavan et al., 2021).
  • Curvature-aware Polyak–Łojasiewicz inequalities: In Riemannian settings, explicit curvature-dependent bounds ensure linear convergence provided the manifold geometry is controlled (Pareth, 26 Feb 2026).
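The gain from replacing a global Lipschitz constant with a local curvature bound can be seen on a one-dimensional toy problem (function and constants are illustrative, not from the cited work). For $f(x) = x^4/4$ we have $f''(x) = 3x^2$, so the global bound on $[-2, 2]$ is $L = 12$ while the local bound at $x$ is $3x^2$:

```python
# Gradient descent on f(x) = x^4 / 4 with f'(x) = x^3, comparing the
# conservative global step 1/L_global against the local step 1/L_local(x).
def minimize(step_rule, x0=0.5, iters=100):
    x = x0
    for _ in range(iters):
        g = x ** 3                              # f'(x)
        x = x - step_rule(x) * g
    return abs(x)

err_global = minimize(lambda x: 1.0 / 12.0)      # eta = 1/L_global = 1/12
err_local = minimize(lambda x: 1.0 / (3 * x * x))  # eta = 1/L_local = 1/(3x^2)

print(err_global, err_local)
```

With the local step the iteration contracts geometrically ($x \mapsto \tfrac{2}{3}x$), while the global step crawls because the worst-case constant is far too pessimistic near the flat minimum.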

4. Algorithmic Instantiations and Pseudocode

Curvature-informed steps are realized by several prominent algorithms:

| Methodology | Preconditioner / Modulation | Key Ingredients |
|---|---|---|
| PSGD (Pooladzandi et al., 2024) | $P_t = Q_t^\top Q_t$, $Q_t$ in a Lie group | Curvature via Hessian-vector/finite diff |
| KFAC for PINNs (Dangel et al., 2024) | Blockwise Kronecker-factored Gauss–Newton metric | Taylor-mode AD on network for PDE loss |
| OTA+FFG (Mahdavinia et al., 14 Sep 2025) | Diagonal via second-moment accumulator $v_t$ | Adam 2nd moment, Fisher/Hessian proxy |
| CA-AdamW (An et al., 16 Apr 2026) | Rayleigh-quotient gain on secant correction | Secant-based adaptive boost |
| LCD2 (Richtárik et al., 2024) | Step size from local curvature bound | Local curvature mapping |

All algorithms use cheap curvature proxies (directional derivatives, accumulated second moments, local models) or structured approximations (diagonal, Kronecker, Lie subgroups) to keep overhead manageable.
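One of the cheapest proxies in the table, the optimizer's second-moment accumulator, can double as a diagonal curvature estimate for sparsification. The sketch below is illustrative: the OBD-style score $v_i \theta_i^2$ is one common choice for ranking parameters by estimated loss sensitivity, not necessarily the exact score of the cited merging framework:

```python
import numpy as np

# Rank parameters by an elementwise saliency built from accumulated squared
# gradients (a diagonal curvature proxy), then keep only the top half.
rng = np.random.default_rng(0)
theta = rng.normal(size=8)                      # parameter vector to sparsify
# Synthetic gradient samples with very different per-coordinate scales:
grads = rng.normal(size=(100, 8)) * np.array([5, 5, 1, 1, 1, 1, 0.1, 0.1])

v = (grads ** 2).mean(axis=0)                   # second-moment accumulator
saliency = v * theta ** 2                       # OBD-style score (assumption)

keep = saliency >= np.sort(saliency)[len(saliency) // 2]  # top half survives
theta_sparse = np.where(keep, theta, 0.0)
print(keep.sum(), theta_sparse)
```

Coordinates whose gradients are consistently large (high estimated curvature) are retained, while flat directions are pruned at negligible cost, since $v$ is already maintained by Adam-style optimizers.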

5. Practical Considerations and Empirical Findings

Across curvature-informed methods, empirical studies consistently report faster per-iteration convergence and improved stability relative to standard gradient descent, at modest overhead when curvature is estimated through cheap proxies (directional derivatives, accumulated second moments, or local models) rather than full Hessians.

6. Extensions: Manifolds, Geometry, and Curvature-Regulated Dynamics

Curvature-informed steps generalize beyond flat parameter spaces:

  • Riemannian manifolds: Optimization on spaces with nontrivial geometric structure (e.g., SO(3), SE(3)) requires all step-size bounds and convergence analyses to explicitly account for sectional curvature, injectivity radius, and parallel transport distortion via a geometry package constant. The resulting curvature-aware Sobolev constants define descent lemmas, step bounds, and local quadratic contraction for Newton-type methods (Pareth, 26 Feb 2026).
  • Graph and temporal diffusion: In dynamic network models, curvature (e.g., Ollivier-Ricci on graphs) guides information flow. Infection time prediction (R-ODE) selects the next informed node by maximal Ricci curvature, capturing the minimum "transportation effort" in learned embeddings (Sun et al., 2024).
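A minimal manifold example illustrates the ingredients: Riemannian gradient descent on the unit sphere for the Rayleigh quotient $f(x) = x^\top A x$, using tangent-space projection and a normalization retraction. The problem and constants are illustrative; real manifold methods additionally bound $\eta$ via curvature-aware smoothness constants as described above:

```python
import numpy as np

# Riemannian gradient descent on the sphere S^2 for f(x) = x^T A x.
# The minimizer is the eigenvector of A's smallest eigenvalue (here e3).
A = np.diag([3.0, 2.0, 1.0])
x = np.array([1.0, 1.0, 1.0]) / np.sqrt(3.0)   # start on the sphere
eta = 0.1

for _ in range(500):
    egrad = 2.0 * A @ x                        # Euclidean gradient
    rgrad = egrad - (egrad @ x) * x            # project onto tangent space at x
    x = x - eta * rgrad                        # step in the tangent direction
    x = x / np.linalg.norm(x)                  # retract back onto the sphere

print(x, x @ A @ x)   # x approaches (0, 0, 1); objective approaches 1
```

The projection keeps the update intrinsic to the manifold, and the retraction replaces the exact exponential map at negligible cost, which is the standard trade-off curvature-aware step-size analyses must account for.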

7. Outlook and Significance

The curvature-informed step has emerged as a unifying paradigm bridging optimization theory, machine learning, manifold geometry, and large-scale model maintenance. By systematically incorporating second-order local information—either exactly, in approximated form, or via efficient surrogates—these procedures navigate complex loss surfaces with improved efficiency and robustness.

Notable patterns include the convergence of ideas from disparate communities: numerical optimization, geometric learning, post-hoc model merging, and scientific computing. A plausible implication is that future large-scale and scientific ML systems will increasingly rely on lightweight, curvature-aware primitives for both computational tractability and reliability in challenging, high-dimensional, and geometrically structured settings.

Relevant references include (Pooladzandi et al., 2024, Mahdavinia et al., 14 Sep 2025, Richtárik et al., 2024, An et al., 16 Apr 2026, Madhavan et al., 2021, Pareth, 26 Feb 2026, Bhardwaj et al., 2024, Dangel et al., 2024, Fonseca et al., 2023, Sun et al., 2024).
