Quantum Natural-Gradient Optimizers

Updated 13 May 2026

Quantum natural-gradient optimizers are a class of gradient-based techniques that leverage the intrinsic Riemannian geometry of quantum state manifolds via quantum Fisher information.
They incorporate methods such as block-diagonal approximations, momentum-enhanced updates, and geodesic corrections to efficiently navigate complex quantum landscapes.
These optimizers mitigate issues like barren plateaus and resource inefficiencies, accelerating convergence in VQE, QAOA, and quantum neural network training.

Quantum natural-gradient optimizers are a class of gradient-based techniques for variational quantum algorithms (VQAs) that leverage the Riemannian geometry of quantum state manifolds. By incorporating the quantum Fisher information (QFI) or the real part of the quantum geometric tensor (QGT), these optimizers define parameter updates aligned with the intrinsic geometry of the parameterized quantum state space. Quantum natural-gradient methods have been developed to address convergence, barren plateaus, and quantum resource efficiency across a wide array of quantum optimization settings, including the Variational Quantum Eigensolver (VQE), the Quantum Approximate Optimization Algorithm (QAOA), quantum neural networks, and quantum state preparation tasks.

1. Information-Geometric Foundations

The core principle of quantum natural-gradient optimization is steepest descent in the quantum information geometry defined by the Fubini–Study metric. For a parameterized quantum state $|\psi(\theta)\rangle = U(\theta)|0\rangle$ , with classical parameters $\theta\in\mathbb{R}^m$ or more generally complex parameters $\zeta\in\mathbb{C}^p$ , the quantum geometric tensor is defined as

$G_{ij}(\theta) = \langle\partial_i\psi|\partial_j\psi\rangle - \langle\partial_i\psi|\psi\rangle\langle\psi|\partial_j\psi\rangle,$

and the Fubini–Study (FS) metric is its real part: $F_{ij}(\theta) = \mathrm{Re} G_{ij}(\theta)$ . For mixed states, the appropriate metric is the symmetric logarithmic derivative (SLD) quantum Fisher information, with generalizations via Petz monotone and non-monotone functions explored for further acceleration (Sasaki et al., 2024, Miyahara, 21 Oct 2025).
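The definition above can be checked numerically on a small example. The following is a minimal sketch, assuming a hypothetical single-qubit ansatz and central finite differences for the state derivatives; the function names are illustrative and not taken from any library.

```python
import numpy as np

def state(params):
    """Illustrative ansatz: cos(t/2)|0> + e^{i p} sin(t/2)|1>."""
    t, p = params
    return np.array([np.cos(t / 2), np.exp(1j * p) * np.sin(t / 2)])

def quantum_geometric_tensor(state_fn, params, eps=1e-6):
    """G_ij = <d_i psi|d_j psi> - <d_i psi|psi><psi|d_j psi> via central differences."""
    params = np.asarray(params, dtype=float)
    psi = state_fn(params)
    derivs = []
    for i in range(params.size):
        shift = np.zeros_like(params)
        shift[i] = eps
        derivs.append((state_fn(params + shift) - state_fn(params - shift)) / (2 * eps))
    d = np.array(derivs)          # row i holds the components of |d_i psi>
    overlap = d.conj() @ d.T      # <d_i psi|d_j psi>
    berry = d.conj() @ psi        # <d_i psi|psi>
    return overlap - np.outer(berry, berry.conj())

def fubini_study_metric(state_fn, params):
    """Real part of the quantum geometric tensor."""
    return quantum_geometric_tensor(state_fn, params).real

F = fubini_study_metric(state, [0.7, 0.3])
```

For this ansatz the metric is the round-sphere metric scaled by one quarter, with diagonal entries $1/4$ and $\sin^2(t)/4$, which gives a simple analytic reference for the numerical result.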

The natural-gradient update at iteration $t$ is

$\theta_{t+1} = \theta_t - \eta F(\theta_t)^{-1}\nabla_\theta L(\theta_t),$

where $L(\theta)$ is the objective function (e.g., energy expectation), $\nabla_\theta L$ its gradient, and $\eta>0$ the learning rate (Stokes et al., 2019, Yamamoto, 2019). The update direction is invariant under smooth reparameterizations, as it arises from the geometry of the projective Hilbert space.
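As a rough illustration of the update rule, the sketch below iterates it on a toy single-qubit model whose loss, gradient, and FS metric are written analytically. The model, the small ridge term `reg`, and all function names are assumptions made for this example.

```python
import numpy as np

def qng_step(theta, grad, metric, eta=0.1, reg=1e-8):
    """One update theta - eta * F^{-1} grad, with a tiny ridge term for stability."""
    nat_grad = np.linalg.solve(metric + reg * np.eye(theta.size), grad)
    return theta - eta * nat_grad

# Toy model (assumed): |psi> = cos(t/2)|0> + e^{i p} sin(t/2)|1>, loss L = <Z> = cos(t).
def loss(theta):
    return np.cos(theta[0])

def gradient(theta):
    return np.array([-np.sin(theta[0]), 0.0])

def fs_metric(theta):
    return np.diag([0.25, 0.25 * np.sin(theta[0]) ** 2])

theta = np.array([0.5, 0.3])
for _ in range(50):
    theta = qng_step(theta, gradient(theta), fs_metric(theta))
final_loss = loss(theta)
```

The ridge term matters here because the metric becomes singular in the phase direction as the state approaches a pole of the Bloch sphere.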

Key geometrical properties:

Riemannian structure: Updates follow the steepest-descent direction on the quantum state manifold, a first-order approximation to geodesic motion, directly exploiting distinguishability between neighboring states.
Imaginary-time evolution: The quantum natural-gradient flow is equivalent to projected imaginary-time evolution in the variational manifold (Koczor et al., 2019).
Optimization rationale: The update emerges as the minimization of the linearized objective $L(\theta_t) + \nabla_\theta L(\theta_t)^\top\delta\theta$ subject to a small step $\delta\theta$ measured by the FS distance.
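The constrained-minimization view can be written out in a few lines. The step $\delta\theta$, trust radius $\epsilon$, and Lagrange multiplier $\mu$ are notation introduced here for illustration, and the sketch assumes $F(\theta_t)$ is invertible.

$\min_{\delta\theta}\ \nabla_\theta L(\theta_t)^\top \delta\theta \quad \text{subject to} \quad \tfrac{1}{2}\,\delta\theta^\top F(\theta_t)\,\delta\theta \le \epsilon.$

Stationarity of the Lagrangian $\nabla_\theta L(\theta_t)^\top \delta\theta + \mu\left(\tfrac{1}{2}\,\delta\theta^\top F(\theta_t)\,\delta\theta - \epsilon\right)$ with respect to $\delta\theta$ gives

$\nabla_\theta L(\theta_t) + \mu F(\theta_t)\,\delta\theta = 0 \quad\Longrightarrow\quad \delta\theta = -\tfrac{1}{\mu}\,F(\theta_t)^{-1}\nabla_\theta L(\theta_t),$

which matches the natural-gradient update once $\eta = 1/\mu$ is identified as the learning rate.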

2. Formulations, Variants, and Extensions

2.1 Standard QNG and Block-Diagonal Approximations

For practical circuits with many parameters, computing the full $m\times m$ metric $F$ is often prohibitive. A block-diagonal approach exploits parameter localization (gate commutativity and layered circuits) so that each block corresponds to a subset of parameters (such as layerwise or per-qubit blocks). This lowers the quantum and classical cost per iteration from the $O(m^2)$ entries of the full metric to $\sum_\ell O(m_\ell^2)$ entries for blocks of size $m_\ell$ (Stokes et al., 2019, Yao et al., 2021). Diagonal (variance-only) approximations are cheaper still but can neglect essential parameter correlations.
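A block-diagonal solve can be sketched as follows, assuming the metric is already available as a dense array and that `blocks` is a user-supplied partition of the parameter indices (for example, one list per circuit layer). Both the function name and the ridge term `reg` are illustrative.

```python
import numpy as np

def block_diag_natural_gradient(metric, grad, blocks, reg=1e-8):
    """Solve F_b x_b = g_b independently for each index block b.

    Entries of `metric` that couple different blocks are ignored,
    which is exactly the block-diagonal approximation.
    """
    direction = np.zeros_like(grad, dtype=float)
    for idx in blocks:
        idx = np.asarray(idx)
        F_b = metric[np.ix_(idx, idx)] + reg * np.eye(idx.size)
        direction[idx] = np.linalg.solve(F_b, grad[idx])
    return direction

# Example with two blocks of sizes 2 and 3 (random positive-definite blocks).
rng = np.random.default_rng(0)
A, B = rng.normal(size=(2, 2)), rng.normal(size=(3, 3))
F = np.zeros((5, 5))
F[:2, :2] = A @ A.T + np.eye(2)
F[2:, 2:] = B @ B.T + np.eye(3)
g = rng.normal(size=5)
d = block_diag_natural_gradient(F, g, [[0, 1], [2, 3, 4]], reg=0.0)
```

On hardware the saving comes from never estimating the off-block entries at all; the dense `metric` argument here is only a convenience for the sketch.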

2.2 Momentum-Enhanced and Langevin-Inspired Updates

Momentum and stochasticity can be incorporated by starting from a continuous-time Langevin stochastic differential equation preconditioned by the FS metric, in which the natural-gradient force $-F(\theta)^{-1}\nabla_\theta L(\theta)$ is supplemented by a friction term and a noise term driven by a Wiener increment $dW_t$, with a noise-strength parameter controlling noise injection. Discretization yields the Momentum-QNG scheme:

$\theta_{t+1} = \theta_t - \eta F(\theta_t)^{-1}\nabla_\theta L(\theta_t) + \rho\,(\theta_t - \theta_{t-1}),$

with $\rho$ the momentum coefficient. This methodology improves traversal of plateaus and shallow minima, and is reported to outperform basic QNG, Adam, and classical-momentum schemes in VQE and QAOA benchmarks (Borysenko et al., 2024, Lisart-Liebermann et al., 23 Apr 2025).
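A momentum-enhanced step might look like the sketch below. It assumes the heavy-ball form $\theta_{t+1} = \theta_t - \eta F^{-1}\nabla_\theta L + \rho(\theta_t - \theta_{t-1})$ with momentum coefficient $\rho$, omits the noise term, and uses illustrative names and default values.

```python
import numpy as np

def momentum_qng_step(theta, theta_prev, grad, metric, eta=0.05, rho=0.9, reg=1e-8):
    """Natural-gradient step plus a momentum term rho * (theta - theta_prev)."""
    nat_grad = np.linalg.solve(metric + reg * np.eye(theta.size), grad)
    return theta - eta * nat_grad + rho * (theta - theta_prev)

# Toy quadratic loss L = 0.5 * theta^T A theta, with A also used as the metric
# (an assumption chosen only so the iteration is easy to follow).
A = np.array([[2.0, 0.3], [0.3, 1.0]])
theta_prev = np.array([1.0, -1.0])
theta = theta_prev.copy()
for _ in range(300):
    grad = A @ theta
    theta, theta_prev = momentum_qng_step(theta, theta_prev, grad, A), theta
```

Setting `rho=0` recovers the plain natural-gradient step, which makes it easy to compare the two schemes on the same problem.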

2.3 Geodesic Correction and Manifold Integrators

The quantum natural gradient can be enhanced by incorporating geodesic curvature corrections. The Quantum Natural Gradient with Geodesic Correction (QNGGC) amends the update via the Christoffel symbols $\Gamma^i_{jk}$ of the FS metric:

$\theta^i_{t+1} = \theta^i_t - \eta\,v^i - \tfrac{\eta^2}{2}\,\Gamma^i_{jk}(\theta_t)\,v^j v^k, \qquad v = F(\theta_t)^{-1}\nabla_\theta L(\theta_t),$

with summation over repeated indices. This second-order integrator aligns the update more closely with the parameter manifold geodesic and achieves accelerated convergence, especially for shallow circuits (Halla, 2024).
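The geodesic correction can be sketched numerically by differentiating the metric to obtain Christoffel symbols. The code below assumes the second-order form $\theta^i \leftarrow \theta^i - \eta v^i - \tfrac{\eta^2}{2}\Gamma^i_{jk}v^j v^k$ with $v = F^{-1}\nabla_\theta L$, uses central finite differences, and takes a hypothetical `metric_fn(theta)` callable; it is not the reference implementation of QNGGC.

```python
import numpy as np

def christoffel(metric_fn, theta, eps=1e-5):
    """Gamma[i, j, k] = 1/2 F^{il} (d_j F_{lk} + d_k F_{lj} - d_l F_{jk})."""
    theta = np.asarray(theta, dtype=float)
    m = theta.size
    dF = np.zeros((m, m, m))  # dF[l, j, k] = d_l F_{jk}
    for l in range(m):
        shift = np.zeros(m)
        shift[l] = eps
        dF[l] = (metric_fn(theta + shift) - metric_fn(theta - shift)) / (2 * eps)
    # Lower-index symbols, arranged as lower[l, j, k].
    lower = 0.5 * (np.einsum('jlk->ljk', dF) + np.einsum('klj->ljk', dF) - dF)
    return np.einsum('il,ljk->ijk', np.linalg.inv(metric_fn(theta)), lower)

def qng_geodesic_step(theta, grad, metric_fn, eta=0.1):
    """Natural-gradient step with a second-order geodesic correction."""
    v = np.linalg.solve(metric_fn(theta), grad)
    correction = np.einsum('ijk,j,k->i', christoffel(metric_fn, theta), v, v)
    return theta - eta * v - 0.5 * eta ** 2 * correction

def sphere_metric(theta):
    """FS metric of a single-qubit state in polar/azimuthal angles (illustrative)."""
    return np.diag([0.25, 0.25 * np.sin(theta[0]) ** 2])

gamma = christoffel(sphere_metric, [0.7, 0.3])
new_theta = qng_geodesic_step(np.array([0.7, 0.3]), np.array([0.0, 0.1]), sphere_metric)
```

Estimating metric derivatives multiplies the measurement cost, which is consistent with the correction being most practical for small, shallow circuits.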

2.4 Stochastic, Classical-Fisher, and Reduced-Resource QNG

Resource-constrained analogs include:

Random Natural Gradient (RNG): Approximates the quantum Fisher information matrix (QFIM) by a classical Fisher information matrix constructed from random basis measurements, reducing quantum resource scaling from $O(m^2)$ to $O(m)$ per iteration while empirically matching QNG's accuracy (Kolotouros et al., 2023).
Stochastic-Coordinate QNG (SC-QNG): Restricts QNG updates at each step to a random active subset $S$ of parameters (using the reduced metric $F_S$ restricted to $S$), with cost $O(|S|^2)$ for $|S|\ll m$.
Hamiltonian-aware QNG (H-QNG): Builds the metric from derivatives of only those expectation values contributing to the observable, yielding a pullback FS metric whose quantum cost per iteration scales as the number of parameters times the number of Hamiltonian terms, with convergence comparable to full QNG but lower total resource cost (Shi et al., 18 Nov 2025).
Weighted Approximate QNG (WA-QNG): For $k$-local Hamiltonians, forms a metric as a weighted sum over subsystem Hilbert–Schmidt blocks, inheriting favorable Gauss–Newton properties and improving convergence for nonuniform $k$-local structures (Shi et al., 7 Apr 2025).
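The stochastic-coordinate idea from the list above can be sketched as follows. The callable `metric_fn(theta, active)` is a hypothetical interface that returns only the sub-block of the metric for the active indices, which is where the quantum savings would come from; the sampling scheme and names are assumptions for illustration.

```python
import numpy as np

def stochastic_coordinate_qng_step(theta, grad, metric_fn, k, eta=0.1, reg=1e-8, rng=None):
    """Update a random subset of k parameters using only the k x k metric block."""
    rng = np.random.default_rng() if rng is None else rng
    active = np.sort(rng.choice(theta.size, size=k, replace=False))
    F_sub = metric_fn(theta, active) + reg * np.eye(k)
    new_theta = theta.copy()
    new_theta[active] -= eta * np.linalg.solve(F_sub, grad[active])
    return new_theta, active

# Stand-in metric (identity on the active block), used only to exercise the code.
theta0 = np.arange(6, dtype=float)
grad0 = np.ones(6)
theta1, active = stochastic_coordinate_qng_step(
    theta0, grad0, lambda th, idx: np.eye(len(idx)),
    k=2, eta=0.5, reg=0.0, rng=np.random.default_rng(1))
```

Parameters outside the active subset are left untouched in a given step, so coverage of all coordinates relies on resampling the subset across iterations.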

2.5 Adaptivity and Warm Starts

Adaptive quantum natural-gradient methods include Armijo-type backtracking to robustly self-tune the learning rate at each step, increasing convergence reliability and removing the need for grid-based rate selection (Atif et al., 2022). The Look Around and Warm-Start (LAWS) strategy performs several local SGD-like steps before the QNG update, repositioning parameters closer to regions of large gradient to escape vanishing-gradient barren plateaus and improving both generalization and convergence in quantum classifiers (Tao et al., 2022).
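An Armijo-type backtracking rule along the natural-gradient direction can be sketched as below. The sufficient-decrease constant `c`, the shrink factor, and the function names are conventional choices assumed here, not values taken from the cited work.

```python
import numpy as np

def armijo_qng_step(loss_fn, theta, grad, metric, eta0=1.0, shrink=0.5,
                    c=1e-4, max_halvings=20, reg=1e-8):
    """Backtrack on eta until the Armijo sufficient-decrease condition holds."""
    direction = -np.linalg.solve(metric + reg * np.eye(theta.size), grad)
    slope = grad @ direction  # negative when the metric is positive definite
    loss0 = loss_fn(theta)
    eta = eta0
    for _ in range(max_halvings):
        candidate = theta + eta * direction
        if loss_fn(candidate) <= loss0 + c * eta * slope:
            return candidate, eta
        eta *= shrink
    return theta, 0.0  # no acceptable step found; keep the parameters

# Toy single-parameter problem: loss cos(t) with metric 1/4.
loss = lambda th: float(np.cos(th[0]))
theta = np.array([1.0])
grad = np.array([-np.sin(1.0)])
new_theta, eta_used = armijo_qng_step(loss, theta, grad, np.array([[0.25]]), eta0=1.6)
```

Each rejected trial costs one extra loss evaluation, so on hardware the backtracking budget trades shots against the need for a manual learning-rate search.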

2.6 Conjugate and Non-Monotonic Natural Gradients

Modified Conjugate Quantum Natural Gradient (CQNG) integrates natural-gradient updates with nonlinear conjugate-gradient methods. Search directions are iteratively built for approximate conjugacy under the FS metric, with per-step hyperparameter tuning for optimal efficiency, resulting in significant iteration-count reductions (Halla, 10 Jan 2025).

Relaxing the monotonicity constraint in quantum Fisher metric selection enables even faster convergence. Non-monotonic Petz function choices, such as those derived from sandwiched Rényi divergences, formally and numerically outperform SLD-based QNG in parameter learning and VQA tasks (Sasaki et al., 2024, Miyahara, 21 Oct 2025).

3. Implementation: Quantum Resource Scaling and Practicalities

The operational steps for QNG optimization are as follows:

Gradient evaluation: Parameter-shift rules on quantum hardware yield $\nabla_\theta L$ at a cost of $O(m)$ circuit evaluations per step (Yamamoto, 2019).
Metric evaluation: Full $F$ estimation requires $O(m^2)$ quantum evaluations; block-diagonal, diagonal, or random-basis strategies can lower this to $\sum_\ell O(m_\ell^2)$ for blocks of size $m_\ell$ or to $O(m)$ per step (Stokes et al., 2019, Kolotouros et al., 2023).
Classical inversion: Inverting $F$, or solving blockwise, costs $O(m^3)$ classically in the dense case and becomes significant for large $m$; layered or blockwise inversion reduces this (Yao et al., 2021).
Regularization/tuning: Ill-conditioning is managed with small additive regularizers, $F \to F + \lambda I$ with $\lambda > 0$; hyperparameter selection for the learning rate $\eta$, the momentum coefficient $\rho$, or the noise strength is critical and may follow the observed landscape or Armijo rules.
Hardware consideration: Measurements for $F_{ij}$ can be implemented via ancilla-based overlaps, swap tests, or appropriate hardware-dependent primitives (Wang et al., 2023).
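The gradient and regularized-solve steps listed above can be sketched on a simulated single-qubit circuit. The circuit $RY(b)\,RX(a)|0\rangle$, the observable $Z$, and all function names are assumptions for illustration; the two-term shift rule used here is exact for gates generated by a Pauli operator divided by two.

```python
import numpy as np

def expectation_z(params):
    """<Z> for the state RY(b) RX(a) |0>, simulated with 2x2 matrices."""
    a, b = params
    rx = np.array([[np.cos(a / 2), -1j * np.sin(a / 2)],
                   [-1j * np.sin(a / 2), np.cos(a / 2)]])
    ry = np.array([[np.cos(b / 2), -np.sin(b / 2)],
                   [np.sin(b / 2), np.cos(b / 2)]])
    psi = ry @ rx @ np.array([1.0, 0.0])
    return float(np.abs(psi[0]) ** 2 - np.abs(psi[1]) ** 2)

def parameter_shift_gradient(loss_fn, theta):
    """Two-term shift rule: 2 circuit evaluations per parameter."""
    grad = np.zeros_like(theta, dtype=float)
    for i in range(theta.size):
        shift = np.zeros_like(theta, dtype=float)
        shift[i] = np.pi / 2
        grad[i] = 0.5 * (loss_fn(theta + shift) - loss_fn(theta - shift))
    return grad

def regularized_qng_update(theta, grad, metric, eta=0.1, lam=1e-3):
    """Solve (F + lam * I) x = grad instead of forming an explicit inverse."""
    return theta - eta * np.linalg.solve(metric + lam * np.eye(theta.size), grad)

theta = np.array([0.4, 1.1])
grad = parameter_shift_gradient(expectation_z, theta)
```

Using a linear solve rather than an explicit inverse is a common numerical choice; the metric passed to `regularized_qng_update` would come from whichever estimator (full, block-diagonal, or surrogate) is in use.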

On noisy, near-term devices, full $F$ estimation may be dominated by shot noise and decoherence; block-diagonal or surrogate-metric approaches (e.g. classical Fisher-based, subsystem-based) are more robust in these regimes (Dell'Anna et al., 27 Feb 2025, Koczor et al., 2019). For continuous-variable (optical) circuits, QNG can be extended with Wirtinger calculus to handle complex-valued parameterizations, supporting faster convergence and smoother optimization landscapes (Yao et al., 2021).

4. Empirical Performance and Comparative Studies

Quantum natural-gradient optimizers are frequently reported to show accelerated convergence, greater resilience to local minima, and reduced sensitivity to hyperparameter choices compared to plain gradient descent (GD), Adam, or BFGS, especially in VQE and QAOA settings (Wierichs et al., 2020, Lisart-Liebermann et al., 23 Apr 2025, Roy et al., 2023). Key performance attributes:

VQE (Investment Portfolio Problem): Momentum-QNG is reported to reach a lower mean energy error than QNG and Adam when each optimizer is run at its best-performing setting (Borysenko et al., 2024).
QAOA (Minimum Vertex Cover, TFIM): Momentum-QNG is reported to achieve a better quality ratio than QNG, while Adam sometimes converges slightly faster but less reliably.
Overparameterization robustness: NatGrad reliably deactivates redundant circuit layers, outperforming BFGS/Adam which become trapped in spurious minima (Wierichs et al., 2020).
Noise resilience: On analog platforms (Rydberg, photonic), QNG consistently improves iteration count and success probability; on deep/noisy superconducting hardware, full-matrix QNG loses advantage unless accompanied by error mitigation or circuit simplification (Dell'Anna et al., 27 Feb 2025, Wang et al., 2023).
Resource benefits: Random and reduced-coordinate QNG variants retain accuracy while lowering quantum call cost by orders of magnitude, making them attractive for large-parameter settings (Kolotouros et al., 2023). Hamiltonian-aware QNG requires a shot cost per iteration that scales only with the number of parameters times the number of Hamiltonian terms, outperforming QNG in shot-constrained regimes (Shi et al., 18 Nov 2025).

Benchmarks for measuring the impact include iteration count to chemical accuracy, energy gap to ground state, final fidelity, steps required for n-digit precision, and spread/variance metrics over multiple random restarts.
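Some of these benchmark quantities are simple to compute from recorded energy histories. The sketch below assumes energies in Hartree, uses the commonly quoted chemical-accuracy threshold of about 1.6 mHa, and introduces hypothetical helper names; the sample histories are made-up inputs used only to show the calling convention.

```python
import numpy as np

CHEMICAL_ACCURACY = 1.6e-3  # Hartree; commonly quoted threshold (about 1 kcal/mol)

def iterations_to_accuracy(energies, exact, tol=CHEMICAL_ACCURACY):
    """Index of the first iterate within tol of the exact energy, or None."""
    errors = np.abs(np.asarray(energies, dtype=float) - exact)
    hits = np.flatnonzero(errors <= tol)
    return int(hits[0]) if hits.size else None

def restart_summary(histories, exact, tol=CHEMICAL_ACCURACY):
    """Success rate, median iteration count, and final-error spread over restarts."""
    counts = [iterations_to_accuracy(h, exact, tol) for h in histories]
    reached = [c for c in counts if c is not None]
    final_err = np.array([abs(h[-1] - exact) for h in histories])
    return {
        "success_rate": len(reached) / len(histories),
        "median_iterations": float(np.median(reached)) if reached else None,
        "final_error_mean": float(final_err.mean()),
        "final_error_std": float(final_err.std()),
    }

# Synthetic histories for two restarts (illustrative numbers only).
histories = [[-0.5, -0.9, -0.999, -1.0], [-0.2, -0.3, -0.4, -0.5]]
summary = restart_summary(histories, exact=-1.0)
```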

5. Limitations, Open Challenges, and Best Practices

High resource costs: Full $F$ estimation scales poorly; blockwise, random, and subsystem reweightings are essential for scaling to large circuits (Stokes et al., 2019, Shi et al., 18 Nov 2025).
Ill-conditioning: Singular or near-singular metrics arise near parameter-space singularities or plateaus; regularization ($F \to F + \lambda I$ with $\lambda > 0$) and monitoring of $F$'s eigenvalues are required (Yamamoto, 2019).
Shot noise and device noise: Finite-sample errors propagate through $F^{-1}$, potentially destabilizing updates in deep/noisy circuits; approximate metrics and adaptively truncated inverses are advocated (Dell'Anna et al., 27 Feb 2025, Koczor et al., 2019).
Hyperparameter tuning: The momentum coefficient ($\rho$), learning rate ($\eta$), and regularizer strength have material impact on convergence and should be chosen within the ranges recommended by the respective studies.
Monotonicity dilemma: Monotone metrics guarantee contractivity under CPTP maps (desired in state estimation), but dropping monotonicity can accelerate optimization: non-monotonic QNG is reported to enable up to a 40% reduction in iteration count at the expense of CPTP robustness (Sasaki et al., 2024, Miyahara, 21 Oct 2025).
Best practices: Use blockwise or weighted metrics for large or structured Hamiltonians; hybridize with momentum or adaptive step selection; warm-start to avoid initial plateaus; monitor metric condition and switch to Euclidean descent near singularities. Momentum-QNG and geodesic-corrected QNG are especially successful in mitigating local traps and plateaus (Borysenko et al., 2024, Halla, 2024).
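Two of the safeguards mentioned above, truncated inverses and switching to Euclidean descent when the metric is badly conditioned, can be sketched as follows. The thresholds `rel_cutoff` and `max_cond` and the function names are assumptions chosen for illustration.

```python
import numpy as np

def truncated_natural_gradient(metric, grad, rel_cutoff=1e-6):
    """Apply a pseudo-inverse, discarding eigenvalues below rel_cutoff * largest."""
    evals, evecs = np.linalg.eigh(0.5 * (metric + metric.T))
    keep = evals > rel_cutoff * evals.max()
    coeffs = evecs[:, keep].T @ grad
    return evecs[:, keep] @ (coeffs / evals[keep])

def metric_condition_number(metric):
    """Ratio of largest to smallest eigenvalue; inf for a non-positive metric."""
    evals = np.linalg.eigvalsh(0.5 * (metric + metric.T))
    if evals.max() <= 0:
        return np.inf
    return evals.max() / max(evals.min(), np.finfo(float).tiny)

def descent_direction(metric, grad, max_cond=1e8):
    """Use the Euclidean gradient when the metric is too ill-conditioned."""
    if metric_condition_number(metric) > max_cond:
        return grad.copy()
    return truncated_natural_gradient(metric, grad)

well = descent_direction(np.diag([2.0, 4.0]), np.array([2.0, 4.0]))
fallback = descent_direction(np.diag([1.0, 1e-12]), np.array([1.0, 1.0]))
truncated = truncated_natural_gradient(np.diag([1.0, 1e-12]), np.array([1.0, 1.0]))
```

The truncated solve drops poorly determined directions entirely, whereas the fallback keeps every direction but gives up the geometric preconditioning; which is preferable depends on how noisy the metric estimate is.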

6. Application Domains and Future Directions

Quantum natural-gradient optimizers are now foundational in VQE, QAOA, quantum neural network training, photonic circuit optimization, and quantum control, with further applicability to mixed-state and open-system optimization (Wang et al., 2023, Koczor et al., 2019). Ongoing development includes:

Hybrid adaptive algorithms: Line-search QNG, conjugate and geodesic-augmented variants, randomized/block sampling for scalability (Halla, 10 Jan 2025, Halla, 2024).
Hamiltonian-aware and subsystem-weighted metrics: Tailoring the optimizer to k-local, sparse, or otherwise structured problems (Shi et al., 18 Nov 2025, Shi et al., 7 Apr 2025).
Integration with error mitigation and noise-aware geometric metrics: For robust application on NISQ platforms (Koczor et al., 2019, Dell'Anna et al., 27 Feb 2025).
Exploration of non-monotonic and non-SLD Petz metrics: For accelerating convergence beyond traditional geometry (Miyahara, 21 Oct 2025).
Algorithmic hardware co-design: Leveraging photonic, superconducting, or Rydberg platforms for efficient QNG implementation (Wang et al., 2023, Dell'Anna et al., 27 Feb 2025).

Quantum natural-gradient optimizers represent a unifying information-geometric framework that enables algorithmic acceleration, robustness, and adaptability to the unique challenges of quantum system parameter landscapes, and continue to drive progress at the intersection of differential geometry, optimization theory, and quantum hardware (Stokes et al., 2019, Borysenko et al., 2024, Lisart-Liebermann et al., 23 Apr 2025).