Time-Evolving Natural Gradient (TENG)

Updated 20 December 2025
  • TENG is a framework that leverages the natural gradient and variational principles to evolve model parameters in time-dependent learning problems.
  • It unifies time-dependent variational approaches with numerical time-stepping schemes, enabling accurate solutions for PDEs and robust generative modeling.
  • The method integrates explicit boundary enforcement and particle-based updates, achieving orders of magnitude error reduction compared to traditional techniques.

The Time-Evolving Natural Gradient (TENG) framework provides a theoretically principled approach for evolving model parameters in time-dependent machine learning problems by leveraging the geometry of the parameter manifold induced by the Fisher information or kernel-based metrics. TENG generalizes and unifies time-dependent variational principles, optimization-based time integration, and natural gradient descent via explicit projections in Hilbert or exponential-family spaces. The methodology enables high-precision solutions for partial differential equations (PDEs) with deep neural networks, guides time-varying generative models, and allows particle-based formulations on exponential-family manifolds, without reliance on Markov Chain Monte Carlo (MCMC) sampling. TENG supports first- and second-order explicit integrators and is extensible to general boundary conditions and a broad class of equations.

1. Theoretical Foundations and Natural Gradient Principle

TENG arises from combining the Dirac–Frenkel Time-Dependent Variational Principle (TDVP) and Riemannian natural gradient descent on parameter manifolds. Consider a time-dependent PDE such as

$$\partial_t u(x, t) = \mathcal{L}u(x, t), \quad u(x, 0) = u_0(x),$$

with a neural-network approximation $u_{\theta}(x, t)$ parameterized by $\theta \in \mathbb{R}^{N_p}$. TDVP seeks an evolution law for $\theta(t)$ so that the induced solution trajectory best approximates the true flow in the $L^2$ sense. This yields the normal equations:

$$G(\theta)\,\dot{\theta} = J^{T} \mathcal{L}u_{\theta},$$

where $J$ is the Jacobian with respect to the parameters and $G$ is a Gram matrix representing the local $L^2$ metric, or the Fisher information in probabilistic settings (Chen et al., 2024, Liu et al., 11 Feb 2025). The natural gradient in parameter space,

$$\Delta \theta = -\eta\, F(\theta)^{-1} \nabla_{\theta} L(\theta),$$

corresponds to steepest descent in the Riemannian metric defined by $F(\theta)$ and is the optimal descent direction from the viewpoint of information geometry.
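
The following is a minimal sketch of this projection step in JAX, assuming a toy scalar model, a heat-equation operator, and the Gram matrix $G = J^T J$ assembled on collocation points; the model `u`, operator `L_op`, and all sizes below are illustrative placeholders, not the reference implementation from the cited papers.

```python
import jax
import jax.numpy as jnp
from jax.flatten_util import ravel_pytree

def u(theta, x):
    # Toy scalar model u_theta(x) = w2 . tanh(w1 * x + b1); stands in for a PINN.
    return jnp.dot(theta["w2"], jnp.tanh(theta["w1"] * x + theta["b1"]))

def L_op(theta, x, nu=0.01):
    # Example right-hand side: heat equation, L u = nu * u_xx.
    u_xx = jax.grad(jax.grad(lambda y: u(theta, y)))(x)
    return nu * u_xx

def tdvp_direction(theta, xs):
    # Solve G(theta) theta_dot = J^T (L u_theta) with G = J^T J,
    # i.e. the least-squares problem min || J theta_dot - L u_theta ||^2.
    flat, unravel = ravel_pytree(theta)
    u_on_grid = lambda p: jax.vmap(lambda x: u(unravel(p), x))(xs)
    J = jax.jacfwd(u_on_grid)(flat)                  # (N_s, N_p) Jacobian on collocation points
    rhs = jax.vmap(lambda x: L_op(theta, x))(xs)     # (L u_theta)(x_i)
    theta_dot, *_ = jnp.linalg.lstsq(J, rhs, rcond=None)
    return unravel(theta_dot)

# Example usage: one explicit Euler step theta <- theta + dt * theta_dot.
k1, k2 = jax.random.split(jax.random.PRNGKey(0))
theta0 = {"w1": jax.random.normal(k1, (32,)), "b1": jnp.zeros(32),
          "w2": 0.1 * jax.random.normal(k2, (32,))}
xs = jnp.linspace(0.0, 2 * jnp.pi, 128)
theta_dot = tdvp_direction(theta0, xs)
```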

In probabilistic generative modeling, parameter evolution may be projected onto an exponential-family manifold $M(T) = \{\eta \in \mathbb{R}^k : A(\eta) < \infty\}$, with Fisher–Rao metric $F(\eta) = \operatorname{Cov}_{q(\cdot;\eta)}[T(X)]$ (Liu et al., 11 Feb 2025). The time-continuous natural gradient for minimizing $\mathrm{KL}(p \,\|\, q(\cdot;\eta))$ is governed by the ODE:

$$\frac{d\eta}{dt} = F(\eta)^{-1}\left( \mathbb{E}_p[T] - \mathbb{E}_{q}[T] \right).$$
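
A hedged sketch of one explicit Euler step of this ODE follows, with the Fisher metric and the expectation under $q(\cdot;\eta)$ estimated by Monte Carlo; `sample_q`, the sufficient statistics `T`, and the sample size are user-supplied placeholders rather than anything prescribed by the papers.

```python
import jax
import jax.numpy as jnp

def ng_euler_step(eta, key, sample_q, T, mean_T_p, dt, n_samples=4096):
    # One forward-Euler step of d(eta)/dt = F(eta)^{-1} (E_p[T] - E_q[T]).
    xs = sample_q(eta, key, n_samples)        # samples from q(.; eta)  (placeholder sampler)
    Ts = jax.vmap(T)(xs)                      # sufficient statistics, shape (n_samples, k)
    F = jnp.cov(Ts, rowvar=False)             # Fisher-Rao metric: Cov_q[T(X)]
    d_eta = jnp.linalg.solve(F, mean_T_p - Ts.mean(axis=0))
    return eta + dt * d_eta
```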

2. Numerical Time-Stepping Schemes

TENG realizes its parameter updates within standard numerical time-integration frameworks. At each time step, a "target" function $u_{\mathrm{target}}$ is defined via forward Euler or higher-order combinations, and the parameters are then updated by projecting the functional discrepancy onto the neural tangent plane:

  • TENG-Euler (first order):

$$u_{\mathrm{target}}(x) = u_{\theta_t}(x) + \Delta t\, \mathcal{L}u_{\theta_t}(x),$$

and $\theta_{t+\Delta t}$ is obtained by minimizing $\|u_{\theta} - u_{\mathrm{target}}\|_{L^2}$ using natural-gradient steps.

  • TENG-Heun (second order):
    • Predictor: $u_{\mathrm{pred}} = u_\theta + \Delta t\,(\partial_t u_\theta - \mathcal{N}[u_\theta])$, $\theta_{\mathrm{pred}} = \mathrm{NGStep}(\theta_n, t_n; u_{\mathrm{pred}})$
    • Corrector: $u_{\mathrm{tar}} = u_\theta + (\Delta t / 2)\left[\ldots\right]$, $\theta_{n+1} = \mathrm{NGStep}(\theta_n, t_n; u_{\mathrm{tar}})$.

Higher-order (e.g., Runge–Kutta) variants are directly analogous and benefit from strong local optimization at each sub-stage (Chen et al., 2024, He et al., 13 Dec 2025). The natural gradient step is approximated using the Gauss–Newton or kernelized tangent-space structure (He et al., 13 Dec 2025).
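
As an illustration of the scheme above, a minimal TENG-Euler step might look as follows, reusing the toy `u` and `L_op` from the sketch in Section 1 and approximating the natural-gradient inner loop with damped Gauss–Newton iterations; the inner-iteration count and damping value are assumptions.

```python
import jax
import jax.numpy as jnp
from jax.flatten_util import ravel_pytree

def teng_euler_step(theta, xs, dt, n_inner=7, damping=1e-10):
    # Forward-Euler target on collocation points: u_target = u_theta + dt * L u_theta.
    u_target = jax.vmap(lambda x: u(theta, x) + dt * L_op(theta, x))(xs)
    flat, unravel = ravel_pytree(theta)
    u_on_grid = lambda p: jax.vmap(lambda x: u(unravel(p), x))(xs)
    for _ in range(n_inner):
        residual = u_on_grid(flat) - u_target
        J = jax.jacfwd(u_on_grid)(flat)
        # Damped Gauss-Newton / natural-gradient step:
        # (J^T J + damping I) delta = J^T residual.
        G = J.T @ J + damping * jnp.eye(flat.size)
        flat = flat - jnp.linalg.solve(G, J.T @ residual)
    return unravel(flat)

# TENG-Heun is analogous: a predictor stage produces theta_pred, followed by a
# corrector target and a second NGStep, as in the scheme described above.
```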

3. Boundary Condition Enforcement in PDE Solvers

TENG++ (the extension of TENG) incorporates general boundary conditions, notably Dirichlet, by augmenting the loss function with explicit penalty terms:

$$L_{BC}(\theta, t) = \lambda \int_{\partial \Omega} |u_{\theta}(x,t) - g(x,t)|^2 \, dS_x,$$

where $\lambda$ balances interior versus boundary enforcement (He et al., 13 Dec 2025). Residuals from boundary and interior collocation points are concatenated and included in the Jacobian for the Gauss–Newton update. Neumann and mixed conditions are handled analogously by constructing suitable loss terms and augmenting the residual vector. The key penalty terms are summarized below:

| BC Type | Loss Penalty Form | Residual Contribution |
| --- | --- | --- |
| Dirichlet | $\lambda \int_{\partial\Omega} \lvert u_\theta - g \rvert^2$ | $\sqrt{\lambda}\,(u_\theta - g)$ |
| Neumann | $\mu \int_{\partial\Omega} \lvert \partial_n u_\theta - h \rvert^2$ | $\sqrt{\mu}\,(\partial_n u_\theta - h)$ |
| Mixed | Dirichlet and Neumann terms combined | Both residuals above |

This results in stable, accurate enforcement of boundary constraints with tunable weightings.
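
A minimal sketch of how such a stacked residual might be assembled before the Gauss–Newton solve is shown below; the function and argument names are illustrative, not the TENG++ API.

```python
import jax.numpy as jnp

def stacked_residual(u_interior, u_target, u_boundary, g_boundary, lam):
    # Interior fitting residual plus sqrt(lambda)-weighted Dirichlet residual,
    # concatenated so one Gauss-Newton solve handles both; Neumann terms would
    # be appended the same way with weight sqrt(mu).
    r_dom = u_interior - u_target
    r_bc = jnp.sqrt(lam) * (u_boundary - g_boundary)
    return jnp.concatenate([r_dom, r_bc])
```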

4. Particle-based NGD for Generative Modeling

TENG extends to implicit time-varying generative models via particle-based and kernel-based natural gradient flows. For a generator $X_t = g(Z, t; \theta)$, the evolution is projected onto an exponential-family manifold through explicit minimization in function space. The time score $s_t(x) = \partial_t \log q_{g_t}(x)$ is projected onto the tangent space:

$$\zeta(t_0) = -\left[ \int \lambda_{t_0}(t)\, \operatorname{Cov}[T(X_t)]\, dt \right]^{-1} \int \partial_t \lambda_{t_0}(t)\, \mathbb{E}[T(X_t)]\, dt.$$

Particle-based updates can use the projected coefficient $\zeta(t_0)$ directly, enabling explicit, closed-form natural-gradient updates for both parameterized and nonparametric models, with robust empirical performance on high-dimensional and structured data.
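
A hedged sketch of estimating this projection from particles follows, replacing the time integrals with sums over a uniform grid of time slices and the moments with sample estimates; the Gaussian window `lam` and all shapes are assumptions made for illustration.

```python
import jax
import jax.numpy as jnp

def zeta_estimate(t0, ts, particle_batches, T, width=0.1):
    # ts: (n_t,) uniform time grid; particle_batches: (n_t, n, d) samples X_t.
    lam = lambda t: jnp.exp(-0.5 * ((t - t0) / width) ** 2)  # assumed smoothing window lambda_{t0}
    dlam = jax.grad(lam)                                     # d/dt lambda_{t0}(t)
    dt = ts[1] - ts[0]
    Ts = jax.vmap(jax.vmap(T))(particle_batches)             # sufficient statistics, (n_t, n, k)
    means = Ts.mean(axis=1)                                  # E[T(X_t)] per time slice
    def slice_cov(S):                                        # Cov[T(X_t)] per time slice
        C = S - S.mean(axis=0, keepdims=True)
        return C.T @ C / (S.shape[0] - 1)
    covs = jax.vmap(slice_cov)(Ts)
    A = dt * jnp.einsum("t,tij->ij", jax.vmap(lam)(ts), covs)
    b = dt * jnp.einsum("t,ti->i", jax.vmap(dlam)(ts), means)
    return -jnp.linalg.solve(A, b)
```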

5. Error Analysis, Benchmarks, and Computational Performance

Machine-precision step-wise convergence is a hallmark of TENG. In canonical PDEs, the TENG stepper drives the $L^2$ loss residual to $10^{-14}$ in $\sim 7$ inner iterations, far surpassing OBTI-Adam/LBFGS ($\sim 10^{-7}$ in hundreds of iterations) (Chen et al., 2024). TENG-Euler's error scales linearly with the step size, while TENG-Heun exhibits quadratic convergence, with global errors often below $10^{-6}$ for moderate $\Delta t$. For example, on 2D heat and Burgers' equations, TENG-Heun achieves global errors of $1.6\times10^{-6}$ to $2.6\times10^{-6}$, outperforming traditional PINN and collocation-based approaches.

Benchmarks further establish that TENG variants deliver up to $2$–$3$ orders of magnitude lower error than state-of-the-art PINN or OBTI-based methods, at comparable computational cost on modern accelerators (NVIDIA V100: $2$–$5$ h for 2D problems).

6. Implementation, Limitations, and Extensions

Typical architectures employ feed-forward neural networks (7 layers, width 40, tanh activations, periodic sine/cosine embeddings as needed), with total parameter count $N_p \sim 10^4$ (Chen et al., 2024). Inner steps in function space subsample parameters ($1\,000$–$2\,000$ per least-squares solve) to regularize and keep the computation tractable. Learning rates are explicitly scheduled in function space.
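
A minimal sketch of the kind of architecture described above (a tanh MLP with 7 hidden layers of width 40 applied to periodic features) is given below; the initialization scheme and the scalar 1D periodic embedding are assumptions.

```python
import jax
import jax.numpy as jnp

def init_mlp(key, sizes=(2, 40, 40, 40, 40, 40, 40, 40, 1)):
    # 7 hidden tanh layers of width 40; roughly 1e4 parameters in total.
    keys = jax.random.split(key, len(sizes) - 1)
    return [(jax.random.normal(k, (m, n)) / jnp.sqrt(m), jnp.zeros(n))
            for k, m, n in zip(keys, sizes[:-1], sizes[1:])]

def mlp(params, x):
    # Periodic sine/cosine embedding of a scalar coordinate x (assumed period 2*pi).
    h = jnp.array([jnp.sin(x), jnp.cos(x)])
    for W, b in params[:-1]:
        h = jnp.tanh(h @ W + b)
    W, b = params[-1]
    return (h @ W + b)[0]
```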

TENG complexity scales as $O(N_{\mathrm{it}}\, \mathrm{Cost}_{\mathrm{lstsq}}\, T/\Delta t)$, with $\mathrm{Cost}_{\mathrm{lstsq}} = O(N_s N_p^2)$ per dense iteration. Subsampling significantly reduces wall-clock time.
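
For orientation, a back-of-the-envelope instance of this cost model is given below; all of the specific values are assumptions chosen only to make the scaling concrete.

```python
N_p = 2_000     # parameters entering each least-squares solve (after subsampling; assumed)
N_s = 4_000     # collocation samples per solve (assumed)
N_it = 7        # inner natural-gradient iterations per time step
T, dt = 1.0, 1e-3

cost_per_solve = N_s * N_p**2                    # dense Cost_lstsq = O(N_s N_p^2)
total_ops = N_it * cost_per_solve * (T / dt)     # O(N_it * Cost_lstsq * T / dt)
print(f"~{total_ops:.1e} operations")            # ~1.1e+14 for these values
```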

Limitations include:

  • The computational cost of high-dimensional least squares, mitigated by subsampling and fast solvers.
  • The need for adaptation to non-periodic domains or stiff equations.
  • Potential for increased inner-iteration count or smaller $\Delta t$ in very stiff PDEs.

Extensions involve modeling vector-valued fields, adaptive time-stepping, integration with global-in-time PINN frameworks, and leveraging sparse or randomized solvers for scale-out to larger problems (Chen et al., 2024, He et al., 13 Dec 2025).

7. Significance and Outlook

TENG and its variants (notably TENG++ and particle-based NGD flows) unify variational, geometric, and optimization-based approaches for evolving neural network parameters in time-dependent settings. The framework integrates Hilbert and manifold geometry, provides data-efficient step-wise projections, and enables machine-precision or near-machine-precision accuracy in practice. It removes the need for MCMC in generative learning on intractable manifolds and achieves robust, high-accuracy enforcement of general boundary conditions for neural PDE solvers. The methods are readily extensible to new classes of governing equations, diverse boundary conditions, and complex generative models, positioning TENG as a foundational tool in geometric algorithms for scientific and probabilistic machine learning (Chen et al., 2024, He et al., 13 Dec 2025, Liu et al., 11 Feb 2025).
