Dy-NTK: Dynamic Neural Tangent Kernel

Updated 26 October 2025
  • Dy-NTK is a dynamic framework that redefines the traditional NTK regime by leveraging spectral decomposition to identify and exploit informative parameter directions.
  • It integrates both linear (NTK) and quadratic (QuadNTK) components to efficiently learn mixed dense low-degree and sparse high-degree representations with reduced sample complexity.
  • By applying targeted regularization, Dy-NTK controls parameter dynamics along beneficial subspaces, ensuring improved convergence, stability, and generalization.

Dy-NTK (Dynamic Neural Tangent Kernel) broadly refers to mechanisms for escaping or extending the classical NTK regime by controlling, adapting, or exploiting the dynamics and structure of the kernel induced by neural network parameter evolution. The foundational setting is the “lazy training” regime, where the network behaves as a linearized model at initialization, represented by a fixed NTK. While the NTK captures optimality for certain function classes—particularly dense, low-degree polynomials—it imposes fundamental limits for feature learning, representation adaptation, and sample complexity in broader settings. Dy-NTK approaches combine spectral, optimization, and architectural tools to exploit higher-order dynamics, improve sample efficiency, and adapt to richer targets through nontrivial directions in parameter space.

1. Spectral Decomposition of the NTK and Identification of Informative Directions

A central tenet of Dy-NTK methodology is the spectral decomposition of the network’s feature covariance (essentially, the NTK Gram matrix) at initialization. Denoting the population NTK feature covariance as $\Sigma$, its eigendecomposition

$$\Sigma = Q_1 \Lambda_1 Q_1^\top + Q_2 \Lambda_2 Q_2^\top + Q_3 \Lambda_3 Q_3^\top$$

distinguishes three classes of directions:

  • $Q_1$: Top eigenvectors linked to large eigenvalues; these correspond to “informative,” low-degree polynomial structures that the classical NTK fits well.
  • $Q_2$: Intermediate (medium-eigenvalue) directions, in which parameter movement amplifies function outputs on unseen data and corresponds to “bad” generalization behavior.
  • $Q_3$: Small-eigenvalue directions; movement here does not adversely affect out-of-sample NTK generalization, yielding “good” directions for escaping the lazy regime.

This fine-grained spectral partition is essential to Dy-NTK. The methodology exploits $Q_3$ for learning target components that are otherwise inexpressible or sample-inefficient in the standard NTK regime. The analysis leverages spherical harmonics to relate the NTK spectrum to the degree of polynomials it can capture, ensuring theoretical control in high-dimensional settings (Nichani et al., 2022).
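
As a minimal sketch of this bookkeeping, the snippet below partitions the spectrum of an empirical feature covariance into three bands. The Jacobian-style feature matrix, the function name partition_spectrum, and the numeric thresholds tau_hi/tau_lo are illustrative assumptions of this sketch; the paper defines the bands through the population spectrum and spherical-harmonic degrees rather than hard cutoffs.

```python
import numpy as np

def partition_spectrum(features, tau_hi=0.03, tau_lo=0.01):
    """Split an empirical feature covariance spectrum into Q1/Q2/Q3 bands.

    features: (n_samples, n_params) matrix whose rows are NTK features
              phi(x) = d f(x; theta) / d theta at initialization.
    tau_hi, tau_lo: arbitrary eigenvalue cutoffs for this toy example.
    """
    n = features.shape[0]
    sigma = features.T @ features / n           # empirical feature covariance (proxy for Sigma)
    evals, evecs = np.linalg.eigh(sigma)        # eigenvalues in ascending order
    evals, evecs = evals[::-1], evecs[:, ::-1]  # reorder to descending

    Q1 = evecs[:, evals >= tau_hi]                       # "informative" top directions
    Q2 = evecs[:, (evals < tau_hi) & (evals >= tau_lo)]  # "bad" intermediate directions
    Q3 = evecs[:, evals < tau_lo]                        # "good" small-eigenvalue directions
    return Q1, Q2, Q3

# Random features standing in for the NTK Jacobian of a finite-width network.
phi = np.random.default_rng(0).standard_normal((200, 50)) / np.sqrt(50)
Q1, Q2, Q3 = partition_spectrum(phi)
print(Q1.shape[1], Q2.shape[1], Q3.shape[1])  # number of directions in each band
```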

2. Joint Utilization of First- and Second-Order Terms: NTK and QuadNTK

The Dy-NTK approach integrates both the linearized NTK (first-order Taylor expansion around initialization) and the quadratic expansion (“QuadNTK”) of the network function. Previous work established that:

  • The NTK is minimax-optimal for learning dense low-degree polynomials but fails for sparse high-degree functions.
  • The QuadNTK enables efficient learning of sparse high-degree polynomials (sample complexity $d^k$ for degree $k+1$, compared to $d^{k+1}$ for NTK) but cannot capture dense structures.
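
As a concrete (if simplified) illustration of the two expansions, the sketch below evaluates the first- and second-order Taylor terms of a two-layer network $f(x; W) = \sum_r a_r\,\sigma(w_r^\top x)$ around its initialization; the tanh activation, the fixed second layer $a$, and the scalings are assumptions of this sketch rather than the paper's exact parameterization.

```python
import numpy as np

# First- and second-order Taylor terms of f(x; W) = sum_r a_r * act(w_r . x)
# around the initialization W0; dW = W - W0 is the parameter movement.
act   = np.tanh
dact  = lambda z: 1.0 - np.tanh(z) ** 2        # act'
d2act = lambda z: -2.0 * np.tanh(z) * dact(z)  # act''

def ntk_and_quadntk_terms(x, W0, a, dW):
    pre  = W0 @ x   # (m,) pre-activations at initialization
    move = dW @ x   # (m,) per-neuron movement projected onto x
    f_lin  = np.sum(a * dact(pre) * move)              # NTK (linear) term
    f_quad = 0.5 * np.sum(a * d2act(pre) * move ** 2)  # QuadNTK (quadratic) term
    return f_lin, f_quad

rng = np.random.default_rng(0)
m, d = 64, 10
W0 = rng.standard_normal((m, d)) / np.sqrt(d)
a  = rng.choice([-1.0, 1.0], size=m) / m
dW = 0.1 * rng.standard_normal((m, d))
x  = rng.standard_normal(d) / np.sqrt(d)
print(ntk_and_quadntk_terms(x, W0, a, dW))
```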

Dy-NTK achieves simultaneous learning of target functions of the form

$$f^*(x) = f_k(x) + f_{\text{sp}}(x)$$

where $f_k$ is a dense polynomial of degree at most $k$ and $f_{\text{sp}}$ is a sparse degree-$(k+1)$ component.
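
For concreteness, an illustrative instance (constructed here, not taken from the source) with $k = 2$ pairs a dense quadratic with a single cubic monomial:

$$f^*(x) = \underbrace{\sum_{1 \le i \le j \le d} a_{ij}\, x_i x_j}_{f_k:\ \text{dense, degree} \le 2} + \underbrace{x_1 x_2 x_3}_{f_{\text{sp}}:\ \text{sparse, degree } 3}$$

The dense part spreads weight across $\Theta(d^2)$ coefficients, which is the regime where the NTK term is effective, while the sparse part touches only three coordinates and calls for the quadratic term.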

The construction involves separate solutions for the two additive constituents:

  • $W_L$: Parameters whose linear term fits $f_k$, relying on the informative subspace $Q_1$ for sample-efficient generalization.
  • $W_Q$: Parameters for the quadratic contribution, constructed via randomized sign matrices $S$ so that $f_Q(x; W_Q)$ captures $f_{\text{sp}}$, while its linear projection onto $Q_1$ remains negligible, avoiding destructive interference.

The composite solution $W^* = W_L + W_Q S$ leverages randomization and spectral orthogonality, ensuring each component almost exclusively fits its respective target portion (Nichani et al., 2022).
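
The sketch below illustrates this composite construction; the matrix shapes, the placement of the diagonal sign matrix $S$, and the Monte Carlo check are assumptions of the illustration (the actual construction also constrains $W_Q$ so its linear contribution avoids $Q_1$), but it shows how resampled signs cancel the quadratic component's linear-term interference on average.

```python
import numpy as np

rng = np.random.default_rng(0)
m, d = 64, 10
W_L = rng.standard_normal((m, d)) / np.sqrt(m * d)  # intended to fit the dense part f_k (linear term)
W_Q = rng.standard_normal((m, d)) / np.sqrt(m * d)  # intended to fit the sparse part f_sp (quadratic term)

S = np.diag(rng.choice([-1.0, 1.0], size=d))  # random diagonal sign matrix (shape/placement assumed)
W_star = W_L + W_Q @ S                        # composite solution W* = W_L + W_Q S

# Monte Carlo check: averaged over resampled signs, the signed quadratic
# component contributes nothing to a fixed linear functional <V, W_Q S>.
V = rng.standard_normal((m, d))
vals = [np.sum(V * (W_Q @ np.diag(rng.choice([-1.0, 1.0], size=d)))) for _ in range(2000)]
print(np.mean(vals), np.std(vals))  # mean is near 0 relative to the typical magnitude (std)
```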

3. Regularization for Controlled Parameter Dynamics

To ensure convergence to solutions with controlled generalization, particularly under finite width and optimization non-convexity, the Dy-NTK methodology introduces composite regularization:

  • $R_1$ penalizes movement in the “bad” $Q_2$ directions prone to out-of-sample instability.
  • $R_2$ is a standard $\ell_2$ penalty in the “informative” $Q_1$ directions, moderating the parameters involved in fitting $f_k$.
  • $R_3$ penalizes parameter drift in the corresponding “bad” neuron subspace ($V_2$), echoing the $Q_2$ partition in activation space.
  • $R_4$ is a higher-order norm penalty (e.g., $\ell_{2,4}$) essential to guarantee proper control of the neural tangent generalization error.

These regularizers are combined in the empirical objective

$$L_\lambda(W) = \text{Empirical Loss}(W) + \lambda_1 R_1(W) + \lambda_2 R_2(W) + \lambda_3 R_3(W) + \lambda_4 R_4(W)$$

Gradient descent on this regularized loss landscape is shown (by careful geometric and Hessian analysis) to converge globally, with critical points tightly coupled to small population loss, provided movement remains confined to the “good” $Q_3$ and $V_3$ directions (Nichani et al., 2022).
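
A schematic rendering of this objective, with stand-in projectors for the $Q$ and $V$ subspaces and arbitrary coefficients (all names, shapes, and norms below are illustrative assumptions rather than the paper's definitions), might look as follows:

```python
import numpy as np

def regularized_loss(W, W0, X, y, f, Q1, Q2, V2, lam=(1.0, 1.0, 1.0, 1.0)):
    """Schematic composite objective L_lambda(W); all projectors are stand-ins."""
    dW = W - W0                                 # movement away from initialization
    emp = np.mean((f(X, W) - y) ** 2)           # empirical loss
    R1 = np.linalg.norm(dW @ Q2) ** 2           # penalize movement in "bad" Q2 directions
    R2 = np.linalg.norm(dW @ Q1) ** 2           # l2 penalty in "informative" Q1 directions
    R3 = np.linalg.norm(V2.T @ dW) ** 2         # penalize drift in the "bad" neuron subspace V2
    R4 = np.sum(np.sum(dW ** 2, axis=1) ** 2)   # l_{2,4}^4-style higher-order norm penalty
    l1, l2, l3, l4 = lam
    return emp + l1 * R1 + l2 * R2 + l3 * R3 + l4 * R4

# Toy invocation with a two-layer tanh network and random orthonormal subspaces.
rng = np.random.default_rng(1)
m, d, n = 16, 5, 40
a = rng.choice([-1.0, 1.0], size=m) / m
f = lambda X, W: np.tanh(X @ W.T) @ a
W0 = rng.standard_normal((m, d)) / np.sqrt(d)
W  = W0 + 0.1 * rng.standard_normal((m, d))
X, y = rng.standard_normal((n, d)), rng.standard_normal(n)
Qd = np.linalg.qr(rng.standard_normal((d, d)))[0]          # stand-in basis of feature space
V2 = np.linalg.qr(rng.standard_normal((m, m)))[0][:, :4]   # stand-in "bad" neuron subspace
print(regularized_loss(W, W0, X, y, f, Qd[:, :2], Qd[:, 2:4], V2))
```

Per the convergence argument above, the intent is that the $\lambda_i$ are tuned so that near-critical points of $L_\lambda$ move essentially only along the benign $Q_3$ and $V_3$ directions and therefore attain small population loss.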

4. Sample Complexity and Generalization Guarantees

A primary advantage of Dy-NTK is reduced sample complexity. While the NTK alone requires $d^{k+1}$ samples to fit a degree-$(k+1)$ polynomial, and QuadNTK alone is limited to sparse structure, Dy-NTK achieves order-$d^k$ sample complexity for mixed dense/sparse targets:

  • The $Q_1$ mechanism ensures the low-degree dense part is captured with standard NTK optimal rates.
  • The $Q_3$ subspace permits the quadratic term to capture high-degree sparse signals at the sample complexity afforded by QuadNTK, without corruption of generalization.

Global convergence and generalization bounds are established under conditions on the NTK eigenspectrum and regularizer coefficients (e.g., the error for the NTK term is $\widetilde{O}(d^k/n)$, while the quadratic term achieves error $\widetilde{O}(d^k/\sqrt{m})$ for rank-$R$ sparsity) (Nichani et al., 2022).
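
For a rough sense of scale, ignoring constants and logarithmic factors (an illustrative calculation, not a figure reported in the paper): with $d = 100$ and $k = 2$,

$$d^{k+1} = 100^{3} = 10^{6} \quad \text{(NTK alone)} \qquad \text{vs.} \qquad d^{k} = 100^{2} = 10^{4} \quad \text{(Dy-NTK)},$$

a hundredfold reduction in the sample budget needed for the mixed dense/sparse target.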

5. Theoretical and Methodological Significance

Dy-NTK methodology demonstrates how to “escape” the static NTK regime not merely by increasing width or depth but by rigorously identifying, via spectral tools, safe directions in parameter space. It unifies the first- and second-order Taylor regimes and provides:

  • An explicit, spectral prescription for designing regularizers to enforce beneficial dynamics, grounded in population-level analysis.
  • A concrete mechanism to disambiguate good and bad directions, overcoming the pitfall of moving indiscriminately in high-curvature or low-signal directions.
  • A solution with improved generalization for function classes otherwise elusive to strictly NTK-based or strictly second-order approaches.
  • The use of random sign matrices and null-space projections to ensure orthogonality and preserve separation of concerns between linear and quadratic terms.

This approach is conceptually distinct from both pure “lazy training” (fixed kernel, minimal feature learning) and uncontrolled end-to-end deep learning (where the lack of directionality may impair generalization or the optimization landscape).

6. Broader Context and Implications

The Dy-NTK analytic framework generalizes to scenarios beyond dense-plus-sparse polynomial learning. A plausible implication is that any class of functions with mixed or hierarchically structured components may benefit from similar spectral decompositions and dynamic, regularizer-influenced escape from the kernel regime. The approach has motivated broader investigation into data-dependent spectral methods, regularized optimization subspaces, and the integration of higher-order dynamics in modern gradient-based neural network training.

Current limitations include its restriction to the two-layer setting and polynomial targets; extension to deeper architectures and broader functional classes remains open. There are also connections to convex reformulations and kernel learning frameworks (e.g., iteratively reweighted group lasso or multiple kernel learning), and to recent empirical findings demonstrating the need for dynamic (rather than static or purely lazy) NTK frameworks in sequential or nonstationary learning scenarios (Liu et al., 21 Jul 2025, Wenger et al., 2023).

Summary Table: Dy-NTK Structural Components

| Component | Role | Spectral Subspace |
|---|---|---|
| $Q_1$ (informative) | Fits dense low-degree signals (NTK) | Top eigenvalue subspace |
| $Q_2$ (bad) | Avoided: leads to out-of-sample instability | Intermediate eigenvalues |
| $Q_3$ (good, null) | Exploited for quadratic/sparse signals | Small eigenvalue subspace |
| $V_2$ (bad neuron) | Regularized against for generalization | Neuron space |

Dy-NTK thus synthesizes rigorous spectral characterizations, polynomial capacity theory, quadratic expansion, and structured regularization to construct architectures and optimization paths that systematically escape the weaknesses of standard NTK approaches without forfeiting their established guarantees.
