
NTK-aware Interpolation in Neural Representations

Updated 14 April 2026
  • NTK-aware Interpolation is a principled approach that decomposes NTK eigenvalue variance into interpretable architectural and parametric factors to control spectral bias.
  • Architectural levers like positional encoding, spherical normalization, and Hadamard modulation systematically reduce similarity masses and variance components in INR models.
  • By flattening the NTK spectrum, this method mitigates spectral bias and accelerates convergence, enabling more effective high-frequency signal recovery.

NTK-aware interpolation refers to a principled approach for designing and training implicit neural representations (INRs) with explicit control over the conditioning and eigenvalue spectrum of their Neural Tangent Kernel (NTK). By decomposing the NTK eigenvalue variance into interpretable architectural and parametric factors, NTK-aware methods enable the suppression of spectral bias and acceleration of convergence, particularly for high-frequency signal recovery tasks. This methodology underpins a unified understanding of how positional encoding, spherical normalization, and Hadamard modulation each contribute to improved NTK conditioning in coordinate-based MLPs (Ou et al., 17 Dec 2025).

1. NTK Formalism and Eigenvalue-Variance Decomposition

For a two-layer coordinate MLP, the output is

$$f(x; W) = \frac{a}{\sqrt{m}} \sum_{r=1}^m \sigma(w_r \cdot \phi(x)),$$

with $\phi(x) \in \mathbb{R}^d$ denoting a fixed positional encoding, $\sigma$ a pointwise ReLU, and $a$ the readout scale. The NTK at initialization is the $n \times n$ Gram matrix

$$H_{ij} = \frac{a^2}{m} \rho_{ij} \langle t_i, t_j \rangle,$$

where $\rho_{ij} = \phi(x_i)^\top \phi(x_j)$ and $t_i = s_i \odot p_i$, with $s_i$ encoding ReLU gating and spherical normalization and $p_i$ capturing Hadamard modulation.
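
To make the Gram-matrix formula concrete, the sketch below evaluates $H_{ij}$ for a plain two-layer ReLU network. It assumes no normalization or modulation, so that $t_i$ reduces to the 0/1 ReLU gating pattern of sample $i$; under that assumption the formula follows from differentiating $f$ with respect to the hidden weights $w_r$. The helper names `gating` and `ntk_gram` and the toy inputs are hypothetical.

```python
import random

def dot(u, v):
    return sum(p * q for p, q in zip(u, v))

def gating(weights, phi_x):
    """0/1 ReLU activation pattern of one encoded input across all hidden units."""
    return [1.0 if dot(w, phi_x) > 0 else 0.0 for w in weights]

def ntk_gram(features, weights, a=1.0):
    """H_ij = (a^2 / m) * rho_ij * <t_i, t_j>, with t_i assumed to be the gating pattern."""
    m = len(weights)
    n = len(features)
    t = [gating(weights, phi) for phi in features]
    return [[(a * a / m) * dot(features[i], features[j]) * dot(t[i], t[j])
             for j in range(n)] for i in range(n)]

rng = random.Random(0)
d, m = 3, 64
weights = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(m)]
# Unit-norm toy encodings; the first and third are orthogonal, so rho = 0 for that pair.
features = [[1.0, 0.0, 0.0], [0.6, 0.8, 0.0], [0.0, 0.0, 1.0]]
H = ntk_gram(features, weights)
```

Orthogonal encodings give a zero entry regardless of the gating overlap, which is why reshaping $\rho_{ij}$ is one direct way to act on the off-diagonal part of $H$.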

Writing $\lambda_1, \dots, \lambda_n$ for the eigenvalues of $H$, the mean and variance are

$$\bar{\lambda} = \frac{1}{n} \sum_{i=1}^n \lambda_i, \qquad \mathrm{Var}(\lambda) = \frac{1}{n} \sum_{i=1}^n (\lambda_i - \bar{\lambda})^2.$$
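
Because $H$ is symmetric, the eigenvalue mean and variance can be computed without an eigendecomposition: the eigenvalue sum equals the trace, and the sum of squared eigenvalues equals the squared Frobenius norm. The sketch below uses that identity. The helper name `eig_mean_var` is hypothetical, and the population convention (divide by $n$) is an assumption.

```python
def eig_mean_var(H):
    """Mean and population variance of the eigenvalues of a symmetric matrix H."""
    n = len(H)
    mean = sum(H[i][i] for i in range(n)) / n                  # tr(H) / n
    second_moment = sum(h * h for row in H for h in row) / n   # ||H||_F^2 / n
    return mean, second_moment - mean * mean

# Toy check: [[2, 1], [1, 2]] has eigenvalues 1 and 3, so mean 2 and variance 1.
mean, var = eig_mean_var([[2.0, 1.0], [1.0, 2.0]])
```

This costs $O(n^2)$ rather than the $O(n^3)$ of a full eigendecomposition, which matters when the variance is tracked repeatedly during training.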

Under mild regularity (bounded input norm […], hidden energy […], modulation scale […], self-similarities ≈ 1), the variance admits the proxy

[…]

where the similarity factors are:

  • […] (input similarity),
  • […] (hidden gating/normalization),
  • […] (modulation similarity),
  • […] (coupling).

This decomposition supports design-time and runtime diagnosis of NTK spectrum flatness and spectral bias in INR architectures (Ou et al., 17 Dec 2025).

2. Architectural Levers: Impact on NTK Similarity Masses

Positional Encoding (PE): Utilizing Fourier-feature encodings […], with […], reshapes the input similarity. Lemma 3.1 shows

[…]

and as […], this mass approaches […] (baseline). By monotonicity, shrinking off-diagonal […] directly lowers […] (Corollary C.2).
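
As a rough numerical illustration of how the encoding reshapes input similarity, the sketch below builds random Fourier features and compares the off-diagonal similarity at two bandwidths. The Gaussian frequency distribution, the `bandwidth` parameter name, and the mean-squared off-diagonal "mass" are assumptions for illustration, not the paper's definitions.

```python
import math
import random

def fourier_features(x, freqs):
    """Encode scalar x as [cos(b x), sin(b x)] / sqrt(D), so that ||phi(x)|| = 1."""
    scale = 1.0 / math.sqrt(len(freqs))
    return ([scale * math.cos(b * x) for b in freqs]
            + [scale * math.sin(b * x) for b in freqs])

def offdiag_mass(xs, bandwidth, num_freqs=256, seed=0):
    """Mean squared off-diagonal similarity rho_ij^2 (an illustrative proxy)."""
    rng = random.Random(seed)
    freqs = [rng.gauss(0.0, bandwidth) for _ in range(num_freqs)]
    phis = [fourier_features(x, freqs) for x in xs]
    n = len(xs)
    total = sum(sum(p * q for p, q in zip(phis[i], phis[j])) ** 2
                for i in range(n) for j in range(n) if i != j)
    return total / (n * (n - 1))

xs = [i / 15 for i in range(16)]          # 16 coordinates on [0, 1]
low = offdiag_mass(xs, bandwidth=1.0)     # narrow bandwidth: samples look alike
high = offdiag_mass(xs, bandwidth=20.0)   # wide bandwidth: similarities decay quickly
```

With this normalization every self-similarity equals 1, so any change in the Gram matrix between the two settings comes from the off-diagonal entries alone.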

Spherical Normalization (SP): Introducing […] enforces […] ([…] vs […] for standard ReLU MLP), reducing the energy factor approximately by […] (Corollaries 3.4, C.3). The Top-K variant further contracts energy-weighted hidden similarity almost quadratically in […] (Theorem D.7).
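
A minimal sketch of the normalization step, assuming it rescales each sample's post-ReLU hidden vector to unit Euclidean norm. The exact operator is not recoverable from the text above, so this form and the name `spherical_normalize` are assumptions.

```python
import math
import random

def relu(v):
    return [max(0.0, z) for z in v]

def spherical_normalize(h, eps=1e-12):
    """Project a hidden vector onto the unit sphere (eps guards the all-zero case)."""
    norm = math.sqrt(sum(z * z for z in h))
    return [z / (norm + eps) for z in h]

rng = random.Random(0)
m = 512
pre_activation = [rng.gauss(0.0, 1.0) for _ in range(m)]
hidden = relu(pre_activation)
energy_before = sum(z * z for z in hidden)                       # grows with the width m
energy_after = sum(z * z for z in spherical_normalize(hidden))   # pinned near 1
```

Under this assumed form, the hidden energy no longer scales with the width, which is the kind of contraction in the energy factor that the paragraph above describes.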

Hadamard Modulation: With coefficient […], […], and […] modulation, the factors […] off-diagonal. Any nontrivial modulation with […] for […] further multiplies down […], delivering additional variance reduction (Corollaries 3.5, C.4).

| Mechanism | Dominant Factor(s) Reduced | Variance Impact |
|---|---|---|
| Positional Encoding | […] | Mass approaches […], reducing […] |
| Spherical Normalization | […], […] | Energy factor contracts by […] vs baseline |
| Hadamard Modulation | […], […] | Multiplies down variance by […] for […] |

3. Unified Interpretation and Spectral Bias Mitigation

Each architectural mechanism shrinks one or more of the similarity and scaling factors, contracting the overall NTK variance multiplicatively:

[…]

When this variance is smaller, the NTK spectrum is flatter, leading to reduced spectral bias and more uniform convergence across frequency modes. Improved NTK conditioning thus facilitates faster, more stable recovery of high-frequency signal components and higher-fidelity interpolation (Ou et al., 17 Dec 2025). This decomposition renders diverse INR architectures commensurable through their impact on NTK eigenvalue dispersion.

4. NTK-Aware INR Interpolation: Algorithmic Guidelines

Network Architecture:

  • Input: Random Fourier features […] with adjustable bandwidth […].
  • Hidden layers: linear → ReLU → spherical-norm (or Top-K norm) → Hadamard modulation (elementwise product with […]).
  • Output: Linear readout […] with […] fixed at initialization.
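
The layer ordering above can be sketched as a single forward pass. Everything beyond the ordering itself is an assumption for illustration: the Gaussian frequencies, the unit-norm form of the spherical normalization, the fixed modulation vector `p`, the scalar readout taken from the formula for $f(x; W)$, and all function and parameter names.

```python
import math
import random

def forward(x, freqs, W, p, a=1.0):
    """Fourier features -> linear -> ReLU -> spherical norm -> Hadamard -> readout."""
    scale = 1.0 / math.sqrt(len(freqs))
    phi = ([scale * math.cos(b * x) for b in freqs]
           + [scale * math.sin(b * x) for b in freqs])
    pre = [sum(w_k * f_k for w_k, f_k in zip(w, phi)) for w in W]   # linear
    h = [max(0.0, z) for z in pre]                                   # ReLU
    norm = math.sqrt(sum(z * z for z in h)) + 1e-12
    s = [z / norm for z in h]                                        # spherical normalization
    t = [s_r * p_r for s_r, p_r in zip(s, p)]                        # Hadamard modulation
    return a / math.sqrt(len(W)) * sum(t)                            # readout scaled by a / sqrt(m)

rng = random.Random(0)
num_freqs, m = 8, 32
freqs = [rng.gauss(0.0, 4.0) for _ in range(num_freqs)]
W = [[rng.gauss(0.0, 1.0) for _ in range(2 * num_freqs)] for _ in range(m)]
p = [rng.uniform(0.5, 1.0) for _ in range(m)]   # hypothetical fixed modulation vector
y = forward(0.25, freqs, W, p)
```

Keeping each stage as a separate line makes it straightforward to switch one lever off (for example, skipping the normalization) and compare the resulting NTK spectra.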

Initialization:

  • Weights […], […], […] bounded (e.g., random […]).
  • Small […] and […] to ensure NTK regime ([…]).
  • Width $m$ polynomial in $n$ for kernel stability: […].

Training:

  • Learning rate […] for stability; linear convergence at […].
  • Gradient descent or small-batch SGD; early stopping leverages uniform mode decay.
  • […] weight decay or spectral-norm regularization is optional for extra stability.
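
To see why uniform mode decay matters for training, recall that for squared loss in the linearized (kernel) regime, gradient descent shrinks the residual along the $k$-th NTK eigenvector by a factor $(1 - \eta \lambda_k)$ per step. The sketch below counts the steps each mode needs to reach a tolerance. The step size $\eta = 1/\lambda_{\max}$ and the two example spectra are assumptions chosen for illustration, not values from the source.

```python
import math

def steps_to_tolerance(eigenvalues, tol=1e-3):
    """Steps until each mode's residual factor (1 - eta * lam)^t drops below tol."""
    eta = 1.0 / max(eigenvalues)   # assumed step size
    steps = []
    for lam in eigenvalues:
        factor = 1.0 - eta * lam
        # A factor of zero means the mode is fitted in a single step.
        steps.append(1 if factor <= 0.0
                     else math.ceil(math.log(tol) / math.log(factor)))
    return steps

spread = steps_to_tolerance([1.0, 0.1, 0.001])   # high-variance spectrum
flat = steps_to_tolerance([1.0, 0.9, 0.8])       # low-variance spectrum
```

In the spread spectrum the smallest eigenvalue dictates the training time by several orders of magnitude, whereas in the flat spectrum all modes finish within a handful of steps, so an early-stopping point treats them comparably.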

5. Monitoring NTK Eigenvalue Variance in Practice

The NTK eigenvalue variance can be estimated at initialization and throughout training. For computational efficiency, the NTK is typically computed over […] coordinate subsamples. The following pseudocode, as presented in (Ou et al., 17 Dec 2025), details this procedure:

[…]

Tracking this variance across training epochs for various architectures (base MLP, +PE, +Norm, +Hada) empirically validates that each augmentation sequentially lowers NTK variance, flattens the spectrum, and produces faster, more stable, and higher-quality interpolation of continuous functions (Ou et al., 17 Dec 2025).
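
As an illustration of this monitoring step (not the authors' pseudocode), the sketch below estimates the eigenvalue variance on a random coordinate subsample, using the trace and Frobenius-norm identities for a symmetric Gram matrix. The callback `gram_fn`, the subsample size `k`, and the toy Gaussian kernel standing in for the network's NTK are all assumptions.

```python
import math
import random

def subsampled_eig_variance(coords, gram_fn, k=32, seed=0):
    """Eigenvalue variance of the Gram matrix on k randomly chosen coordinates."""
    rng = random.Random(seed)
    sample = rng.sample(coords, min(k, len(coords)))
    H = gram_fn(sample)
    n = len(H)
    mean = sum(H[i][i] for i in range(n)) / n                   # tr(H) / n
    second_moment = sum(h * h for row in H for h in row) / n    # ||H||_F^2 / n
    return second_moment - mean * mean

def gaussian_gram(length_scale):
    """Toy stand-in kernel; a real run would compute the network's NTK here."""
    def gram(xs):
        return [[math.exp(-((u - v) / length_scale) ** 2) for v in xs] for u in xs]
    return gram

coords = [i / 255 for i in range(256)]
wide = subsampled_eig_variance(coords, gaussian_gram(0.5))      # strongly correlated samples
narrow = subsampled_eig_variance(coords, gaussian_gram(0.01))   # nearly diagonal Gram matrix
```

Fixing `seed` keeps the subsample identical across calls, so differences in the logged value reflect the kernel rather than the sampling; in a training loop the same function could be called once per epoch with the current network's Gram routine.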

6. Empirical Implications and Outlook

Experiments confirm the predicted variance reductions and indicate that each architectural intervention incrementally improves NTK conditioning and convergence properties. A plausible implication is that further architectural innovations or regularization strategies could be systematically evaluated through their effect on the four-factor variance decomposition. NTK-aware interpolation thus supplies both a diagnostic and a design framework for future advances in implicit neural representations (Ou et al., 17 Dec 2025).
