NTK-aware Interpolation in Neural Representations

Updated 14 April 2026

NTK-aware Interpolation is a principled approach that decomposes NTK eigenvalue variance into interpretable architectural and parametric factors to control spectral bias.
Architectural levers like positional encoding, spherical normalization, and Hadamard modulation systematically reduce similarity masses and variance components in INR models.
By flattening the NTK spectrum, this method mitigates spectral bias and accelerates convergence, enabling more effective high-frequency signal recovery.

NTK-aware interpolation refers to a principled approach for designing and training implicit neural representations (INRs) with explicit control over the conditioning and eigenvalue spectrum of their Neural Tangent Kernel (NTK). By decomposing the NTK eigenvalue variance into interpretable architectural and parametric factors, NTK-aware methods enable the suppression of spectral bias and acceleration of convergence, particularly for high-frequency signal recovery tasks. This methodology underpins a unified understanding of how positional encoding, spherical normalization, and Hadamard modulation each contribute to improved NTK conditioning in coordinate-based MLPs (Ou et al., 17 Dec 2025).

1. NTK Formalism and Eigenvalue-Variance Decomposition

For a two-layer coordinate MLP, the output is

$f(x; W) = \frac{a}{\sqrt{m}} \sum_{r=1}^m \sigma(w_r \cdot \phi(x)),$

with $\phi(x) \in \mathbb{R}^d$ denoting a fixed positional encoding, $\sigma$ a pointwise ReLU, and $a$ the readout scale. The NTK at initialization is the $n\times n$ Gram matrix

$H_{ij} = \frac{a^2}{m} \rho_{ij} \langle t_i, t_j \rangle,$

where $\rho_{ij} = \phi(x_i)^\top \phi(x_j)$ and $t_i = s_i \odot p_i$, with $s_i$ encoding ReLU gating and spherical normalization and $p_i$ capturing Hadamard modulation.
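The Gram matrix above can be sketched for the plain case. This is a minimal illustration, assuming no normalization and no modulation, so that $s_i$ is the 0/1 ReLU gate pattern of sample $i$ and $p_i$ is all ones; the function name `ntk_gram` and the toy inputs are hypothetical and not taken from the source.

```python
import math
import random

def ntk_gram(features, weights, a=1.0):
    """First-layer NTK Gram matrix of f(x) = a/sqrt(m) * sum_r relu(w_r . phi(x)).

    Assumes the plain case: s_i is the 0/1 ReLU gate pattern and p_i is all
    ones, so <t_i, t_j> counts the hidden units active on both samples.
    """
    m, n = len(weights), len(features)
    # gates[i][r] = 1.0 if hidden unit r is active on sample i
    gates = [[1.0 if sum(w_k * f_k for w_k, f_k in zip(w, phi)) > 0 else 0.0
              for w in weights] for phi in features]
    H = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            rho = sum(u * v for u, v in zip(features[i], features[j]))
            overlap = sum(gi * gj for gi, gj in zip(gates[i], gates[j]))
            H[i][j] = (a * a / m) * rho * overlap
    return H

random.seed(0)
d, m = 2, 256
weights = [[random.gauss(0.0, 1.0) for _ in range(d)] for _ in range(m)]
# Unit-norm inputs on the circle (hypothetical toy data).
features = [[math.cos(t), math.sin(t)] for t in (0.0, 0.5, 1.0, 2.0)]
H = ntk_gram(features, weights)
```

Each entry is the product of an input similarity and a gate overlap, which is the structure the later variance factors build on.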

Defining $\lambda_1, \dots, \lambda_n$ as $H$'s eigenvalues, the mean and variance are

$\bar{\lambda} = \frac{1}{n} \sum_{k=1}^n \lambda_k, \qquad \operatorname{Var}(\lambda) = \frac{1}{n} \sum_{k=1}^n (\lambda_k - \bar{\lambda})^2.$
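For a symmetric matrix, the eigenvalue mean and variance follow from trace identities alone: the eigenvalues sum to $\operatorname{tr}(H)$ and their squares sum to $\|H\|_F^2$. The sketch below uses this to avoid an eigendecomposition. The $1/n$ (population) normalization and the function name are assumptions made for illustration.

```python
def eigenvalue_mean_and_variance(H):
    """Mean and variance of the eigenvalues of a symmetric matrix H.

    Uses sum(lambda) = tr(H) and sum(lambda^2) = ||H||_F^2, so no
    eigendecomposition is needed.
    """
    n = len(H)
    mean = sum(H[i][i] for i in range(n)) / n
    second_moment = sum(v * v for row in H for v in row) / n
    return mean, second_moment - mean * mean

# The eigenvalues of [[2, 1], [1, 2]] are 1 and 3: mean 2, variance 1.
mean, var = eigenvalue_mean_and_variance([[2.0, 1.0], [1.0, 2.0]])
print(mean, var)
```

Because only the trace and the Frobenius norm are needed, the cost is quadratic in the number of samples rather than cubic.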

Under mild regularity (bounded input norm, hidden energy, modulation scale, and self-similarities ≈ 1), the variance admits a proxy built from scaling terms and four pairwise similarity factors:

input similarity, determined by $\rho_{ij}$,
hidden gating/normalization similarity, determined by $s_i$,
modulation similarity, determined by $p_i$,
a coupling term.

This decomposition supports design-time and runtime diagnosis of NTK spectrum flatness and spectral bias in INR architectures (Ou et al., 17 Dec 2025).

2. Architectural Levers: Impact on NTK Similarity Masses

Positional Encoding (PE): Fourier-feature encodings $\phi(x)$ reshape the input similarity $\rho_{ij}$. Lemma 3.1 characterizes the resulting input-similarity mass and shows that, in the limit it considers, this mass approaches its baseline value. By monotonicity, shrinking the off-diagonal input similarities directly lowers the variance proxy (Corollary C.2).
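The effect of an encoding on input similarity can be illustrated numerically. The sketch below compares a raw unit-norm coordinate encoding with a Fourier encoding on a 1-D grid. The mass used here, the sum of squared off-diagonal $\rho_{ij}$, is a hypothetical stand-in for the quantity in Lemma 3.1, and the integer frequencies are an illustrative choice rather than the source's.

```python
import math

def unit(v):
    norm = math.sqrt(sum(c * c for c in v))
    return [c / norm for c in v]

def raw_encoding(x):
    # Baseline: the coordinate plus a bias term, scaled to unit norm.
    return unit([x, 1.0])

def fourier_encoding(x, num_freqs=8):
    # Deterministic integer frequencies 1..num_freqs (an illustrative choice).
    feats = []
    for k in range(1, num_freqs + 1):
        feats += [math.cos(2 * math.pi * k * x), math.sin(2 * math.pi * k * x)]
    return unit(feats)

def offdiag_similarity_mass(features):
    # Sum of squared off-diagonal similarities rho_ij^2 (a hypothetical proxy).
    n = len(features)
    return sum(
        sum(u * v for u, v in zip(features[i], features[j])) ** 2
        for i in range(n) for j in range(n) if i != j
    )

xs = [i / 16 for i in range(16)]
raw_mass = offdiag_similarity_mass([raw_encoding(x) for x in xs])
pe_mass = offdiag_similarity_mass([fourier_encoding(x) for x in xs])
print(raw_mass, pe_mass)
```

On this grid the raw encoding keeps every pair of points nearly parallel, while the Fourier encoding drives most off-diagonal similarities toward zero, so its mass is far smaller.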

Spherical Normalization (SP): Introducing spherical normalization of the hidden features constrains the hidden energy to a fixed value, smaller than that of a standard ReLU MLP, and thereby reduces the energy factor (Corollaries 3.4, C.3). The Top-K variant further contracts the energy-weighted hidden similarity almost quadratically (Theorem D.7).
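The energy contraction can be seen in a small sketch. It assumes spherical normalization means dividing the hidden activation vector by its Euclidean norm; the source's exact operator may differ, and the helper names are hypothetical.

```python
import math
import random

def relu_hidden(phi, weights):
    return [max(0.0, sum(w_k * f_k for w_k, f_k in zip(w, phi))) for w in weights]

def spherical_normalize(h, eps=1e-12):
    # Project the hidden vector onto the unit sphere (assumed form of SP).
    norm = math.sqrt(sum(v * v for v in h))
    return [v / (norm + eps) for v in h]

random.seed(0)
m = 128
weights = [[random.gauss(0.0, 1.0) for _ in range(2)] for _ in range(m)]
h = relu_hidden([0.6, 0.8], weights)
energy_plain = sum(v * v for v in h)                    # grows with the width m
energy_sp = sum(v * v for v in spherical_normalize(h))  # pinned near 1
print(energy_plain, energy_sp)
```

Without normalization the hidden energy scales with the width, whereas the normalized features have unit energy regardless of width.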

Hadamard Modulation: Elementwise modulation of the hidden features introduces modulation-similarity factors that fall below one off the diagonal. Any nontrivial modulation therefore multiplies the variance down further, delivering additional variance reduction (Corollaries 3.5, C.4).
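The multiplicative effect can be checked on a toy kernel. The sketch below assumes a simplified modulation in which every off-diagonal entry is scaled by the same factor `q` while the diagonal is left at 1; this uniform factor is a hypothetical simplification, not the source's construction. For a matrix with constant diagonal, the eigenvalue variance equals the mean squared off-diagonal mass, so it shrinks by exactly `q**2`.

```python
def eig_variance(H):
    # Eigenvalue variance of a symmetric matrix via trace identities.
    n = len(H)
    mean = sum(H[i][i] for i in range(n)) / n
    return sum(v * v for row in H for v in row) / n - mean * mean

def modulate(H, q):
    # Hadamard product with a matrix that is 1 on the diagonal and q elsewhere.
    n = len(H)
    return [[H[i][j] * (1.0 if i == j else q) for j in range(n)] for i in range(n)]

H = [[1.0, 0.8, 0.6],
     [0.8, 1.0, 0.7],
     [0.6, 0.7, 1.0]]
q = 0.5
v_plain = eig_variance(H)
v_mod = eig_variance(modulate(H, q))
print(v_plain, v_mod, q * q * v_plain)
```

The last two printed values agree, which is the sense in which a similarity factor below one acts multiplicatively on the variance.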

Mechanism	Dominant Factor(s) Reduced	Variance Impact
Positional Encoding	Input similarity	Input-similarity mass approaches its baseline, reducing the variance
Spherical Normalization	Hidden energy, hidden similarity	Energy factor contracts relative to the standard ReLU baseline
Hadamard Modulation	Modulation-related similarity factors	Nontrivial modulation multiplies the variance down

3. Unified Interpretation and Spectral Bias Mitigation

Each architectural mechanism shrinks one or more of the similarity and scaling factors, contracting the overall NTK variance multiplicatively.

When the eigenvalue variance is smaller, the NTK spectrum is flatter, leading to reduced spectral bias and more uniform convergence across frequency modes. Improved NTK conditioning thus facilitates faster, more stable recovery of high-frequency signal components and higher-fidelity interpolation (Ou et al., 17 Dec 2025). This decomposition renders diverse INR architectures commensurable through their impact on NTK eigenvalue dispersion.

4. NTK-Aware INR Interpolation: Algorithmic Guidelines

Network Architecture:

Input: Random Fourier features $\phi(x)$ with adjustable bandwidth.
Hidden layers: linear $\to$ ReLU $\to$ spherical-norm (or Top-K norm) $\to$ Hadamard modulation (elementwise product with a modulation vector).
Output: Linear readout with scale $a$ fixed at initialization.
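The layer ordering listed above can be sketched as a single forward pass. This is an illustration under stated assumptions: the modulation vector is produced by a second linear branch `V` acting on the encoding, the readout is a fixed sum scaled by $a/\sqrt{m}$, and the frequencies are fixed rather than sampled. None of these choices are confirmed by the source.

```python
import math
import random

def matvec(M, v):
    return [sum(a * b for a, b in zip(row, v)) for row in M]

def fourier_features(x, freqs):
    feats = []
    for b in freqs:
        feats += [math.cos(2 * math.pi * b * x), math.sin(2 * math.pi * b * x)]
    scale = 1.0 / math.sqrt(len(freqs))
    return [scale * f for f in feats]  # unit-norm encoding

def forward(x, params):
    phi = fourier_features(x, params["freqs"])
    h = [max(0.0, z) for z in matvec(params["W"], phi)]  # linear -> ReLU
    norm = math.sqrt(sum(v * v for v in h)) + 1e-12
    h = [v / norm for v in h]                            # spherical norm
    p = matvec(params["V"], phi)                         # hypothetical modulation branch
    h = [hv * pv for hv, pv in zip(h, p)]                # Hadamard modulation
    a, m = params["a"], len(h)
    return a / math.sqrt(m) * sum(h)                     # fixed linear readout

random.seed(0)
m, freqs = 64, [1.0, 2.0, 4.0, 8.0]
d = 2 * len(freqs)
params = {
    "freqs": freqs,
    "W": [[random.gauss(0, 1) for _ in range(d)] for _ in range(m)],
    "V": [[random.gauss(0, 1) for _ in range(d)] for _ in range(m)],
    "a": 1.0,
}
y = forward(0.3, params)
```

Widening the range of `freqs` plays the role of the adjustable bandwidth; swapping the normalization line for a Top-K variant would change only that one step.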

Initialization:

Weights initialized at random, with bounded modulation parameters.
Small scale parameters to ensure the NTK regime.
Width $m$ polynomial in $n$ for kernel stability.

Training:

Learning rate kept small enough for stability; convergence is then linear.
Gradient descent or small-batch SGD; early stopping leverages uniform mode decay.
Weight decay or spectral-norm regularization is optional for extra stability.
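The stability and linear-convergence statements can be illustrated with the standard kernel-regime linearization, in which the training residual evolves as $r_{k+1} = (I - \eta H) r_k$. In that model gradient descent is stable when $\eta < 2/\lambda_{\max}(H)$, and each eigen-mode decays geometrically at its own rate. The sketch below is that textbook model on a toy kernel, not the source's specific bounds or constants.

```python
import math

def matvec(M, v):
    return [sum(a * b for a, b in zip(row, v)) for row in M]

def kernel_gd_residuals(H, r0, lr, steps):
    # Linearized dynamics: r_{k+1} = (I - lr * H) r_k.
    r, norms = list(r0), []
    for _ in range(steps):
        norms.append(math.sqrt(sum(v * v for v in r)))
        Hr = matvec(H, r)
        r = [ri - lr * hi for ri, hi in zip(r, Hr)]
    return norms

H = [[2.0, 1.0], [1.0, 2.0]]  # eigenvalues 1 and 3
norms = kernel_gd_residuals(H, r0=[1.0, 0.0], lr=0.5, steps=20)
print(norms[:4])
```

Here the step size 0.5 lies below 2/3, so the residual norm contracts at every step. A flatter spectrum narrows the gap between the fastest and slowest modes, which is why early stopping interacts well with uniform mode decay.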

5. Monitoring NTK Eigenvalue Variance in Practice

The NTK eigenvalue variance can be estimated at initialization and throughout training. For computational efficiency, the NTK is typically computed over a subsample of coordinates. Ou et al. (17 Dec 2025) give pseudocode for this procedure.

Tracking the NTK eigenvalue variance across training epochs for various architectures (base MLP, +PE, +Norm, +Hada) empirically validates that each augmentation sequentially lowers NTK variance, flattens the spectrum, and produces faster, more stable, and higher-quality interpolation of continuous functions (Ou et al., 17 Dec 2025).
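A monitoring routine of this kind might look like the following sketch. It is not the source's pseudocode: it assumes the plain two-layer ReLU model with first-layer NTK only, uses the trace identities for the variance, and introduces the hypothetical names `ntk_variance` and `monitor`.

```python
import math
import random

def ntk_variance(features, weights, a=1.0):
    """Eigenvalue variance of the first-layer NTK of a two-layer ReLU MLP."""
    m, n = len(weights), len(features)
    gates = [[1.0 if sum(w_k * f_k for w_k, f_k in zip(w, phi)) > 0 else 0.0
              for w in weights] for phi in features]
    trace, frob_sq = 0.0, 0.0
    for i in range(n):
        for j in range(n):
            rho = sum(u * v for u, v in zip(features[i], features[j]))
            overlap = sum(gi * gj for gi, gj in zip(gates[i], gates[j]))
            h_ij = (a * a / m) * rho * overlap
            frob_sq += h_ij * h_ij
            if i == j:
                trace += h_ij
    mean = trace / n
    return frob_sq / n - mean * mean

def monitor(all_features, weights, num_samples, rng):
    # Estimate the variance on a random coordinate subsample to bound the cost.
    subset = rng.sample(all_features, num_samples)
    return ntk_variance(subset, weights)

rng = random.Random(0)
weights = [[rng.gauss(0.0, 1.0) for _ in range(2)] for _ in range(128)]
coords = [[math.cos(t / 10), math.sin(t / 10)] for t in range(60)]
v = monitor(coords, weights, num_samples=16, rng=rng)
print(v)
```

Calling `monitor` once per epoch with the current weights gives a variance trace that can be compared across architectures; the subsample size trades estimation noise against the quadratic cost of forming the kernel.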

6. Empirical Implications and Outlook

Experiments confirm the predicted variance reductions and indicate that each architectural intervention incrementally improves NTK conditioning and convergence properties. A plausible implication is that further architectural innovations or regularization strategies could be systematically evaluated through their effect on the four-factor variance decomposition. NTK-aware interpolation thus supplies both a diagnostic and a design framework for future advances in implicit neural representations (Ou et al., 17 Dec 2025).

References

Ou et al. Understanding NTK Variance in Implicit Neural Representations (17 Dec 2025)
