NTK-aware Interpolation in Neural Representations
- NTK-aware Interpolation is a principled approach that decomposes NTK eigenvalue variance into interpretable architectural and parametric factors to control spectral bias.
- Architectural levers like positional encoding, spherical normalization, and Hadamard modulation systematically reduce similarity masses and variance components in INR models.
- By flattening the NTK spectrum, this method mitigates spectral bias and accelerates convergence, enabling more effective high-frequency signal recovery.
NTK-aware interpolation refers to a principled approach for designing and training implicit neural representations (INRs) with explicit control over the conditioning and eigenvalue spectrum of their Neural Tangent Kernel (NTK). By decomposing the NTK eigenvalue variance into interpretable architectural and parametric factors, NTK-aware methods enable the suppression of spectral bias and acceleration of convergence, particularly for high-frequency signal recovery tasks. This methodology underpins a unified understanding of how positional encoding, spherical normalization, and Hadamard modulation each contribute to improved NTK conditioning in coordinate-based MLPs (Ou et al., 17 Dec 2025).
1. NTK Formalism and Eigenvalue-Variance Decomposition
For a two-layer coordinate MLP, the output is
with denoting a fixed positional encoding, a pointwise ReLU, and the readout scale. The NTK at initialization is the Gram matrix
where and , with encoding ReLU gating and spherical normalization and capturing Hadamard modulation.
Defining 0 as 1’s eigenvalues, the mean and variance are
2
Under mild regularity (bounded input norm 3, hidden energy 4, modulation scale 5, self-similarities ≈ 1), the variance admits the proxy
6
where the similarity factors are:
- 7 (input similarity),
- 8 (hidden gating/normalization),
- 9 (modulation similarity),
- 0 (coupling).
This decomposition supports design-time and runtime diagnosis of NTK spectrum flatness and spectral bias in INR architectures (Ou et al., 17 Dec 2025).
2. Architectural Levers: Impact on NTK Similarity Masses
Positional Encoding (PE): Utilizing Fourier-feature encodings 1, with 2, reshapes input similarity 3. Lemma 3.1 shows
4
and as 5, this mass approaches 6 (baseline). By monotonicity, shrinking off-diagonal 7 directly lowers 8 (Corollary C.2).
Spherical Normalization (SP): Introducing 9 enforces 0 (1 vs 2 for standard ReLU MLP), reducing the energy factor approximately by 3 (Corollary 3.4, C.3). The Top-K variant further contracts energy-weighted hidden similarity almost quadratically in 4 (Theorem D.7).
Hadamard Modulation: With coefficient 5, 6, and 7 modulation, the factors 8 off-diagonal. Any nontrivial modulation with 9 for 0 further multiplies down 1, delivering additional variance reduction (Corollaries 3.5, C.4).
| Mechanism | Dominant Factor(s) Reduced | Variance Impact |
|---|---|---|
| Positional Encoding | 2 | Mass approaches 3, reducing 4 |
| Spherical Normalization | 5, 6 | Energy factor contracts by 7 vs baseline |
| Hadamard Modulation | 8, 9 | Multiplies down variance by 0 for 1 |
3. Unified Interpretation and Spectral Bias Mitigation
Each architectural mechanism shrinks one or more of the similarity and scaling factors, contracting the overall NTK variance multiplicatively:
2
When 3 is smaller, the NTK spectrum is flatter, leading to reduced spectral bias and more uniform convergence across frequency modes. Improved NTK conditioning thus facilitates faster, more stable recovery of high-frequency signal components and higher-fidelity interpolation (Ou et al., 17 Dec 2025). This decomposition renders diverse INR architectures commensurable through their impact on NTK eigenvalue dispersion.
4. NTK-Aware INR Interpolation: Algorithmic Guidelines
Network Architecture:
- Input: Random Fourier features 4 with adjustable bandwidth 5.
- Hidden layers: linear 6 ReLU 7 spherical-norm (or Top-K norm) 8 Hadamard modulation (elementwise product with 9).
- Output: Linear readout 0 with 1 fixed at initialization.
Initialization:
- Weights 2, 3, 4 bounded (e.g., random 5).
- Small 6 and 7 to ensure NTK regime (8).
- Width 9 polynomial in 0 for kernel stability: 1.
Training:
- Learning rate 2 for stability; linear convergence at 3.
- Gradient descent or small-batch SGD; early stopping leverages uniform mode decay.
- 4 weight decay or spectral-norm regularization is optional for extra stability.
5. Monitoring NTK Eigenvalue Variance in Practice
The NTK variance 5 can be estimated at initialization and throughout training. For computational efficiency, the NTK is typically computed over 6 coordinate subsamples. The following pseudocode, as presented in (Ou et al., 17 Dec 2025), details this procedure:
8
Tracking 7 across training epochs for various architectures (base MLP, +PE, +Norm, +Hada) empirically validates that each augmentation sequentially lowers NTK variance, flattens the spectrum, and produces faster, more stable, and higher-quality interpolation of continuous functions (Ou et al., 17 Dec 2025).
6. Empirical Implications and Outlook
Experiments confirm the predicted variance reductions and indicate that each architectural intervention incrementally improves NTK conditioning and convergence properties. A plausible implication is that further architectural innovations or regularization strategies could be systematically evaluated through their effect on the four-factor variance decomposition. NTK-aware interpolation thus supplies both a diagnostic and a design framework for future advances in implicit neural representations (Ou et al., 17 Dec 2025).