Interpolation Thresholds Across Domains

Updated 13 April 2026

The interpolation threshold is the critical value at which a model transitions from exact data interpolation to a regime of stable generalization or computational efficiency.
In sparse polynomial interpolation, the threshold is typically twice the sparsity level; deterministic sampling in high-dimensional Chebyshev bases instead requires a sample count that scales quadratically in the sparsity.
Adaptive strategies in neural networks, wavelet decoding, and topological data analysis use interpolation thresholds to balance model simplicity, performance, and computational cost.

An interpolation threshold is a critical value—typically in terms of sample size, regularization, or sparsity—beyond which a function, network, or algorithm transitions from exact interpolation of data (perfect fit) to another regime such as stable generalization, computational efficiency, or topological stationarity. Precise forms and consequences of the interpolation threshold depend on the context: sparse polynomial interpolation, neural networks and generalization, signal processing via wavelet thresholds, or topological data analysis. This article reviews foundational results across these settings, highlighting quantitative thresholds, their structural origins, and empirical behavior.

1. Sparse Polynomial Interpolation: Theoretical Thresholds and Scaling Laws

In sparse function approximation, one is given a basis $\{B_j\}_{j\in\Lambda}$ of cardinality $N$ over a domain $\Omega\subset\mathbb{R}^d$, together with $s$-sparse functions $f(x) = \sum_{j\in T}c_j B_j(x)$ where $|T|\le s$. The interpolation threshold is the minimal number $m$ of samples required for unisolvent recovery, that is, for the mapping $f \mapsto (f(x_1),\ldots,f(x_m))$ to be injective on the set $U^s$ of $s$-sparse functions.

The principal result establishes a sharp interpolation threshold at $m = 2s$ under broad conditions. If any $2s$ basis functions are strongly linearly independent on some subdomain, there exists a set of $2s$ points yielding unisolvency for $s$-sparse functions. When the basis has the Chebyshev-system property (every $2s$ functions form a Chebyshev system), any $2s$ distinct points suffice (Xu et al., 2013).
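The factor of two reflects that the difference of two $s$-sparse functions is at most $2s$-sparse, so sampling is injective on $U^s$ exactly when every $2s$ columns of the sampling matrix are linearly independent. The following is a minimal sketch of that check, assuming a monomial basis $B_j(x) = x^j$ sampled at distinct positive points (an illustrative choice, not a construction from Xu et al.). The helper names `exact_rank` and `is_unisolvent` are hypothetical.

```python
from fractions import Fraction
from itertools import combinations


def exact_rank(rows):
    """Rank of a matrix (list of rows) by Gaussian elimination over the rationals."""
    m = [[Fraction(v) for v in row] for row in rows]
    n_cols = len(m[0]) if m else 0
    rank = 0
    for c in range(n_cols):
        pivot = next((i for i in range(rank, len(m)) if m[i][c] != 0), None)
        if pivot is None:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        for i in range(rank + 1, len(m)):
            factor = m[i][c] / m[rank][c]
            m[i] = [a - factor * b for a, b in zip(m[i], m[rank])]
        rank += 1
    return rank


def is_unisolvent(points, n_basis, s):
    """True if sampling at `points` is injective on s-sparse sums of x^0..x^(n_basis-1).

    Assumption: monomial basis, chosen only for illustration.
    """
    samples = [[Fraction(x) ** j for j in range(n_basis)] for x in points]
    k = min(2 * s, n_basis)
    # Injectivity on s-sparse functions <=> every k columns are linearly independent.
    for cols in combinations(range(n_basis), k):
        sub = [[row[j] for j in cols] for row in samples]
        if exact_rank(sub) < k:
            return False
    return True


print(is_unisolvent([1, 2, 3, 4], n_basis=8, s=2))  # 2s = 4 points
print(is_unisolvent([1, 2, 3], n_basis=8, s=2))     # 2s - 1 = 3 points
```

With $s = 2$, four distinct positive points pass the check, while three points cannot: any four columns of a three-row matrix are linearly dependent. Exact rational arithmetic is used so that the rank test does not depend on a floating-point tolerance.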

However, in high-dimensional Chebyshev polynomial spaces (tensor-product or total-degree), a universal $2s$-point set is infeasible; the minimal deterministic sample size instead grows quadratically in $s$, with an additional dependence on the basis cardinality $N$, itself a function of the polynomial degree and ambient dimension. By constructing the sample points deterministically and selecting the construction parameters appropriately, Xu and Zhou achieve robust $\ell_1$-recovery guarantees (Xu et al., 2013). The threshold for unisolvent interpolation thus shows a quadratic dependence on sparsity, with exponential or polynomial scaling in dimension, depending on basis structure.

2. Neural Networks: Optimization/Interpolation Thresholds and Simplicity Bias

In overparametrized neural networks, the interpolation threshold marks a qualitative change in the landscape traversed by optimization algorithms. Specifically, for two-layer ReLU networks trained by gradient flow on $n$ samples in $d$ dimensions, there exists a critical sample size such that:

For $n$ below this critical size, gradient flow typically drives the training loss to zero, reaching a global minimum that exactly interpolates the training data.
For $n$ above it, the same dynamics converge only to a spurious (simplicity-biased) local minimum, which fails to interpolate (nonzero training loss) but yields asymptotically optimal population risk.

This "optimization threshold" is a nontrivial function of task complexity. Its origin is traced to the early-alignment phase: under small initialization, neuron weights cluster along extremal directions in parameter space, which reduces the effective capacity for perfect data fit without harming generalization. Beyond the threshold, solutions correspond to minimum-norm representations of the data and generalize optimally in population MSE, outperforming full interpolators in the presence of label noise (Boursier et al., 2024). This mechanism is formalized through a Polyak–Łojasiewicz inequality and spectral concentration arguments.
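The claim that a non-interpolating solution can beat an exact interpolator under label noise can be illustrated with a deliberately simplified stand-in. The sketch below is not a two-layer ReLU network and does not use gradient flow: it assumes a one-dimensional linear target, compares a least-squares line (the "simple" model) with a 1-nearest-neighbour predictor (an exact interpolator), and all function names and parameter values are hypothetical.

```python
import random


def fit_slope(xs, ys):
    """Least-squares slope of a line through the origin (the 'simple' model)."""
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)


def nearest_neighbour(xs, ys, x):
    """1-nearest-neighbour prediction; reproduces each training label exactly."""
    i = min(range(len(xs)), key=lambda k: abs(xs[k] - x))
    return ys[i]


def mse(preds, targets):
    return sum((p - t) ** 2 for p, t in zip(preds, targets)) / len(targets)


def run_experiment(n=50, noise=0.5, true_slope=2.0, seed=0):
    """Assumed toy setup: y = true_slope * x + Gaussian label noise."""
    rng = random.Random(seed)
    xs = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    ys = [true_slope * x + rng.gauss(0.0, noise) for x in xs]  # noisy labels
    test_xs = [rng.uniform(-1.0, 1.0) for _ in range(1000)]
    clean = [true_slope * x for x in test_xs]  # noise-free targets

    slope = fit_slope(xs, ys)
    return {
        "simple_train": mse([slope * x for x in xs], ys),
        "interp_train": mse([nearest_neighbour(xs, ys, x) for x in xs], ys),
        "simple_test": mse([slope * x for x in test_xs], clean),
        "interp_test": mse([nearest_neighbour(xs, ys, x) for x in test_xs], clean),
    }


results = run_experiment()
for name, value in results.items():
    print(f"{name}: {value:.4f}")
```

In this toy setting the interpolator has zero training error but carries the label noise into its predictions, whereas the fitted line leaves a nonzero training error and tracks the noise-free target more closely.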

3. Adaptive Thresholding in Wavelet-Based Frame Interpolation

In high-efficiency video frame interpolation, a dynamic interpolation threshold controls the computational budget by sparsifying wavelet-domain decoding. At each wavelet level, a coefficient-wise threshold is computed, with the scalar threshold ratio learned by a classifier embedded in the network.

Valid masks derived from this threshold determine where high-frequency wavelet coefficients are worth reconstructing. The threshold classifier selects the threshold ratio for each sample using a differentiable Gumbel–Softmax sampling mechanism, jointly optimized with reconstruction and FLOP-regularization losses. This adaptively set threshold achieves up to $40\%$ reduction in computation with negligible loss in image quality metrics (PSNR, SSIM) (Kong et al., 2023). Larger thresholds result in sparser computation masks (lower FLOPs), while smaller thresholds yield denser reconstruction and higher fidelity.
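The trade-off between threshold size and mask density can be sketched in a few lines. The rule used here (threshold equals the ratio times the largest coefficient magnitude at that level) is an assumption made for illustration and may differ from the rule in Kong et al.; the names `valid_mask` and `kept_fraction` and the coefficient values are hypothetical.

```python
def valid_mask(coeffs, ratio):
    """Mark coefficients whose magnitude exceeds ratio * (largest magnitude)."""
    peak = max(abs(c) for c in coeffs)
    threshold = ratio * peak  # assumed rule, not taken from the paper
    return [abs(c) > threshold for c in coeffs]


def kept_fraction(mask):
    """Share of coefficients that would still be decoded."""
    return sum(mask) / len(mask)


high_freq = [0.9, -0.05, 0.3, 0.01, -0.6, 0.02]  # made-up coefficients
for ratio in (0.0, 0.1, 0.5):
    print(ratio, kept_fraction(valid_mask(high_freq, ratio)))
```

Raising the ratio from 0.0 to 0.5 shrinks the share of decoded coefficients from all six to two of six, which is the mechanism by which a larger threshold lowers the compute budget.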

4. Topological Interpolation Thresholds via Persistent Homology

In topological data analysis, interpolation thresholds can be governed by convergence properties of persistence diagrams under iterative point insertion schemes (e.g., Voronoi or Sibson interpolation). After each iteration, persistent homology is computed for the evolving Delaunay complex; the topological fidelity of the interpolation is monitored by measuring bottleneck ($W_\infty$) and $p$-Wasserstein ($W_p$) distances between consecutive persistence diagrams.

A vector-norm threshold $\varepsilon$ (expressed in filtration-scale units) is set, and when the distance between consecutive persistence diagrams falls below $\varepsilon$, further interpolation is halted. This rule is statistically grounded via trimmed Wasserstein tests, ensuring robustness to outlier persistence points. A small enough $\varepsilon$ bounds the maximum lifetime of spuriously introduced homology classes. Empirically, this topological stopping rule typically results in rapid stabilization of topological features after a few iterations, preventing overfitting of spurious noise (Melodia et al., 2019).
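The control flow of such a stopping rule can be sketched independently of any particular persistent-homology library. In the sketch below, `refine_until_stable` and its arguments are hypothetical names; the toy stand-ins (points on a line, summarized by their largest gap) are NOT persistent homology and only exercise the loop. In practice `summarize` would return a persistence diagram and `distance` a bottleneck or Wasserstein distance.

```python
def refine_until_stable(points, insert_step, summarize, distance, eps, max_iter=20):
    """Insert points until consecutive summaries differ by less than eps."""
    previous = summarize(points)
    for iteration in range(1, max_iter + 1):
        points = insert_step(points)
        current = summarize(points)
        if distance(previous, current) < eps:
            return points, iteration  # summaries have stabilized
        previous = current
    return points, max_iter  # budget exhausted without stabilizing


# Toy stand-ins, assumed only for this example.
def insert_midpoints(points):
    pts = sorted(points)
    mids = [(a + b) / 2 for a, b in zip(pts, pts[1:])]
    return sorted(pts + mids)


def largest_gap(points):
    pts = sorted(points)
    return max(b - a for a, b in zip(pts, pts[1:]))


refined, n_iter = refine_until_stable(
    [0.0, 1.0, 3.0], insert_midpoints, largest_gap,
    distance=lambda a, b: abs(a - b), eps=0.3,
)
print(n_iter, len(refined))
```

Here the largest gap halves at every step (2, 1, 0.5, 0.25), so the change between consecutive summaries first drops below 0.3 on the third iteration, and the loop stops there.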

5. Deterministic vs Randomized Thresholds and Empirical Behavior

A contrast emerges between deterministic point constructions (sparse interpolation) and random sampling (compressed sensing). Random sampling can recover sparse functions from fewer samples, whereas deterministic designs guarantee worst-case recovery at a sample count quadratic in $s$, typically with slightly higher sample complexity but greater algorithmic predictability and analytic structure (Xu et al., 2013). Empirical investigations confirm near-equality in recovery rates for practical sample sizes.

Similarly, in adaptive frame interpolation, dynamically learned thresholds enable instance-optimal computation allocation, as opposed to static or hand-tuned fixed thresholds. Across domains, the interpolation threshold serves as a design control: balancing statistical recovery, computational load, and preservation of salient structure.

6. Summary Table: Interpolation Thresholds Across Domains

Context	Core Threshold Scaling	Empirical Behavior
Sparse Poly. Interpolation	$m = 2s$ (ideal), quadratic in $s$ (high-dim)	Deterministic points match random sampling at practical sample sizes
Two-layer ReLU NN	Critical sample size (task-dependent)	Beyond the threshold, interpolation fails but generalization improves
Wavelet Frame Interpol.	Samplewise threshold ratio (dynamic)	18–40% compute reduction for fixed image quality
Topological Interp.	Distance between consecutive persistence diagrams below a set threshold	3–4 steps yield stable homological content

The notion of interpolation threshold thus unifies key trade-offs in sample efficiency, computational resources, and structural fidelity across the mathematical sciences. It governs the transition from exact interpolation—often accompanied by overfitting or inefficiency—to regimes of pragmatic stability and generalization.