Norm-Minimizing Interpolators
- Norm-minimizing interpolators are estimators that select, among all exact fits of the data, the one with the smallest norm, balancing bias and variance via implicit regularization.
- They reveal key phenomena such as double-descent and benign overfitting in overparameterized linear, kernel, and nonlinear models.
- Extensions to ℓ₁, RKHS, and Sobolev norms enable handling sparse, structured, and functional data, enhancing stability and performance.
A norm-minimizing interpolator is an estimator or algorithmic solution that, among all possible interpolants (functions or parameters that fit the data exactly), selects the one minimizing a given norm (e.g., ℓ₂, ℓ₁, or a more general functional). Norm-minimization is fundamental to the modern theory of interpolation in high-dimensional statistics and machine learning, as it provides both geometric and statistical regularization—often implicit, sometimes explicit—governing the generalization and robustness of interpolating solutions in both linear and nonlinear models.
1. Definition and Mathematical Foundation
Given data (x₁, y₁), …, (xₙ, yₙ), a function class 𝓕, and a norm ‖·‖ on 𝓕, a norm-minimizing interpolator is any solution to

f̂ ∈ argmin { ‖f‖ : f ∈ 𝓕, f(xᵢ) = yᵢ for all i }.

In parametric regression (e.g., high-dimensional linear or kernel regression), this becomes

θ̂ ∈ argmin { ‖θ‖ : Xθ = y }

for a norm ‖·‖ in parameter space.
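In the linear case the minimum-ℓ₂-norm solution has a closed form via the Moore–Penrose pseudoinverse. A minimal numpy sketch (dimensions and data here are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 20, 50                        # overparameterized: d > n
X = rng.standard_normal((n, d))
y = rng.standard_normal(n)

# Minimum l2-norm interpolator ("ridgeless" solution): theta = X^+ y.
theta = np.linalg.pinv(X) @ y
assert np.allclose(X @ theta, y)     # exact interpolation

# Any other interpolant theta + v, with v in the null space of X, is
# orthogonal to theta (which lies in the row space) and so has larger norm.
v = rng.standard_normal(d)
v -= np.linalg.pinv(X) @ (X @ v)     # project v onto null(X)
other = theta + v
assert np.allclose(X @ other, y)
assert np.linalg.norm(other) > np.linalg.norm(theta)
```

The orthogonality of row space and null space is exactly why the pseudoinverse solution is the unique norm minimizer among interpolants.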
Several instantiations are prominent:
- Minimum ℓ₂-norm interpolator ("ridgeless regression"): the unique minimum-norm solution interpolating linear equations (Wu et al., 2023, Han et al., 2023).
- Minimum ℓ₁-norm interpolator ("basis pursuit"): the minimum ℓ₁ solution interpolating given data, important in sparse regression and compressed sensing (Li et al., 2021).
- Minimum weighted-norm or RKHS-norm interpolators: interpolants minimizing the norm in a reproducing kernel Hilbert space (RKHS) or with general quadratic forms (Li, 2020).
This notion extends naturally to function spaces (e.g., Sobolev, Barron, or native spaces) for continuous domains, splines, and kernel-based methods (Estévez, 2012, Hayotov, 2014, Chandrasekaran et al., 2017, Chu et al., 11 Jul 2025).
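For the ℓ₁ instantiation, basis pursuit can be solved as a linear program by splitting θ into positive and negative parts. A hedged scipy sketch (the planted sparse signal and dimensions are illustrative):

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(1)
n, d = 20, 40
X = rng.standard_normal((n, d))
theta_star = np.zeros(d)
theta_star[:3] = [2.0, -1.5, 1.0]      # planted 3-sparse signal
y = X @ theta_star                      # noiseless measurements

# Basis pursuit  min ||theta||_1  s.t.  X theta = y,  as an LP:
# write theta = u - v with u, v >= 0 and minimize 1^T u + 1^T v.
c = np.ones(2 * d)
A_eq = np.hstack([X, -X])
res = linprog(c, A_eq=A_eq, b_eq=y, bounds=[(0, None)] * (2 * d))
theta_l1 = res.x[:d] - res.x[d:]

assert np.allclose(X @ theta_l1, y, atol=1e-6)   # interpolates exactly
# theta_star is feasible, so the minimizer's l1 norm cannot exceed its norm;
# in regimes like this, basis pursuit typically recovers the planted support.
assert np.abs(theta_l1).sum() <= np.abs(theta_star).sum() + 1e-6
```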
2. Statistical and Algorithmic Properties
Double-Descent and Generalization
Norm-minimizing interpolators display rich—and in high dimensions, sometimes counterintuitive—statistical behavior, particularly concerning the so-called "double-descent" phenomenon:
- For minimum-ℓ₂ interpolators in linear regression with Gaussian features, the prediction risk diverges at the interpolation threshold (d/n → 1), but then decreases for d/n > 1, yielding "benign overfitting" under certain covariance conditions (Wu et al., 2023, Han et al., 2023, Li, 2020, Chinot et al., 2020, Koehler et al., 2021, Stojnic, 2024).
- For minimum-ℓ₁ interpolators in sparse regression, risk exhibits multiple (triple or more) ascent/descent phases as overparameterization increases ("multiple descent"), reflecting the combinatorial-geometric structure of the cross-polytope constraint (Li et al., 2021).
Bagging (ensemble averaging of bootstrapped or Bernoulli-sketched interpolators) regularizes minimum-norm interpolators, eliminating the variance blow-up at the interpolation threshold and inducing the effect of explicit ℓ₂-regularization, with a closed-form mapping between the ensemble downsampling rate and the effective ridge penalty (Wu et al., 2023).
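The variance-stabilizing effect of bagging can be illustrated numerically. The sketch below (illustrative dimensions and subsample rate, not the paper's exact scheme) averages minimum-norm fits over random row subsamples at the interpolation threshold:

```python
import numpy as np

rng = np.random.default_rng(2)
n, d, sigma = 100, 100, 0.5            # at the interpolation threshold d = n
theta_star = rng.standard_normal(d) / np.sqrt(d)
X = rng.standard_normal((n, d))
y = X @ theta_star + sigma * rng.standard_normal(n)

def min_norm(X, y):
    """Minimum l2-norm interpolator via the pseudoinverse."""
    return np.linalg.pinv(X) @ y

theta_single = min_norm(X, y)          # variance blows up when d/n ~ 1

# Bagging: average min-norm interpolators fit on random row subsamples,
# which acts like an implicit ridge penalty and stabilizes the risk.
B, m = 50, 70
theta_bag = np.zeros(d)
for _ in range(B):
    idx = rng.choice(n, size=m, replace=False)
    theta_bag += min_norm(X[idx], y[idx])
theta_bag /= B

err = lambda t: float(np.sum((t - theta_star) ** 2))
# err(theta_bag) is typically far below err(theta_single) at the threshold.
```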
Bias-Variance Decomposition
For minimum-norm interpolators, the out-of-sample risk splits into bias (signal fidelity) and variance (noise amplification), each with explicit asymptotics in n, d, and the geometry of the feature covariance. For isotropic regression with d/n → γ > 1, the risk takes the closed form

R(γ) = ‖θ*‖² (1 − 1/γ) + σ² / (γ − 1),

with the variance term diverging at the interpolation threshold γ → 1 (Wu et al., 2023).
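These isotropic asymptotics admit a simple numerical sketch, using the standard ridgeless closed form (assumed here, with γ = d/n > 1, unit signal energy, and noise variance 0.25):

```python
def isotropic_risk(gamma, signal2=1.0, sigma2=0.25):
    """Asymptotic risk of the minimum-l2-norm interpolator for isotropic
    features in the overparameterized regime gamma = d/n > 1:
    bias^2 = signal2 * (1 - 1/gamma), variance = sigma2 / (gamma - 1)."""
    return signal2 * (1.0 - 1.0 / gamma) + sigma2 / (gamma - 1.0)

gammas = [1.01, 1.5, 2.0, 5.0, 50.0]
risks = [isotropic_risk(g) for g in gammas]

assert risks[0] > 10        # variance diverges as gamma -> 1+
assert risks[2] < risks[1]  # risk then descends past the threshold ...
assert risks[2] < risks[3]  # ... reaching a minimum at moderate gamma
```

The second descent bottoms out at an intermediate overparameterization level before the bias term dominates, which is the signature shape behind double descent.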
In overparameterized settings, norm-minimizing interpolators achieve nontrivial consistency as long as the effective norm growth is controlled; in particular, if the data covariance's trace and effective rank are favorable (i.e., fast-enough eigenvalue decay), the risk converges to the noise level even while interpolating noise (Koehler et al., 2021, Zhou et al., 2020, Chinot et al., 2020, Han et al., 2023, Wang et al., 2024).
Uniform Convergence and Implicit Regularization
Classical uniform convergence over large hypothesis classes often fails to explain the generalization of norm-minimizing interpolators due to the exponential size of the zero-training-error set. However, uniform convergence over norm-bounded interpolators that exactly fit the data suffices, with generalization gaps controlled by the Gaussian width of the corresponding norm ball relative to √n (Koehler et al., 2021).
Norm-minimizing interpolation induces an "implicit regularization": the solution chosen by the algorithm (e.g., gradient descent or kernel ridge) among all interpolants has the minimal possible norm, conferring a bias toward simple (low-complexity) functions, equivalent to explicit regularization in the zero-regularization limit (Wu et al., 2023, Han et al., 2023, Vaswani et al., 2020).
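This implicit bias can be checked directly: gradient descent on the least-squares loss, initialized at zero, keeps its iterates in the row space of X and converges to the pseudoinverse (minimum-ℓ₂-norm) solution. A small numpy sketch with illustrative dimensions:

```python
import numpy as np

rng = np.random.default_rng(3)
n, d = 10, 30
X = rng.standard_normal((n, d))
y = rng.standard_normal(n)

# Gradient descent on 0.5 * ||X theta - y||^2 from theta = 0. Each update
# X^T(...) lies in the row space of X, so the limit is the min-norm solution.
theta = np.zeros(d)
lr = 0.5 / np.linalg.norm(X, 2) ** 2   # step size below 1/L, L = ||X||^2
for _ in range(20000):
    theta -= lr * X.T @ (X @ theta - y)

theta_min_norm = np.linalg.pinv(X) @ y
assert np.allclose(X @ theta, y, atol=1e-6)           # interpolates
assert np.allclose(theta, theta_min_norm, atol=1e-6)  # implicit min-norm bias
```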
3. Extensions for General Norms and Domains
General Norms and Structured Problems
The minimum-norm interpolation framework generalizes to arbitrary norms:
- ℓ₁, group Lasso, nuclear norm: for structured estimation (sparse, group-sparse, low-rank), these interpolators achieve robust error rates, matching the adversarial-noise minimax lower bounds up to constants, provided sufficient overparameterization (Chinot et al., 2020).
- Weighted norms and kernels: interpolation in weighted polynomial or orthonormal bases converges (as basis size increases) to the unique RKHS interpolant, with the norm specifying the RKHS (Li, 2020).
Function Spaces and Spline/Basis Construction
For spatial or functional data:
- Sobolev or hybrid seminorms: Minimum-Sobolev-norm interpolation yields optimal (sometimes closed-form) splines that minimize high-order smoothness seminorms subject to interpolation or derivative constraints, including on manifolds or irregular domains (Chandrasekaran et al., 2017, Hayotov, 2014, Estévez, 2012).
- Surface reconstruction: Norm-minimizing RBF or mixed-dimensional kernel interpolators in shape processing are constructed as solutions to quadratic minimization subject to pointwise constraints, with natural or Sobolev-space norm choices governing surface and normal estimation (Chu et al., 11 Jul 2025).
4. Minimum-Norm Interpolation in Learning Algorithms
Kernel Regression and RKHS
In kernel methods, for points x₁, …, xₙ and a reproducing kernel k, the minimum RKHS-norm interpolator is

f̂(x) = Σᵢ₌₁ⁿ αᵢ k(x, xᵢ),

with coefficients α solving the linear system Kα = y, where K = [k(xᵢ, xⱼ)]ᵢⱼ is the kernel Gram matrix (Liang et al., 2021, Li, 2020, Estévez, 2012). The norm-minimizing solution enjoys exact characterizations of generalization error in terms of RKHS norm and sample size, generalizing classical margin bounds.
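A minimal sketch of this construction with a Gaussian kernel (the kernel choice, bandwidth, and data are illustrative):

```python
import numpy as np

x = np.linspace(-1.0, 1.0, 12)                 # interpolation nodes
y = np.sin(3.0 * x)                            # target values

def k(a, b, ell=0.15):
    """Gaussian (RBF) kernel matrix with entries k(a_i, b_j)."""
    return np.exp(-(a[:, None] - b[None, :]) ** 2 / (2.0 * ell ** 2))

K = k(x, x)                                    # Gram matrix [k(x_i, x_j)]
alpha = np.linalg.solve(K, y)                  # coefficients: K alpha = y

def f_hat(t):
    """Minimum-RKHS-norm interpolant f(t) = sum_i alpha_i k(t, x_i)."""
    return k(np.atleast_1d(t), x) @ alpha

assert np.allclose(f_hat(x), y, atol=1e-8)     # exact interpolation at nodes
```

In practice the Gram system is often ill-conditioned for dense nodes or large bandwidths, which is one motivation for the regularized (kernel ridge) variant.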
Neural Networks: Barron and Other Implicit Norms
For shallow ReLU networks, the minimal-norm interpolator in the Barron space (i.e., minimum Barron-norm) is selected by weight-decay-type regularization, even in the limit of infinite width with vanishing regularization. Gradient-based algorithms, even without explicit regularization, exhibit a strong implicit bias towards minimum Barron-norm interpolants, as empirically confirmed across dimensions and architectures (Park et al., 2023).
5. Theoretical Limits, Stability, and Practical Implications
Stability and Robustness
Norm-minimizing interpolators exhibit remarkable robustness to adversarial or stochastic noise in sufficiently overparameterized regimes, provided the data distribution and norm geometry are properly aligned. Prediction error on the order of the average noise level is attainable for ℓ₁, group-Lasso, nuclear, or ℓ₂ norms under appropriate design and overparameterization conditions (Chinot et al., 2020).
However, amplification of the norm along "ill-conditioned" directions (e.g., under fast eigenvalue decay, when the covariance has small or vanishing eigenvalues) can cause rapid norm blow-up for near-interpolators (estimators with near-zero training error), rendering classical data-independent generalization bounds vacuous (Wang et al., 2024).
Design Considerations
- Ensembling (Bagging): Aggregating bootstrapped/sketched interpolators stabilizes and regularizes the risk profile, equivalently producing a ridge-regularized effect with a calculable penalty (Wu et al., 2023).
- Model selection and cross-validation: Cross-validation over the ridge parameter (or the bagging rate) yields near-oracle risk minimization simultaneously for estimation, prediction, and inference tasks in high-dimensional models (Han et al., 2023).
- Optimization strategies: Each interpolating solution corresponds to the minimizer of a particular (typically data-adaptive) norm; careful projection or preconditioning can efficiently steer optimization towards better-generalizing norm-minimizing interpolators (Vaswani et al., 2020).
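The tuning point above can be sketched with a simple hold-out search over a ridge path that includes a near-ridgeless endpoint (the grid, split, and dimensions below are illustrative):

```python
import numpy as np

rng = np.random.default_rng(5)
n, d, sigma = 80, 120, 0.5
theta_star = rng.standard_normal(d) / np.sqrt(d)
X = rng.standard_normal((n, d))
y = X @ theta_star + sigma * rng.standard_normal(n)

def ridge(X, y, lam):
    """Ridge estimator; lam -> 0 approaches the min-norm interpolator."""
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

# Hold-out tuning over a ridge path including a near-ridgeless endpoint.
tr, va = np.arange(60), np.arange(60, 80)
lams = [1e-8, 1e-3, 1e-2, 1e-1, 1.0, 10.0]
val_err = [float(np.mean((X[va] @ ridge(X[tr], y[tr], lam) - y[va]) ** 2))
           for lam in lams]
lam_hat = lams[int(np.argmin(val_err))]
theta_hat = ridge(X, y, lam_hat)       # refit at the selected penalty
```

Whether the selected penalty ends up near zero (effectively interpolating) or strictly positive depends on the noise level and the covariance geometry, in line with the near-oracle guarantees cited above.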
6. Geometry, Explicit Constructions, and Open Questions
Classical Geometry and Linear Function Spaces
In finite-dimensional function spaces (e.g., affine interpolation on the n-dimensional cube or Euclidean ball), the norm-minimizing interpolator (interpolation projector) with minimal operator norm is achieved by maximizing geometric symmetries, e.g., inscribed regular simplexes of maximal volume for affine interpolation, yielding optimal operator-norm bounds on the cube when n + 1 is an Hadamard number, and explicit two-sided bounds on the ball (Nevskii, 2022, Nevskii, 2023).
Open Problems
Key unresolved questions include:
- Tight characterization of norm-minimizing interpolators beyond ℓ₂ and ℓ₁ for structured models, non-Gaussian designs, and general signal structures.
- Extension of explicit optimal-norm constructions for splines and interpolation operators in higher dimensions, and their statistical properties (Estévez, 2012, Hayotov, 2014).
- Generic understanding of benign overfitting/generalization for minimal-norm interpolators in deep nonlinear models; sharpening phase diagrams for beneficial/malignant covariate shifts (Mallinar et al., 2024).
References to Principal Results
| Topic | Paper (arXiv) | Key Concepts/Results |
|---|---|---|
| Ensemble linear interpolators, bagging | (Wu et al., 2023) | Variance stabilization, implicit ridge, closed-form risk, bagging |
| Minimum-ℓ₁-norm interpolator, multiple descent | (Li et al., 2021) | Triple descent, sparse phase transitions, ℓ₁ vs. ℓ₂ regimes |
| Uniform convergence and norm-minimization | (Zhou et al., 2020) | Positive/negative uniform convergence, robust generalization |
| Near-interpolators, norm explosion | (Wang et al., 2024) | Rapid norm-growth, tradeoff, breakdown of standard norm-based bounds |
| Robustness for arbitrary norms | (Chinot et al., 2020) | Minimax error, Rademacher complexity, ℓ₁, group/nuclear norm |
| Cube/ball minimal norm projection | (Nevskii, 2022, Nevskii, 2023) | Maximal-volume simplex, Hadamard/constrained geometry, norm bounds |
| Optimizer-induced norms, generalization | (Vaswani et al., 2020) | Every interpolator ↔ some norm; optimization algorithms, biasing |
| Ridgeless interpolators, generalized risk | (Han et al., 2023) | Distributional characterization, oracle tuning, CV cross-task |
| Uniform convergence, Gaussian width | (Koehler et al., 2021) | Benign overfitting, basis pursuit, ℓ₁/minimal-norm sharp generalization |
| Sobolev and spline MSN interpolation | (Estévez, 2012, Hayotov, 2014, Chandrasekaran et al., 2017) | Explicit splines/quasi-optimal interpolators/minimal Sobolev norm |
| Minimum-norm kernel interpolators, mistake bounds | (Liang et al., 2021) | RKHS mistake bounds, on-line regret, fast rates, comparison margin |
| Surface reconstruction via norm-minimization | (Chu et al., 11 Jul 2025) | Mixed RBF/KAN trial spaces, uniqueness, normals from kernel gradient |
| Minimum norm in neural networks | (Park et al., 2023) | Minimum Barron norm, implicit bias, convergence in ReLU nets |
| Covariate shift and interpolation taxonomy | (Mallinar et al., 2024) | Risk bounds, beneficial/malignant shift, overparameterization regimes |
| Weighted-norm kernel interpolation | (Li, 2020) | RKHS limits, convergence, generalization in spherical/trig. basis |
| Ridge interpolator + correlation | (Stojnic, 2024) | RDT exact risk, row/col correlation, double descent, generalization |
Conclusion
Norm-minimizing interpolators represent a foundational tool in high-dimensional statistics, machine learning, and approximation theory. Their statistical behavior, regularization properties, algorithmic biases, and geometric constructions are central to understanding both the practical performance and theoretical phenomena that arise in modern interpolating and overparameterized models (Wu et al., 2023, Li et al., 2021, Chinot et al., 2020, Han et al., 2023, Koehler et al., 2021, Mallinar et al., 2024, Wang et al., 2024, Park et al., 2023).