
Minimum-Norm Interpolating Estimator

Updated 10 February 2026
  • The minimum-norm interpolating estimator selects, among all exact fits of the data, the one with the smallest norm, ensuring minimal complexity and controlled smoothness.
  • It reveals surprising generalization properties such as benign overfitting and double-descent risk curves in overparameterized settings like linear regression and RKHS.
  • The estimator bridges implicit bias and explicit regularization, providing insights into risk minimization and model stability across diverse functional spaces.

A minimum-norm interpolating estimator is a solution to an interpolation problem where, among all possible fits that exactly match observed data, one selects the candidate with minimal norm according to a specified geometry. These estimators arise across a range of functional and statistical settings, including classical overparameterized linear regression, kernel methods in reproducing kernel Hilbert spaces (RKHS), Banach spaces, and variational function extension problems. Rigorous analysis of such estimators reveals both their surprising generalization properties—such as benign overfitting and double/multiple-descent risk curves—and their structural role as the unique representers of implicit regularization in overparameterized regimes.

1. Formal Definition and General Framework

Given observations $(x_i, y_i)_{i=1}^n$, a function space $\mathcal{F}$, and a norm $\|\cdot\|$, the minimum-norm interpolator $\hat f$ is defined as

$$\hat f = \operatorname*{arg\,min}_{f \in \mathcal{F}} \|f\| \quad \text{subject to} \quad f(x_i) = y_i\,, \; i = 1,\ldots, n.$$

In the finite-dimensional setting (e.g., the linear model $Y = X\beta^* + \xi$ with $X \in \mathbb{R}^{n \times p}$, $p \gg n$), this becomes minimization of the Euclidean, $\ell_1$, or another norm over the solution set of $X\beta = Y$. In functional data settings—including Sobolev, RKHS, and Banach interpolants—corresponding norms enforce smoothness or structural simplicity (Chinot et al., 2020, Rangamani et al., 2020, Li, 2020, Herbert-Voss et al., 2014, Chandrasekaran et al., 2017).

Key explicit forms include:

  • Linear, $\ell_2$ case: $\hat\beta = X^+ Y = X^\top (XX^\top)^{-1} Y$.
  • RKHS case: $\hat f(x) = k(x, X) K^{-1} y$, where $K_{ij} = k(x_i, x_j)$. The function $\hat f$ uniquely minimizes $\|f\|_{\mathcal{H}}$ among all interpolants (Rangamani et al., 2020, Li, 2020).
  • Minimum weighted norm / Sobolev extension: Interpolants minimize a seminorm associated with derivative control or spectral weights, often yielding smooth, stable extensions (Herbert-Voss et al., 2014, Chandrasekaran et al., 2017).
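Both closed forms above can be checked numerically. The following is a minimal NumPy sketch on synthetic data (the Gaussian kernel and unit bandwidth are illustrative choices, not prescribed by the sources) verifying that each formula interpolates the training data exactly:

```python
import numpy as np

rng = np.random.default_rng(0)

# Overparameterized linear data: n = 10 samples, p = 50 features.
n, p = 10, 50
X = rng.standard_normal((n, p))
y = rng.standard_normal(n)

# Minimum-l2-norm interpolator: beta = X^T (X X^T)^{-1} y = X^+ y.
beta = X.T @ np.linalg.solve(X @ X.T, y)
assert np.allclose(X @ beta, y)                  # exact interpolation
assert np.allclose(beta, np.linalg.pinv(X) @ y)  # agrees with the pseudoinverse

# RKHS minimum-norm interpolant with a Gaussian kernel: f(x) = k(x, X) K^{-1} y.
def gauss_kernel(A, B, bw=1.0):
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * bw ** 2))

K = gauss_kernel(X, X)
alpha = np.linalg.solve(K, y)   # representer coefficients
assert np.allclose(K @ alpha, y)  # interpolates the training data
```

Any vector interpolating $y$ can be written as the min-norm solution plus a null-space component, which can only increase the norm; the pseudoinverse check above reflects exactly that fact.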

2. Theoretical Properties: Bias–Variance, Generalization, and Risk Bounds

Linear Regression

In high-dimensional regression, the $\ell_2$ minimum-norm interpolator achieves, with high probability,

$$\|\Sigma^{1/2}(\hat\beta - \beta^*)\|_2^2 \leq \frac{\|\beta^*\|_2^2 \, r_{cn}(\Sigma) \vee \|\xi\|_2^2}{n},$$

where $r_k(\Sigma) = \sum_{i \geq k} \lambda_i(\Sigma)$ is the sum of trailing eigenvalues ("effective dimension") (Chinot et al., 2020, Lecué et al., 2022). This decomposition reflects a phase transition:

  • High signal-to-noise: The "bias term" dominates, often decaying rapidly with the spectrum.
  • Low signal-to-noise: The "variance term" dominates, saturating at $\|\xi\|_2^2 / n$; overfitting the noise is "benign" and the prediction error is comparable to the irreducible noise floor.

Analogous bounds hold for other norms and problem structures, with correspondingly different dependencies—e.g., logarithmic in dimension for $\ell_1$ under sparsity, or on group/block sizes for the group Lasso (Chinot et al., 2020, Wang et al., 2021, Li et al., 2021).
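The effective-dimension quantity $r_k(\Sigma)$ in the bound above is simple to compute. A small sketch using a hypothetical diagonal covariance with polynomially decaying spectrum, illustrating how fast decay keeps the trailing tail (and hence the bias term) small:

```python
import numpy as np

# Trailing-eigenvalue "effective dimension" r_k(Sigma) = sum_{i >= k} lambda_i(Sigma).
# (Indices here are 0-based, so r_0 is the full trace.)
def effective_dim(Sigma, k):
    eigs = np.sort(np.linalg.eigvalsh(Sigma))[::-1]  # eigenvalues, descending
    return eigs[k:].sum()

# Fast spectral decay (lambda_i ~ i^{-2}) keeps r_k small for moderate k:
# the bias term in the risk bound then decays quickly, the regime in which
# overfitting is benign.
lam = 1.0 / np.arange(1, 201) ** 2
Sigma = np.diag(lam)
print(effective_dim(Sigma, 0))   # full trace, close to pi^2 / 6
print(effective_dim(Sigma, 50))  # trailing tail, already well below the trace
```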

RKHS and Nonparametric Regression

For kernel interpolation, the minimum-norm interpolator minimizes the RKHS norm, enjoys optimality properties for leave-one-out stability, and delivers generalization rates via stability-to-risk conversions: $\mathbb{E}_S[\, I[f^*] - \inf_{f \in \mathcal{H}} I[f] \,] \leq \beta_{CV}$, where $\beta_{CV}$ is minimized by the minimum-norm solution and is controlled by the condition number of the kernel matrix (Rangamani et al., 2020, Li, 2020, Liang et al., 2021). The associated risk curves in high dimension can exhibit double-descent or even multiple-descent behavior due to phase transitions in random-matrix spectra (1908.10292, Rangamani et al., 2020).
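The double-descent shape is easy to reproduce in a toy simulation. The sketch below (synthetic Gaussian design; sizes, noise level, and trial counts are illustrative choices) compares the test risk of the minimum-norm linear interpolator at the interpolation threshold $p = n$ against a heavily overparameterized fit, where the risk comes back down:

```python
import numpy as np

rng = np.random.default_rng(1)

# Test risk of the minimum-norm interpolator as the overparameterization
# ratio p/n passes through the interpolation threshold p = n.
n, n_test, trials = 20, 500, 200

def risk(p):
    errs = []
    for _ in range(trials):
        beta_star = rng.standard_normal(p) / np.sqrt(p)   # unit-scale signal
        X = rng.standard_normal((n, p))
        y = X @ beta_star + 0.5 * rng.standard_normal(n)  # noisy labels
        beta = np.linalg.pinv(X) @ y                      # min-norm fit
        Xt = rng.standard_normal((n_test, p))
        errs.append(np.mean((Xt @ beta - Xt @ beta_star) ** 2))
    return np.mean(errs)

r_at_threshold = risk(n)      # p = n: X is near-singular, risk spikes
r_overparam = risk(10 * n)    # p >> n: risk descends again
assert r_at_threshold > r_overparam
```

The spike at $p = n$ comes from the smallest singular values of a square random design, exactly the condition-number mechanism referenced above.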

Factor Models and Model Structure

If $X$ and $Y$ are generated by a low-rank or factor structure, explicit risk decompositions show that the minimum-norm interpolator can achieve excess risk near the oracle benchmark, provided the effective rank of $\Sigma_X$ is less than $n$ and the signal loading is strong. In contrast, in the high effective-rank ("junk features") regime, the interpolator's risk approaches that of the null predictor (Bunea et al., 2020, Mahdaviyeh et al., 2019).

3. Geometry, Universality, and Self-Induced Regularization

A robust geometric interpretation separates signal and noise directions in feature space:

  • The estimator decomposes into a ridge (regularized) estimator in the leading eigenspaces and an overfitting component on the residual subspace (Lecué et al., 2022).
  • "Self-induced regularization" arises because the solution must interpolate in-sample noise in a high-dimensional, low-spectral-density subspace; the effective degrees of freedom and estimation error are governed by the spectral decay of the covariance matrix.
  • The phenomena and bounds proved are universal across Gaussian and heavy-tailed designs (requiring only $\log n$ moments), owing to high-dimensional concentration results and generalizations of the Dvoretzky–Milman theorem.

Benign overfitting: Provided the spectrum is appropriately "spiked" or decays rapidly, the overfitting component's contribution vanishes asymptotically ("benign overfitting") (Lecué et al., 2022, Mahdaviyeh et al., 2019, Chinot et al., 2020).

4. Extensions, Regularization, and Implicit Bias

Explicit and Implicit Regularization

  • Explicit regularization: Adding vanishing $\ell_2$ penalties to empirical risk minimization enforces convergence to minimum-norm interpolants, as rigorously shown for wide two-layer ReLU neural networks (Park et al., 2023). Exact scaling results dictate the required vanishing rate of the weight decay.
  • Implicit regularization: Even in the absence of any explicit penalty, gradient descent and its variants (SGD, momentum), initialized appropriately, frequently converge to the minimum-norm or minimum-Barron-seminorm interpolant in function space (Park et al., 2023, Li, 2020). This phenomenon is established both theoretically (via $\Gamma$-convergence) and empirically.
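For linear models, the vanishing-penalty statement has a directly checkable analogue (this is the classical ridgeless limit, not the ReLU-network result itself): ridge solutions converge to the minimum-norm interpolant as the penalty shrinks. A minimal NumPy sketch on synthetic data:

```python
import numpy as np

rng = np.random.default_rng(2)
n, p = 10, 50
X = rng.standard_normal((n, p))
y = rng.standard_normal(n)

# Minimum-l2-norm interpolant.
min_norm = X.T @ np.linalg.solve(X @ X.T, y)

# Ridge solutions beta_lam = X^T (X X^T + lam I)^{-1} y with shrinking
# penalty approach the min-norm interpolant.
for lam in [1.0, 1e-2, 1e-4, 1e-8]:
    ridge = X.T @ np.linalg.solve(X @ X.T + lam * np.eye(n), y)
    print(lam, np.linalg.norm(ridge - min_norm))  # distance shrinks with lam
```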

Algorithmic Implications and Batch Partitioning

Naïve minimum-norm interpolation in linear regression can suffer from singularities and double descent near the interpolation threshold $p/n = 1$. Batch-based correction (as in the batch minimum-norm estimator) regularizes this behavior, eliminates the double descent, and yields stable risk curves that are monotonic in the overparameterization ratio (Ioushua et al., 2023).

5. Consistency, Limitations, and Practical Considerations

Consistency and Optimality

  • For $\ell_2$ interpolation in low effective-rank or factor models, asymptotic consistency is achievable.
  • For $\ell_1$-penalized interpolation under sparsity and isotropic design, sharp matching upper and lower bounds of order $\sigma^2 / \log(d/n)$ are obtained, implying that consistency requires the noise to vanish faster than $1 / \log(d/n)$ as $d/n \to \infty$ (Wang et al., 2021).
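The $\ell_1$ interpolator itself can be computed as a linear program (basis pursuit): minimize $\|\beta\|_1$ subject to $X\beta = y$, via the standard split $\beta = u - v$ with $u, v \geq 0$. A sketch using `scipy.optimize.linprog` on synthetic noiseless sparse data (the sizes are illustrative):

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(3)
n, d, s = 15, 60, 3
X = rng.standard_normal((n, d))
beta_star = np.zeros(d)
beta_star[:s] = 1.0
y = X @ beta_star  # noiseless s-sparse model

# Basis pursuit as an LP: variables (u, v) >= 0, beta = u - v,
# minimize sum(u) + sum(v) subject to X u - X v = y.
c = np.ones(2 * d)
A_eq = np.hstack([X, -X])
res = linprog(c, A_eq=A_eq, b_eq=y, bounds=(0, None))
beta = res.x[:d] - res.x[d:]

assert res.status == 0
assert np.allclose(X @ beta, y, atol=1e-6)  # exact interpolation
# Since beta_star is feasible, the LP optimum can only have smaller l1 norm.
assert np.abs(beta).sum() <= np.abs(beta_star).sum() + 1e-6
```

With enough samples relative to the sparsity level, basis pursuit typically recovers the sparse signal exactly; the guaranteed property checked here is only the minimum-$\ell_1$-norm optimality among interpolants.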

Uniform Convergence

Classic uniform convergence over norm balls does not explain the consistency of minimum-norm interpolation in the overparameterized regime. However, uniform convergence over the set of zero error predictors with bounded norm suffices and explains the observed generalization behavior (Zhou et al., 2020).

Non-Optimality and Alternatives

Although the minimum-norm interpolator is optimal in specific senses (e.g., smallest norm among interpolants), it is generally suboptimal when population information is available. Alternative interpolators—optimized for population risk conditional on known or estimable model structure and noise—can provably outperform minimum-norm solutions, especially in pathological spectral regimes (Oravkin et al., 2021).

6. Summary Table: Key Instances of Minimum-Norm Interpolants

| Problem Setting | Solution Definition/Formula | Core Generalization Property |
|---|---|---|
| Linear least-squares ($\ell_2$) | $\hat\beta = X^+ Y$ | Benign overfitting if effective rank is low |
| RKHS/kernel methods | $\hat f(x) = k(x, X) K^{-1} y$ | Double/multiple descent, stability optimality |
| Sobolev/Banach function extension | Minimize $C^{1,1}$ or Sobolev seminorm under interpolation | Explicit optimality with unique extension |
| Sparse interpolation ($\ell_1$) | $\hat\beta = \operatorname*{arg\,min}_{X\beta = y} \|\beta\|_1$ | Consistency only if noise vanishes as $1/\log(d/n)$ |
| Two-layer ReLU networks (Barron norm) | $\hat f = \operatorname*{arg\,min}_{f(x_i) = y_i} [f]$ | Implicit bias, function/parameter norm separation |

7. Impact and Open Questions

The theory of minimum-norm interpolating estimators illuminates the role of high-dimensional geometry, implicit/explicit regularization, and spectral structure in modern statistical learning. This understanding underpins the phenomena of benign overfitting, stability risk minimization, and the empirical success of overparameterized models without explicit complexity control. Open questions remain in characterizing universal consistency for more general data and kernel classes, quantifying implicit bias in deeper and non-convex neural architectures, and formulating minimax-optimal population-aware interpolators in practical regimes (Chinot et al., 2020, Lecué et al., 2022, Oravkin et al., 2021, Park et al., 2023).
