Minimum-Norm Interpolation Overview

Updated 24 April 2026

Minimum-norm interpolation is a technique that selects, among all interpolants satisfying the data constraints, the one with the smallest norm, typically an ℓ2, ℓ1, Sobolev, or RKHS norm.
It plays a critical role in high-dimensional statistics and machine learning by influencing generalization, sparsity, and stability in overparameterized models and regression problems.
Practical implementations leverage representer theorems and finite-dimensional reductions, employing convex optimization and efficient algorithms to address challenges in both Hilbert and Banach space settings.

Minimum-norm interpolation is a foundational concept in both computational mathematics and statistical learning theory: given data constraints, it selects among all possible interpolants the one minimizing a prescribed norm, often $\ell_2$, $\ell_1$, a Sobolev or RKHS norm, or a more general Banach-space norm. It underlies a vast range of problems, including regression in high-dimensional statistics, kernel methods in machine learning, underdetermined system parameterization, surface reconstruction, and functional analysis. The minimum-norm criterion introduces a strong implicit bias in overparameterized regimes, deeply influencing generalization, sparsity, stability, and operator-theoretic properties.

1. Formal Frameworks for Minimum-Norm Interpolation

The abstract minimum-norm interpolation problem is as follows: let $\mathcal B$ be a Banach space of functions on a domain $\Omega$, equipped with a norm $\|\cdot\|_\mathcal{B}$, and let $L_1,\dots,L_N$ be bounded linear functionals (typically point evaluations: $L_j(f) = f(x_j)$). Given target data $y_j$, the minimum-norm interpolant solves

$\min_{f\in\mathcal B}\;\; \|f\|_\mathcal{B} \quad\text{subject to}\quad L_j(f) = y_j\ \forall j.$

A general representer theorem holds: under mild assumptions, minimizers exist, and any minimizer $f^\star$ is characterized through the finite-dimensional span of the functionals $L_1,\dots,L_N$, with its exact form determined by these functionals and the geometry of $\mathcal B$ (Wang et al., 2020).

Explicit Representer Constructions

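Hilbert space/RKHS: $f^\star(x) = \sum_{j=1}^N c_j K(x, x_j)$, with $K$ the reproducing kernel and the coefficient vector $c = (c_1,\dots,c_N)^\top$ found by solving the linear system $\mathbf{K} c = y$, where $\mathbf{K}_{ij} = K(x_i, x_j)$.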
Banach spaces: similar constructions hold in reproducing kernel Banach spaces (RKBS), but the nonlinear geometry leads to nonlinear systems for the optimal coefficients. Duality maps, semi-inner products, and (for non-Hilbertian norms such as $\ell_1$) nonlinear homotopy or KKT approaches are required (Groenewald et al., 2024, Wang et al., 2020).
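The Hilbert-space construction above can be sketched in a few lines. This is a minimal illustration, assuming a Gaussian kernel, a hand-picked lengthscale, and synthetic one-dimensional data; none of these choices come from the cited works, and the names `gaussian_kernel` and `interpolant` are hypothetical.

```python
import numpy as np

def gaussian_kernel(a, b, lengthscale=0.3):
    """Gaussian (RBF) kernel matrix between 1-D point sets a and b (assumed kernel)."""
    diff = a[:, None] - b[None, :]
    return np.exp(-diff**2 / (2.0 * lengthscale**2))

# Hypothetical data: six samples of a smooth target on [0, 1].
x_train = np.linspace(0.0, 1.0, 6)
y_train = np.sin(2.0 * np.pi * x_train)

# Solve the Gram system K c = y for the representer coefficients.
K = gaussian_kernel(x_train, x_train)
coef = np.linalg.solve(K, y_train)

def interpolant(x_new):
    """Evaluate f*(x) = sum_j c_j K(x, x_j) at new points."""
    return gaussian_kernel(np.atleast_1d(x_new), x_train) @ coef

# Squared RKHS norm of the interpolant: c^T K c, which equals y^T K^{-1} y.
rkhs_norm_sq = float(coef @ K @ coef)
```

Evaluating `interpolant(x_train)` reproduces `y_train` up to round-off. For many points or small lengthscales the Gram matrix becomes ill-conditioned, so a Cholesky factorization or a small diagonal jitter is a common practical adjustment.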

Finite-Dimensional Reductions

In discrete settings or for structured problems (e.g., $\ell_1$-norm minimization), minimum-norm interpolation reduces to small-scale convex programs or linear programs (LPs), even when the original space is infinite-dimensional (Wang et al., 2020, Bandeira et al., 2013).
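As one concrete instance of such a reduction, the finite-dimensional minimum $\ell_1$-norm interpolant can be written as a linear program by splitting the unknown into positive and negative parts. The sketch below assumes SciPy's `linprog` with the HiGHS backend and synthetic noiseless data; the helper name `min_l1_interpolant` and the problem sizes are illustrative choices, not taken from the cited papers.

```python
import numpy as np
from scipy.optimize import linprog

def min_l1_interpolant(X, y):
    """Solve min ||w||_1 subject to X w = y via the split w = u - v, u, v >= 0."""
    n, d = X.shape
    cost = np.ones(2 * d)            # sum(u) + sum(v) equals ||w||_1 at the optimum
    A_eq = np.hstack([X, -X])        # X u - X v = y
    res = linprog(cost, A_eq=A_eq, b_eq=y, bounds=(0, None), method="highs")
    if not res.success:
        raise RuntimeError(res.message)
    return res.x[:d] - res.x[d:]

# Hypothetical underdetermined problem with a 3-sparse ground truth.
rng = np.random.default_rng(0)
n, d = 30, 80
X = rng.standard_normal((n, d))
w_true = np.zeros(d)
w_true[[3, 17, 42]] = [1.5, -2.0, 0.7]
y = X @ w_true

w_l1 = min_l1_interpolant(X, y)      # sparse-promoting interpolant
w_l2 = np.linalg.pinv(X) @ y         # minimum l2-norm interpolant, for comparison
```

Both vectors interpolate the data; in this noiseless sparse setting the $\ell_1$ solution is expected to lie much closer to `w_true` than the dense $\ell_2$ solution.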

2. Model Classes and Norm Choices

The practical and theoretical properties of minimum-norm interpolation are highly sensitive to the ambient geometry:

$\ell_2$ (Euclidean and Hilbert): The minimum $\ell_2$-norm interpolant has a closed form in linear regression (the pseudoinverse solution) and admits sharp generalization and stability analyses. When the system is underdetermined, solutions are typically non-sparse and exhibit strong benign overfitting only when the feature space dimension substantially exceeds sample size (Kur et al., 30 Mar 2026, Li, 2020, Wang et al., 2021).
$\ell_1$ (Basis pursuit): Promotes sparsity in the interpolant. The minimum $\ell_1$-norm interpolator arises in high-dimensional sparse regression and compressed sensing, with sharp risk analyses and phases of “multiple descent” in out-of-sample error as the overparameterization ratio grows (Wang et al., 2021, Li et al., 2021, Bandeira et al., 2013).
Sobolev/Banach and RKHS norms: Kernel interpolation, surface fitting, and high-order smooth approximations rely on minimization in Sobolev or reproducing kernel Hilbert (or Banach) space norms; the implicit bias favors spatial or frequency smoothness, and the geometry controls error and stability (Chandrasekaran et al., 2017, Li, 2020, Groenewald et al., 2024, Chu et al., 11 Jul 2025, Yang, 29 Apr 2025).
$L_p$ and $L_\infty$ norms: For interpolating geometric data under curvature-minimization criteria, $L_p$ norms with $1 < p < \infty$ lead to unique, regular spline or network solutions; as $p \to \infty$ the solution becomes piecewise quadratic and is characterized by a dual nonlinear system (Vlachkova, 2019, Vlachkova, 2022).
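For the Euclidean case in the list above, the closed-form pseudoinverse solution can be checked numerically. The following is a small sketch on synthetic data (sizes and seed are arbitrary assumptions): it computes the minimum $\ell_2$-norm interpolant two equivalent ways and confirms that adding any null-space component keeps the fit exact but increases the norm.

```python
import numpy as np

# Hypothetical underdetermined system: 5 equations, 12 unknowns.
rng = np.random.default_rng(1)
n, d = 5, 12
X = rng.standard_normal((n, d))
y = rng.standard_normal(n)

# Minimum l2-norm interpolant via the Moore-Penrose pseudoinverse.
w_min = np.linalg.pinv(X) @ y

# Equivalent formula when X has full row rank: w = X^T (X X^T)^{-1} y.
w_alt = X.T @ np.linalg.solve(X @ X.T, y)

# Any other interpolant differs from w_min by a vector in the null space of X.
z = rng.standard_normal(d)
null_part = z - np.linalg.pinv(X) @ (X @ z)   # projection of z onto null(X)
w_other = w_min + null_part

norm_min = float(np.linalg.norm(w_min))
norm_other = float(np.linalg.norm(w_other))
```

Because `w_min` lies in the row space of `X` and `null_part` is orthogonal to it, the squared norms add, which is why `norm_other` exceeds `norm_min`.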

3. Theoretical Guarantees: Error Bounds, Consistency, and Implicit Bias

High-Dimensional Linear Regimes

For isotropic Gaussian design $\Omega$ $Ω$ 0, with $\Omega$ $Ω$ 1 and observations $\Omega$ $Ω$ 2:
- $\Omega$ 3-norm interpolator: Prediction error scales as $\Omega$ 4; vanishing error requires $\Omega$ 5 (Wang et al., 2021).
- Minimum $\ell_1$-norm interpolator: If the ground-truth parameter is sparse, the prediction error is of order $\sigma^2/\log(d/n)$ (noise variance $\sigma^2$, dimension $d$, sample size $n$) when $d \gg n$; consistency therefore holds as $d/n \to \infty$, though only at a logarithmic rate (Wang et al., 2021). In the moderately sparse regime, the excess risk exhibits multiple descent and ascent regions as a function of overparameterization due to geometric phase transitions in the $\ell_1$ ball (Li et al., 2021).

RKHS and Kernel Spaces

Convergence: In the overparameterized limit, interpolants minimizing a weighted norm converge (in $\|\cdot\|_\mathcal{B}$ 3) to the unique RKHS interpolant, pinned down by the data and kernel (Li, 2020).
Generalization: Deterministic and probabilistic bounds show error decay rates in terms of sample mesh-norm and kernel smoothness (Sobolev, spherical harmonics, NTK), but consistency in stronger Sobolev/Banach norms requires the true function to be sufficiently regular (Li, 2020, Chandrasekaran et al., 2017).
Inconsistency in high smoothness: For bounded kernels on low-dimensional domains, there exist sharp lower bounds (depending on kernel eigenvalue decay $\|\cdot\|_\mathcal{B}$ 4 and embedding exponent $\|\cdot\|_\mathcal{B}$ 5) above which minimum-norm interpolation is statistically inconsistent; this creates a Sobolev-norm inconsistency threshold $\|\cdot\|_\mathcal{B}$ 6 (Yang, 29 Apr 2025).
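The dependence of the interpolation error on sampling density, noted under Generalization above, can be observed in a toy experiment. This sketch assumes a Laplace (exponential) kernel, a smooth sine target, and equispaced nodes on $[0, 1]$; these are illustrative choices and not the kernels or settings analyzed in the cited works.

```python
import numpy as np

def laplace_kernel(a, b, lengthscale=0.5):
    """Laplace (exponential) kernel matrix between 1-D point sets (assumed kernel)."""
    return np.exp(-np.abs(a[:, None] - b[None, :]) / lengthscale)

def target(x):
    """Hypothetical smooth target function."""
    return np.sin(2.0 * np.pi * x)

def max_interpolation_error(num_points):
    """Sup-norm error of the minimum-RKHS-norm interpolant on a fine test grid."""
    x_train = np.linspace(0.0, 1.0, num_points)
    coef = np.linalg.solve(laplace_kernel(x_train, x_train), target(x_train))
    x_test = np.linspace(0.0, 1.0, 401)
    pred = laplace_kernel(x_test, x_train) @ coef
    return float(np.max(np.abs(pred - target(x_test))))

# Error for progressively denser samples (smaller mesh norm).
errors = {m: max_interpolation_error(m) for m in (5, 10, 20, 40)}
```

In this setup the error shrinks as the nodes become denser. The rate depends on the kernel's smoothness and on the regularity of the target, which is the point of the bounds cited above.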

Neural Networks and Implicit Bias

Shallow ReLU networks: With explicit weight decay $\|\cdot\|_\mathcal{B}$ 7 and width $\|\cdot\|_\mathcal{B}$ 8 at an appropriate scaling, empirical risk minimizers converge to the minimum Barron-norm interpolant (Park et al., 2023).
Implicit bias: Gradient descent tends to select solutions close to the minimum-norm interpolant under many initializations and architectures; however, the degree to which the true parameter norm is minimized can depend subtly on the optimizer, initialization scale, and explicit regularization (Park et al., 2023).
Deep ReLU nets: For minimum $\ell_2$-norm interpolants in deep homogeneous ReLU nets, generalization and algorithmic stability occur when the network contains a low-rank “bottleneck” layer, with the low-rank bias arising from implicit regularization by gradient flow or weight decay (Harzli et al., 14 Feb 2026).
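The implicit-bias idea is easiest to see in a setting much simpler than the networks discussed above: overparameterized linear least squares. There, gradient descent started from zero never leaves the row space of the design matrix and therefore converges to the minimum $\ell_2$-norm interpolant. The sketch below illustrates this known linear-model fact on synthetic data; it is not a reproduction of the neural-network results cited here, and the sizes, seed, and iteration count are arbitrary assumptions.

```python
import numpy as np

# Hypothetical overparameterized linear regression problem.
rng = np.random.default_rng(2)
n, d = 20, 100
X = rng.standard_normal((n, d))
y = rng.standard_normal(n)

# Plain gradient descent on 0.5 * ||X w - y||^2, initialized at zero.
step = 1.0 / np.linalg.norm(X, 2) ** 2   # 1 / (largest singular value)^2
w = np.zeros(d)
for _ in range(2000):
    w -= step * (X.T @ (X @ w - y))

# Reference: minimum l2-norm interpolant from the pseudoinverse.
w_min_norm = np.linalg.pinv(X) @ y
gap = float(np.linalg.norm(w - w_min_norm))
```

Starting from a nonzero initialization changes the limit: the component of the initial point in the null space of `X` is preserved, which is one reason initialization scale matters for the bias.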

Geometric and Operator-Theoretic Properties

Minimum-norm projectors: For affine interpolation on the Euclidean ball or convex bodies, the minimal projector norm is realized by interpolation at regular simplexes, with sharp constants (often $L_1,\dots,L_N$ 0 for the ball) and is determined by geometric relations and Legendre polynomials (Nevskii, 2024, Nevskii, 2023).
Minimum-norm in Banach spaces: Uniform convexity (e.g., 2-uniform convexity) is necessary for sharp control of the structural bias and resulting generalization in overparameterized models, allowing precise non-Euclidean generalizations of the classical benign overfitting phenomenon (Kur et al., 30 Mar 2026).

4. Computational Approaches and Algorithmic Realizations

Sparse Polynomial and Quadratic Interpolation

Minimum $\ell_1$-norm formulations enable recovery of sparse interpolants in underdetermined regimes using standard LP solvers. Guarantees can be established via tools from compressed sensing (e.g., restricted isometry properties), allowing recovery of sparse polynomial or quadratic models in $n$ variables from $O(n (\log n)^4)$ points instead of the $O(n^2)$ needed in the dense case (Bandeira et al., 2013).

Sobolev and $H^2$ Minimum-Norm Updates

For derivative-free optimization and adaptive model fitting, updating quadratic models via minimum $H^2$-norm difference (instead of the classical Frobenius norm) yields superior theoretical projection properties and empirical robustness, with all updates reducible to linear system solves via KKT conditions (Xie et al., 2023, Chandrasekaran et al., 2017).
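The reduction to a linear solve via KKT conditions can be illustrated on a generic quadratic minimum-norm problem, $\min_c \tfrac12 c^\top W c$ subject to $A c = y$, whose stationarity and feasibility conditions form one symmetric block system. This is a sketch under stated assumptions: $W$ is a hypothetical positive-definite weight matrix standing in for whichever quadratic norm is used, and the code is not the specific model update of the cited papers.

```python
import numpy as np

def min_weighted_norm_interpolant(A, y, W):
    """Solve min 0.5 * c^T W c subject to A c = y through the KKT block system.

    Stationarity: W c + A^T lam = 0.  Feasibility: A c = y.
    """
    n, d = A.shape
    kkt = np.block([[W, A.T], [A, np.zeros((n, n))]])
    rhs = np.concatenate([np.zeros(d), y])
    sol = np.linalg.solve(kkt, rhs)
    return sol[:d]                      # drop the Lagrange multipliers

# Hypothetical interpolation conditions: 4 constraints on 9 coefficients.
rng = np.random.default_rng(3)
n, d = 4, 9
A = rng.standard_normal((n, d))
y = rng.standard_normal(n)

# Identity weights recover the ordinary minimum l2-norm solution.
c_plain = min_weighted_norm_interpolant(A, y, np.eye(d))

# Non-uniform weights penalize some coefficients more than others.
W = np.diag(np.linspace(1.0, 5.0, d))
c_weighted = min_weighted_norm_interpolant(A, y, W)
```

Changing the norm only changes the block `W`; the constraint rows and the single linear solve stay the same, which is what makes such updates cheap inside a trust-region loop.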

Interpolation Curve Networks and Surface Reconstruction

Edge convex networks with minimum $L_p$-norm of curvature have unique and regular solutions for $1 < p < \infty$, reducing to nonlinear equations in the parameters of the basis curves. The $L_\infty$ case has a weaker uniqueness property, and solutions become piecewise quadratic with explicit multiplicative characterization (Vlachkova, 2022, Vlachkova, 2019).
Modern advances in point cloud geometry exploit minimum-norm kernel interpolation with mixed-dimensional bases for improved normal and curvature estimation from surface samples, typically reducing to block linear KKT systems (Chu et al., 11 Jul 2025).

5. Practical and Statistical Implications

Benign overfitting: In many high-dimensional regimes, fitting the training data exactly with a minimum-norm interpolant does not harm generalization; for Gaussian design and sufficiently high dimension/sparsity, prediction error vanishes (Wang et al., 2021, Zhou et al., 2020).
Failure of uniform convergence: Classical uniform convergence over norm balls cannot explain generalization for minimum-norm interpolators; uniform convergence over the set of interpolators (i.e., predictors with zero empirical error) suffices, illuminating why low-norm plus perfect fit is critical for learning (Zhou et al., 2020).
Transfer learning and covariate shift: For overparameterized interpolators, finite-sample theory gives precise instance-wise risk bounds under distribution shift, classifying regimes where the shift is beneficial or malignant according to the spectral change in tail directions and level of overparameterization (Mallinar et al., 2024).
Operator norm minimization: For optimal interpolation projectors, geometric functionals (e.g., Legendre polynomials, simplex-volume maximization) yield sharp lower and upper bounds for the minimal $L_1,\dots,L_N$ 9-operator norm, with sharp asymptotic order in high dimension (Nevskii, 2024, Nevskii, 2023).

6. Open Problems and Directions

Significant open questions remain regarding:

Tight asymptotic constants in the high-dimensional regime for minimum-norm interpolators, especially in the proportional setting, where dimension and sample size grow at the same rate (Wang et al., 2021).
Extension beyond isotropic Gaussian design: sub-Gaussian, heavy-tailed, and dependent-feature regimes (Wang et al., 2021, Kur et al., 30 Mar 2026).
Robustness of minimum-norm interpolation under model misspecification or approximate sparsity (Wang et al., 2021).
Numerical algorithms for $L_j(f) = f(x_j)$ 1-minimum-norm network interpolation and Banach-space representer systems, especially in large-scale settings (Vlachkova, 2022, Wang et al., 2020).
Connections between implicit bias, algorithmic stability, and network geometry in deep learning, including further formalization of how minimum-norm criteria emerge under gradient-based training (Harzli et al., 14 Feb 2026, Park et al., 2023).
The extent of the phenomenon in fixed-dimension and non-asymptotic regimes, especially concerning the sharpness of inconsistency results in kernel and Banach space settings (Yang, 29 Apr 2025, Kur et al., 30 Mar 2026).

7. Summary Table: Key Regimes and Results

Problem/Norm	Exact Error Rate	Sparsity/Overparam. Regime	Consistency Criteria	Principal Reference
$L_j(f) = f(x_j)$ 2 norm, isotropic Gaussian	$L_j(f) = f(x_j)$ 3	$L_j(f) = f(x_j)$ 4, no sparsity assumption	$L_j(f) = f(x_j)$ 5	(Wang et al., 2021)
$L_j(f) = f(x_j)$ 6 norm, isotropic Gaussian	$L_j(f) = f(x_j)$ 7	$L_j(f) = f(x_j)$ 8	$L_j(f) = f(x_j)$ 9, $y_j$ 0	(Wang et al., 2021)
RKHS norm, kernel	Error plateaus at $y_j$ 1 ( $y_j$ 2)	$y_j$ 3 large, mesh norm $y_j$ 4	Target in RKHS, dense sampling	(Li, 2020, Chandrasekaran et al., 2017)
RKHS norm, bounded kernel	Inconsistency in $y_j$ 5 for $y_j$ 6	Fixed $y_j$ 7, $y_j$ 8	$y_j$ 9	(Yang, 29 Apr 2025)
Deep ReLU (min $\min_{f\in\mathcal B}\;\; \\|f\\|_\mathcal{B} \quad\text{subject to}\quad L_j(f) = y_j\ \forall j.$ 0)	Stability holds under low-rank	Layer bottleneck, $\min_{f\in\mathcal B}\;\; \\|f\\|_\mathcal{B} \quad\text{subject to}\quad L_j(f) = y_j\ \forall j.$ 1	Stable subnetwork + low-rank	(Harzli et al., 14 Feb 2026)

References

(Wang et al., 2021) "Tight bounds for minimum $\ell_1$-norm interpolation of noisy data"
(Li et al., 2021) "Minimum $\ell_1$-norm interpolators: Precise asymptotics and multiple descent"
(Yang, 29 Apr 2025) "Sobolev norm inconsistency of kernel interpolation"
(Kur et al., 30 Mar 2026) "Minimum Norm Interpolation via The Local Theory of Banach Spaces: The Role of $2$-Uniform Convexity"
(Wang et al., 2020) "Representer Theorems in Banach Spaces: Minimum Norm Interpolation, Regularized Learning and Semi-Discrete Inverse Problems"
(Chu et al., 11 Jul 2025) "Minimum-norm interpolation for unknown surface reconstruction"
(Xie et al., 2023) "Least $H^2$ Norm Updating Quadratic Interpolation Model Function for Derivative-free Trust-region Algorithms"
(Chandrasekaran et al., 2017) "Minimum Sobolev norm interpolation of derivative data"
(Park et al., 2023) "Minimum norm interpolation by perceptra: Explicit regularization and implicit bias"
(Harzli et al., 14 Feb 2026) "Sufficient Conditions for Stability of Minimum-Norm Interpolating Deep ReLU Networks"
(Bandeira et al., 2013) "Computation of sparse low degree interpolating polynomials and their application to derivative-free optimization"
(Vlachkova, 2022) "Edge convex smooth interpolation curve networks with minimum $L_\infty$-norm of the second derivative"
(Vlachkova, 2019) "Interpolation of scattered data in $\mathbb{R}^3$ using minimum $L_p$-norm networks, $1<p<\infty$"
(Li, 2020) "Generalization error of minimum weighted norm and kernel interpolation"
(Nevskii, 2024) "Optimal Lagrange Interpolation Projectors and Legendre Polynomials"
(Nevskii, 2023) "The Minimum Norm of a Projector under Linear Interpolation on a Euclidean Ball"
(Zhou et al., 2020) "On Uniform Convergence and Low-Norm Interpolation Learning"
(Groenewald et al., 2024) "Optimal interpolation in Hardy and Bergman spaces: a reproducing kernel Banach space approach"
(Mallinar et al., 2024) "Minimum-Norm Interpolation Under Covariate Shift"