
Interpolating Estimators in Statistics

Updated 23 January 2026
  • Interpolating estimators are statistical methods that exactly match observed data by achieving zero empirical risk through techniques like polynomial interpolation, kernel methods, and deep learning.
  • They highlight practical trade-offs, demonstrating how overparameterization, noise, and the curse of system size can impact generalization and statistical efficiency.
  • Recent advances using methods such as kernel ridgeless regression and deep ReLU networks show that interpolation can be effective under structured conditions, mitigating classical high-dimensional challenges.

Interpolating estimators are statistical estimators that, for a given set of observed data, return values (often predictions or fitted values) that exactly match or “go through” the observed targets at their respective points. In modern statistics and machine learning, the term “interpolating” is often used more broadly: an estimator is interpolating if it achieves zero empirical risk (training error), regardless of the underlying hypothesis class complexity or the stochastic nature of the observations. These estimators have been at the center of recent theoretical and practical research, particularly in relation to overparameterized models, generalization, and the statistical limitations imposed by problem structure.

1. Definition and Formal Properties of Interpolating Estimators

An estimator $\hat{f}$ is said to interpolate (“fit through”) a dataset $\{(x_i, y_i)\}_{i=1}^n$ if $\hat{f}(x_i) = y_i$ for all $i$. This strict interpolation is traditionally associated with polynomial interpolation, spline fitting at knots, and certain kernel or nearest-neighbor approaches. In supervised learning, interpolating estimators achieve the minimum possible value of the empirical loss—typically zero for the standard squared or logistic loss, although in practice noise in the $y_i$ or adversarial targets may render interpolation undesirable.

A precise characterization in the context of regression is: given a design matrix $X \in \mathbb{R}^{n \times d}$ and response $y \in \mathbb{R}^n$, an interpolating estimator returns $\hat{\beta}$ such that $X\hat{\beta} = y$. When $d \geq n$ and $X$ has full row rank, infinitely many interpolating $\hat{\beta}$ exist; a canonical choice is the minimum-norm solution. In nonparametric settings, kernel ridgeless regression, nearest-neighbor prediction with $k=1$, and deep neural networks trained without regularization often interpolate.
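
For concreteness, here is a minimal NumPy sketch (not drawn from any of the cited papers) of the minimum-norm interpolating solution in the overparameterized regime $d > n$; the data and dimensions are synthetic and arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 20, 100                      # overparameterized: more features than samples
X = rng.standard_normal((n, d))     # design matrix with full row rank (almost surely)
y = rng.standard_normal(n)          # arbitrary targets, possibly pure noise

# Minimum-ell_2-norm solution among all beta with X @ beta = y:
# beta_hat = X^T (X X^T)^{-1} y, computed here via the Moore-Penrose pseudoinverse.
beta_hat = np.linalg.pinv(X) @ y

# The fit interpolates: training residuals vanish up to floating-point error.
print(np.max(np.abs(X @ beta_hat - y)))   # ~1e-14
print(np.linalg.norm(beta_hat))           # smallest norm among all interpolators
```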

2. Statistical Efficiency and the Curse of System Size

Interpolating estimators are strongly affected by the available sample size relative to complexity, as well as the problem structure. In the context of high-dimensional reweighting schemes, such as importance sampling or path-integral Monte Carlo, the variance of the canonical reweighted estimator,

$$\hat{I} = \frac{\sum_{i=1}^M a(x_i)\, w(x_i)}{\sum_{i=1}^M w(x_i)}, \qquad w(x) = \frac{\pi(x)}{\rho(x)},$$

is controlled by the fluctuations of the log-weight $h(x) = -\ln w(x)$. When $h(x)$ is an extensive sum over $N$ weakly correlated subsystems, the variance of $\hat{I}$ scales as

$$\mathrm{Var}(\hat{I}) \approx \frac{1}{M}\left(e^{N\kappa} - 1\right),$$

where $\kappa = \mathrm{Var}(h_1)$ is the local (per-subsystem) variance. This implies that for any fixed variance at the subsystem level, overall statistical error increases exponentially with $N$—a direct manifestation of the “curse of system size.” Interpolating estimators built via such schemes thus become exponentially statistically inefficient as dimensionality increases, making them unsuitable for large or complex systems (Ceriotti et al., 2011).
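
The scaling is easy to see in a short simulation. The sketch below is a simplified stand-in for the path-integral setting of Ceriotti et al., assuming independent Gaussian subsystem log-weights with variance $\kappa$, and tracks the Kish effective sample size $(\sum_i w_i)^2 / \sum_i w_i^2$, which shrinks roughly like $M e^{-N\kappa}$.

```python
import numpy as np

rng = np.random.default_rng(1)
M, kappa = 100_000, 0.5            # number of samples, per-subsystem log-weight variance

for N in (1, 5, 10, 20):
    # h(x) = sum of N subsystem contributions (independent here for simplicity),
    # each with variance kappa; the reweighting factor is w(x) = exp(-h(x)).
    h = rng.normal(0.0, np.sqrt(kappa), size=(M, N)).sum(axis=1)
    w = np.exp(-h)
    # Kish effective sample size: collapses roughly like M * exp(-N * kappa).
    ess = w.sum() ** 2 / (w ** 2).sum()
    print(f"N={N:3d}  ESS ~ {ess:10.1f}  (theory ~ {M * np.exp(-N * kappa):10.1f})")
```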

3. Sample Complexity, Structure, and the Curse of Sample Size

A central question is the sample complexity required for an interpolating estimator to achieve nontrivial performance. In unstructured settings—for example, estimating the size $|S|$ of an unknown set from i.i.d. samples—unbiased interpolating estimators require on the order of $|S|^{1/2}$ samples to achieve fixed accuracy, as in the birthday problem. More generally, in the absence of structure, all available information lies in the occurrence of data collisions, which limits statistical power.
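
A toy collision-count simulation (an illustration of the birthday-problem scaling, not the estimator of Chatterjee et al.) shows how roughly $|S|^{1/2}$ uniform draws already allow a crude inversion of the expected number of colliding pairs, $\mathbb{E}[\text{collisions}] = n(n-1)/(2|S|)$.

```python
import numpy as np
from collections import Counter

rng = np.random.default_rng(2)
S = 1_000_000                       # true (unknown) set size
n = int(4 * np.sqrt(S))             # ~ |S|^{1/2} samples: enough for a few collisions

samples = rng.integers(0, S, size=n)                    # i.i.d. uniform draws from the set
counts = np.array(list(Counter(samples).values()))
collisions = int((counts * (counts - 1) // 2).sum())    # number of colliding pairs

# E[collisions] = n(n-1) / (2|S|), so invert to get a (noisy) estimate of |S|.
S_hat = n * (n - 1) / (2 * collisions) if collisions else float("inf")
print(f"true |S| = {S},  collisions = {collisions},  estimate = {S_hat:,.0f}")
```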

However, additional structure can greatly reduce this sample complexity. For partially ordered sets, convex bodies, or target function classes of low complexity (e.g., with low VC dimension), the complexity parameters that control estimator error (e.g., the cascade differences $\delta'_n, \delta''_n$ in cascading exclusion frameworks) decay much more quickly than the $O(1/|S|)$ rate of the antichain case. For instance, in volume estimation of a $d$-dimensional convex set, only $O(d)$ samples suffice for accurate estimation, rather than the $O(\exp(d))$ required by naive interpolation without structure (Chatterjee et al., 7 Aug 2025).

Table: Sample Complexity for Interpolating Estimators in Key Regimes

| Problem Structure | Required Sample Size $n$ | Main Complexity Control |
|---|---|---|
| Unknown finite set (antichain) | $n \sim \vert S\vert^{1/2}$ | First collision, variance |
| Totally ordered set | $n \to \infty$ (any rate) | Max-statistics, $1/n$ error |
| Convex body, regression, VC class | $n \gtrsim d$ | Covering numbers, $O(d/n)$ |

In all cases, structure enables $O(1/n)$ or $O(d/n)$ convergence of the interpolation error; in its absence, the error is lower-bounded by the “curse of sample size” barrier (Chatterjee et al., 7 Aug 2025).

4. Overfitting, Generalization, and the Curse of Small Sample Size

Interpolating estimators are classically associated with overfitting, where empirical error is zero but generalization error is uncontrolled. Statistical learning theory bounds the gap between empirical and true risk as

$$\left| E_{\text{gen}}(f) - E_{\text{emp}}(f) \right| \leq O\left(\sqrt{\frac{\text{Complexity}(\mathcal{F})}{n}}\right).$$

In the small-$n$ regime, even simple models with high capacity may generalize poorly despite perfect interpolation. This was empirically demonstrated in COVID-19 forecasting: all models (linear regression, MLP, LSTM) could interpolate or nearly interpolate the short-term active-case time series, but beyond a horizon of $K=3$ days their test $r^2$ collapsed, sometimes becoming negative. Even with aggressive feature selection and regularization, generalization remained poor except for very short-term forecasts, illustrating the practical limits of interpolation in small-sample settings (Nakıp et al., 2020).
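
The qualitative effect is easy to reproduce on synthetic data (this is not the COVID-19 series or the models of Nakıp et al.): a polynomial with enough parameters to interpolate a short training window fits that window essentially perfectly and fails a few steps beyond it.

```python
import numpy as np

rng = np.random.default_rng(3)
t = np.arange(30, dtype=float)
y = 50 * np.log1p(t) + rng.normal(0, 5, size=t.size)   # noisy, slowly varying series

s = (t - t.mean()) / t.std()        # rescale time for numerical stability
n_train = 14
s_tr, y_tr = s[:n_train], y[:n_train]
s_te, y_te = s[n_train:], y[n_train:]

# Degree n_train - 1 polynomial: enough parameters to interpolate the training window.
coeffs = np.polyfit(s_tr, y_tr, deg=n_train - 1)

def r2(y_true, y_pred):
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

print("train r^2:", r2(y_tr, np.polyval(coeffs, s_tr)))   # ~1.0: (near-)interpolation
print("test  r^2:", r2(y_te, np.polyval(coeffs, s_te)))   # typically large and negative
```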

5. Interpolation in High-Dimensional and Nonparametric Estimation

In function approximation, interpolation is classically infeasible for smooth target functions in high dimensions due to the curse of dimensionality: $n^d$ sample points are required for grid-based interpolation in $d$ dimensions. Recent work leverages structural results such as the Kolmogorov Superposition Theorem (KST) to construct interpolating estimators—e.g., deep ReLU networks or spline-based systems—which achieve dimension-independent approximation rates $O(1/n)$ for functions in the Kolmogorov-Lipschitz class, with only $O(nd)$ function evaluations and parameters. This breaks the classical curse, indicating that “interpolating” is not necessarily synonymous with inefficiency, provided sufficiently strong function structure and architecture are exploited (Lai et al., 2021).
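
For context, the superposition result underlying these constructions states that every continuous $f$ on $[0,1]^d$ can be written as a finite superposition of continuous univariate functions (stated here in its standard form; normalizations vary across the literature):

$$f(x_1,\dots,x_d) \;=\; \sum_{q=0}^{2d} \Phi_q\!\left(\sum_{p=1}^{d} \phi_{q,p}(x_p)\right), \qquad x \in [0,1]^d,$$

where the inner functions $\phi_{q,p}$ can be chosen independently of $f$, and only the outer functions $\Phi_q$ depend on $f$.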

The matrix cross approximation technique further enables interpolating fits at a pivotal set of $O(nd)$ points, rather than requiring all $O(n^d)$ samples, preserving error rates and computational cost.
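
The idea can be sketched in a few lines (random pivots for brevity; practical cross-approximation schemes select pivots adaptively, e.g. by maximum-volume criteria): a rank-$r$ skeleton approximation built from $r$ rows and $r$ columns reproduces those rows and columns exactly, and reproduces the whole matrix when its rank is $r$.

```python
import numpy as np

rng = np.random.default_rng(4)
m, n, r = 200, 150, 5
A = rng.standard_normal((m, r)) @ rng.standard_normal((r, n))   # exactly rank-r matrix

# Choose r pivot rows I and r pivot columns J (random here; adaptive in practice).
I = rng.choice(m, size=r, replace=False)
J = rng.choice(n, size=r, replace=False)

C = A[:, J]          # m x r  sampled columns
R = A[I, :]          # r x n  sampled rows
W = A[np.ix_(I, J)]  # r x r  intersection block

# Cross (skeleton) approximation: uses only O((m + n) r) entries of A.
A_cross = C @ np.linalg.pinv(W) @ R

print("max error on pivot rows:", np.max(np.abs(A_cross[I, :] - A[I, :])))
print("max error overall:      ", np.max(np.abs(A_cross - A)))   # ~0 since rank(A) = r
```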

6. Alternatives, Pitfalls, and Statistical Safeguards

Interpolating estimators can become statistically unstable in the presence of noise, confounding, or growing system size. In importance sampling, the exponential scaling of the variance with system size ($\exp[N\kappa]$) renders interpolation-based reweighting impractical for large $N$ (Ceriotti et al., 2011). In model selection and inference, conditioning on a favorable fit or a significant result (the “significance filter” in hypothesis testing) leads to systematic overestimation (the “winner's curse”) and severe undercoverage of confidence intervals, especially for low-powered designs. The bias quantifiably decreases with increasing power but persists unless corrected by explicit shrinkage or Bayesian adjustment; designing for high power or using Bayesian shrinkage estimators is recommended for unbiased effect estimation (Zwet et al., 2020).
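
The significance-filter effect is easy to see in simulation (a generic Gaussian toy model, not the analysis of Zwet et al.): conditioning unbiased estimates on statistical significance inflates their magnitude and destroys confidence-interval coverage when power is low.

```python
import numpy as np

rng = np.random.default_rng(5)
true_effect, se, n_studies = 0.5, 1.0, 500_000     # low power: ~8% at two-sided alpha = 0.05

est = rng.normal(true_effect, se, size=n_studies)  # unbiased effect estimates across studies
significant = np.abs(est / se) > 1.96              # the "significance filter"
covered = np.abs(est - true_effect) < 1.96 * se    # does the nominal 95% CI contain the truth?

print("power:                      ", significant.mean())            # ~0.08
print("mean estimate, all studies: ", est.mean())                    # ~0.5 (unbiased)
print("mean estimate | significant:", est[significant].mean())       # ~2.0: ~4x exaggeration
print("CI coverage | significant:  ", covered[significant].mean())   # ~0.6, far below 0.95
```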

Additionally, in machine learning, interpolating nonlinear models may display benign overfitting—zero training error but small test error—under specific data and noise conditions, but this requires alignment between model capacity, implicit regularization, and data structure.

7. Recent Developments and Open Questions

Progress in high-dimensional statistics, computational mathematics, and machine learning continues to refine the boundaries of when interpolation is harmless, beneficial, or fundamentally limited. Methods combining structural decomposition (e.g., KST), localized kernels, adaptive basis selection (cross approximation), and robust regularization (ridge, Lasso, Bayesian priors) offer practical interpolating estimators that can operate efficiently in regimes previously dominated by the curse of dimensionality or small sample size, provided appropriate complexity control.

However, the precise circumstances under which interpolation yields universally efficient, generalizable estimators—rather than overfit or statistically vacuous models—remain active topics of inquiry. For large, weakly structured systems or in the absence of such favorable properties, the severe statistical inefficiencies predicted by classical and modern theory remain operative.

Key references: (Ceriotti et al., 2011; Chatterjee et al., 7 Aug 2025; Nakıp et al., 2020; Lai et al., 2021; Zwet et al., 2020)
