
Interpolating Estimators in Statistics

Updated 23 January 2026
  • Interpolating estimators are statistical methods that exactly match observed data by achieving zero empirical risk through techniques like polynomial interpolation, kernel methods, and deep learning.
  • They highlight practical trade-offs, demonstrating how overparameterization, noise, and the curse of system size can impact generalization and statistical efficiency.
  • Recent advances using methods such as kernel ridgeless regression and deep ReLU networks show that interpolation can be effective under structured conditions, mitigating classical high-dimensional challenges.

Interpolating estimators are statistical estimators that, for a given set of observed data, return values (often predictions or fitted values) that exactly match or “go through” the observed targets at their respective points. In modern statistics and machine learning, the term “interpolating” is often used more broadly: an estimator is interpolating if it achieves zero empirical risk (training error), regardless of the underlying hypothesis class complexity or the stochastic nature of the observations. These estimators have been at the center of recent theoretical and practical research, particularly in relation to overparameterized models, generalization, and the statistical limitations imposed by problem structure.

1. Definition and Formal Properties of Interpolating Estimators

An estimator $\hat{f}$ is said to interpolate (“fit through”) a dataset $\{(x_i, y_i)\}_{i=1}^n$ if $\hat{f}(x_i) = y_i$ for all $i$. This strict interpolation is traditionally associated with polynomial interpolation, spline fitting at knots, and certain kernel or nearest-neighbor approaches. In supervised learning, interpolating estimators achieve the minimum possible value of the empirical loss—typically zero for the standard squared or logistic loss, although in practice noise in the $y_i$ or adversarial targets may render interpolation undesirable.

A precise characterization in the context of regression is: given a design matrix $X \in \mathbb{R}^{n \times d}$ and response $y \in \mathbb{R}^n$, an interpolating estimator returns $\hat{\beta}$ such that $X\hat{\beta} = y$. When $d \geq n$ and $X$ has full row rank, infinitely many interpolating $\hat{\beta}$ exist; a canonical choice is the minimum-norm solution. In nonparametric settings, kernel ridgeless regression, nearest-neighbor prediction with $k=1$, and deep neural networks trained without regularization often interpolate.
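
For concreteness, here is a minimal NumPy sketch (not drawn from any of the cited papers) of the minimum-norm interpolating solution in the overparameterized regime $d > n$; the data and dimensions are synthetic and arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 20, 100                      # overparameterized: more features than samples
X = rng.standard_normal((n, d))     # design matrix with full row rank (almost surely)
y = rng.standard_normal(n)          # arbitrary targets, possibly pure noise

# Minimum-ell_2-norm solution among all beta with X @ beta = y:
# beta_hat = X^T (X X^T)^{-1} y, computed here via the Moore-Penrose pseudoinverse.
beta_hat = np.linalg.pinv(X) @ y

# The fit interpolates: training residuals vanish up to floating-point error.
print(np.max(np.abs(X @ beta_hat - y)))   # ~1e-14
print(np.linalg.norm(beta_hat))           # smallest norm among all interpolators
```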

2. Statistical Efficiency and the Curse of System Size

Interpolating estimators are strongly affected by the available sample size relative to complexity, as well as the problem structure. In the context of high-dimensional reweighting schemes, such as importance sampling or path-integral Monte Carlo, the variance of the canonical reweighted estimator,

$$\hat{I} = \frac{\sum_{i=1}^M a(x_i)\, w(x_i)}{\sum_{i=1}^M w(x_i)}, \qquad w(x) = \frac{\pi(x)}{\rho(x)},$$

is controlled by the fluctuations of the log-weight $h(x) = -\ln w(x)$. When $h(x)$ is an extensive sum over $N$ weakly correlated subsystems, the variance of $\hat{I}$ scales as

$$\mathrm{Var}(\hat{I}) \approx \frac{1}{M}\left(e^{N\kappa} - 1\right),$$

where $\kappa = \mathrm{Var}(h_1)$ is the local (per-subsystem) variance. This implies that for any fixed variance at the subsystem level, overall statistical error increases exponentially with $N$—a direct manifestation of the “curse of system size.” Interpolating estimators built via such schemes thus become exponentially statistically inefficient as dimensionality increases, making them unsuitable for large or complex systems (Ceriotti et al., 2011).
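
The scaling is easy to see in a short simulation. The sketch below is a simplified stand-in for the path-integral setting of Ceriotti et al., assuming independent Gaussian subsystem log-weights with variance $\kappa$, and tracks the Kish effective sample size $(\sum_i w_i)^2 / \sum_i w_i^2$, which shrinks roughly like $M e^{-N\kappa}$.

```python
import numpy as np

rng = np.random.default_rng(1)
M, kappa = 100_000, 0.5            # number of samples, per-subsystem log-weight variance

for N in (1, 5, 10, 20):
    # h(x) = sum of N subsystem contributions (independent here for simplicity),
    # each with variance kappa; the reweighting factor is w(x) = exp(-h(x)).
    h = rng.normal(0.0, np.sqrt(kappa), size=(M, N)).sum(axis=1)
    w = np.exp(-h)
    # Kish effective sample size: collapses roughly like M * exp(-N * kappa).
    ess = w.sum() ** 2 / (w ** 2).sum()
    print(f"N={N:3d}  ESS ~ {ess:10.1f}  (theory ~ {M * np.exp(-N * kappa):10.1f})")
```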

3. Sample Complexity, Structure, and the Curse of Sample Size

A central question is the sample complexity required for an interpolating estimator to achieve nontrivial performance. In unstructured settings—for example, estimating the size $|S|$ of an unknown set from i.i.d. samples—unbiased interpolating estimators require on the order of $|S|^{1/2}$ samples to achieve fixed accuracy, as in the birthday problem. More generally, in the absence of structure, all available information lies in the occurrence of data collisions, which limits statistical power.
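
A toy collision-count simulation (an illustration of the birthday-problem scaling, not the estimator of Chatterjee et al.) shows how roughly $|S|^{1/2}$ uniform draws already allow a crude inversion of the expected number of colliding pairs, $\mathbb{E}[\text{collisions}] = n(n-1)/(2|S|)$.

```python
import numpy as np
from collections import Counter

rng = np.random.default_rng(2)
S = 1_000_000                       # true (unknown) set size
n = int(4 * np.sqrt(S))             # ~ |S|^{1/2} samples: enough for a few collisions

samples = rng.integers(0, S, size=n)                    # i.i.d. uniform draws from the set
counts = np.array(list(Counter(samples).values()))
collisions = int((counts * (counts - 1) // 2).sum())    # number of colliding pairs

# E[collisions] = n(n-1) / (2|S|), so invert to get a (noisy) estimate of |S|.
S_hat = n * (n - 1) / (2 * collisions) if collisions else float("inf")
print(f"true |S| = {S},  collisions = {collisions},  estimate = {S_hat:,.0f}")
```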

However, additional structure can greatly reduce this sample complexity. For partially ordered sets, convex bodies, or target function classes of low complexity (e.g., with low VC dimension), the complexity parameters that control estimator error (e.g., the cascade differences $\delta'_n, \delta''_n$ in cascading exclusion frameworks) decay much more quickly than the $O(1/|S|)$ rate of the antichain case. For instance, in volume estimation of a $d$-dimensional convex set, only $O(d)$ samples suffice for accurate estimation, rather than the $O(\exp(d))$ required by naive interpolation without structure (Chatterjee et al., 7 Aug 2025).

Table: Sample Complexity for Interpolating Estimators in Key Regimes

| Problem Structure | Required Sample Size $n$ | Main Complexity Control |
|---|---|---|
| Unknown finite set (antichain) | $n \sim \vert S\vert^{1/2}$ | First collision, variance |
| Totally ordered set | $n \to \infty$ (any rate) | Max-statistics, $1/n$ error |
| Convex body, regression, VC class | $n \gtrsim d$ | Covering numbers, $O(d/n)$ |

In all cases, structure enables $O(1/n)$ or $O(d/n)$ convergence of the interpolation error; in its absence, the error is lower-bounded by the “curse of sample size” barrier (Chatterjee et al., 7 Aug 2025).

4. Overfitting, Generalization, and the Curse of Small Sample Size

Interpolating estimators are classically associated with overfitting, where empirical error is zero but generalization error is uncontrolled. Statistical learning theory bounds the gap between empirical and true risk as

$$\left| E_{\text{gen}}(f) - E_{\text{emp}}(f) \right| \leq O\left(\sqrt{\frac{\text{Complexity}(\mathcal{F})}{n}}\right).$$

In the small-$n$ regime, even simple models with high capacity may generalize poorly despite perfect interpolation. This was empirically demonstrated in COVID-19 forecasting: all models (linear regression, MLP, LSTM) could interpolate or nearly interpolate the short-term active-case time series, but beyond a horizon of $K=3$ days their test $r^2$ collapsed, sometimes becoming negative. Even with aggressive feature selection and regularization, generalization remained poor except for very short-term forecasts, illustrating the practical limits of interpolation in small-sample settings (Nakıp et al., 2020).
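
The qualitative effect is easy to reproduce on synthetic data (this is not the COVID-19 series or the models of Nakıp et al.): a polynomial with enough parameters to interpolate a short training window fits that window essentially perfectly and fails a few steps beyond it.

```python
import numpy as np

rng = np.random.default_rng(3)
t = np.arange(30, dtype=float)
y = 50 * np.log1p(t) + rng.normal(0, 5, size=t.size)   # noisy, slowly varying series

s = (t - t.mean()) / t.std()        # rescale time for numerical stability
n_train = 14
s_tr, y_tr = s[:n_train], y[:n_train]
s_te, y_te = s[n_train:], y[n_train:]

# Degree n_train - 1 polynomial: enough parameters to interpolate the training window.
coeffs = np.polyfit(s_tr, y_tr, deg=n_train - 1)

def r2(y_true, y_pred):
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

print("train r^2:", r2(y_tr, np.polyval(coeffs, s_tr)))   # ~1.0: (near-)interpolation
print("test  r^2:", r2(y_te, np.polyval(coeffs, s_te)))   # typically large and negative
```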

5. Interpolation in High-Dimensional and Nonparametric Estimation

In function approximation, interpolation is classically infeasible for smooth target functions in high dimensions due to the curse of dimensionality: $n^d$ sample points are required for grid-based interpolation in $d$ dimensions. Recent work leverages structural results such as the Kolmogorov Superposition Theorem (KST) to construct interpolating estimators—e.g., deep ReLU networks or spline-based systems—which achieve dimension-independent approximation rates $O(1/n)$ for functions in the Kolmogorov-Lipschitz class, with only $O(nd)$ function evaluations and parameters. This breaks the classical curse, indicating that “interpolating” is not necessarily synonymous with inefficiency, provided sufficiently strong function structure and architecture are exploited (Lai et al., 2021).
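
For context, the superposition result underlying these constructions states that every continuous $f$ on $[0,1]^d$ can be written as a finite superposition of continuous univariate functions (stated here in its standard form; normalizations vary across the literature):

$$f(x_1,\dots,x_d) \;=\; \sum_{q=0}^{2d} \Phi_q\!\left(\sum_{p=1}^{d} \phi_{q,p}(x_p)\right), \qquad x \in [0,1]^d,$$

where the inner functions $\phi_{q,p}$ can be chosen independently of $f$, and only the outer functions $\Phi_q$ depend on $f$.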

The matrix cross approximation technique further enables interpolating fits at a pivotal set of $O(nd)$ points, rather than requiring all $O(n^d)$ samples, preserving error rates and computational cost.
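
The idea can be sketched in a few lines (random pivots for brevity; practical cross-approximation schemes select pivots adaptively, e.g. by maximum-volume criteria): a rank-$r$ skeleton approximation built from $r$ rows and $r$ columns reproduces those rows and columns exactly, and reproduces the whole matrix when its rank is $r$.

```python
import numpy as np

rng = np.random.default_rng(4)
m, n, r = 200, 150, 5
A = rng.standard_normal((m, r)) @ rng.standard_normal((r, n))   # exactly rank-r matrix

# Choose r pivot rows I and r pivot columns J (random here; adaptive in practice).
I = rng.choice(m, size=r, replace=False)
J = rng.choice(n, size=r, replace=False)

C = A[:, J]          # m x r  sampled columns
R = A[I, :]          # r x n  sampled rows
W = A[np.ix_(I, J)]  # r x r  intersection block

# Cross (skeleton) approximation: uses only O((m + n) r) entries of A.
A_cross = C @ np.linalg.pinv(W) @ R

print("max error on pivot rows:", np.max(np.abs(A_cross[I, :] - A[I, :])))
print("max error overall:      ", np.max(np.abs(A_cross - A)))   # ~0 since rank(A) = r
```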

6. Alternatives, Pitfalls, and Statistical Safeguards

Interpolating estimators can become statistically unstable in the presence of noise, confounding, or growing system size. In importance sampling, the exponential scaling of the variance with system size ($\exp[N\kappa]$) renders interpolation-based reweighting impractical for large $N$ (Ceriotti et al., 2011). In model selection and inference, conditioning on a favorable fit or a significant result (the “significance filter” in hypothesis testing) leads to systematic overestimation (the “winner's curse”) and severe undercoverage of confidence intervals, especially for low-powered designs. The bias quantifiably decreases with increasing power but persists unless corrected by explicit shrinkage or Bayesian adjustment; designing for high power or using Bayesian shrinkage estimators is recommended for unbiased effect estimation (Zwet et al., 2020).
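
The significance-filter effect is easy to see in simulation (a generic Gaussian toy model, not the analysis of Zwet et al.): conditioning unbiased estimates on statistical significance inflates their magnitude and destroys confidence-interval coverage when power is low.

```python
import numpy as np

rng = np.random.default_rng(5)
true_effect, se, n_studies = 0.5, 1.0, 500_000     # low power: ~8% at two-sided alpha = 0.05

est = rng.normal(true_effect, se, size=n_studies)  # unbiased effect estimates across studies
significant = np.abs(est / se) > 1.96              # the "significance filter"
covered = np.abs(est - true_effect) < 1.96 * se    # does the nominal 95% CI contain the truth?

print("power:                      ", significant.mean())            # ~0.08
print("mean estimate, all studies: ", est.mean())                    # ~0.5 (unbiased)
print("mean estimate | significant:", est[significant].mean())       # ~2.0: ~4x exaggeration
print("CI coverage | significant:  ", covered[significant].mean())   # ~0.6, far below 0.95
```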

Additionally, in machine learning, interpolating nonlinear models may display benign overfitting—zero training error but small test error—under specific data and noise conditions, but this requires alignment between model capacity, implicit regularization, and data structure.

7. Recent Developments and Open Questions

Progress in high-dimensional statistics, computational mathematics, and machine learning continues to refine the boundaries of when interpolation is harmless, beneficial, or fundamentally limited. Methods combining structural decomposition (e.g., KST), localized kernels, adaptive basis selection (cross approximation), and robust regularization (ridge, Lasso, Bayesian priors) offer practical interpolating estimators that can operate efficiently in regimes previously dominated by the curse of dimensionality or small sample size, provided appropriate complexity control.

However, the precise circumstances under which interpolation yields universally efficient, generalizable estimators—rather than overfit or statistically vacuous models—remain active topics of inquiry. For large, weakly structured systems or in the absence of such favorable properties, the severe statistical inefficiencies predicted by classical and modern theory remain operative.

Key references: (Ceriotti et al., 2011; Chatterjee et al., 7 Aug 2025; Nakıp et al., 2020; Lai et al., 2021; Zwet et al., 2020)
