Interpolating Estimators in Statistics
- Interpolating estimators are statistical methods that exactly match observed data by achieving zero empirical risk through techniques like polynomial interpolation, kernel methods, and deep learning.
- They highlight practical trade-offs, demonstrating how overparameterization, noise, and the curse of system size can impact generalization and statistical efficiency.
- Recent advances using methods such as kernel ridgeless regression and deep ReLU networks show that interpolation can be effective under structured conditions, mitigating classical high-dimensional challenges.
Interpolating estimators are statistical estimators that, for a given set of observed data, return values (often predictions or fitted values) that exactly match or “go through” the observed targets at their respective points. In modern statistics and machine learning, the term “interpolating” is often used more broadly: an estimator is interpolating if it achieves zero empirical risk (training error), regardless of the underlying hypothesis class complexity or the stochastic nature of the observations. These estimators have been at the center of recent theoretical and practical research, particularly in relation to overparameterized models, generalization, and the statistical limitations imposed by problem structure.
1. Definition and Formal Properties of Interpolating Estimators
An estimator $\hat{f}$ is said to interpolate ("fit through") a dataset $\{(x_i, y_i)\}_{i=1}^{n}$ if $\hat{f}(x_i) = y_i$ for all $i = 1, \dots, n$. This strict interpolation is traditionally associated with polynomial interpolation, spline fitting at knots, and certain kernel or nearest-neighbor approaches. In supervised learning, interpolating estimators achieve the minimum possible value of the empirical loss, typically zero for the standard squared or logistic loss, although in practice noise in the targets $y_i$ or adversarial targets may render interpolation undesirable.
A precise characterization in the context of regression is: given a design matrix $X \in \mathbb{R}^{n \times p}$ and response vector $y \in \mathbb{R}^{n}$, an interpolating estimator returns $\hat{\beta}$ such that $X\hat{\beta} = y$. When $p > n$ and $X$ has full row rank, infinitely many interpolating $\hat{\beta}$ exist; choices include the minimum-$\ell_2$-norm solution $\hat{\beta} = X^{\top}(XX^{\top})^{-1}y$. In nonparametric settings, kernel ridgeless regression, $k$-nearest neighbors with $k = 1$, and deep neural networks trained with no regularization often interpolate.
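As a concrete illustration of the overparameterized case $p > n$, the following sketch (ours, not taken from any of the cited works) computes the minimum-$\ell_2$-norm interpolator with NumPy's pseudoinverse and verifies that it reproduces the training responses exactly.

```python
# Minimal sketch: the minimum-l2-norm interpolator in the regime p > n,
# computed via the Moore-Penrose pseudoinverse.
import numpy as np

rng = np.random.default_rng(0)
n, p = 20, 100                      # fewer samples than parameters
X = rng.standard_normal((n, p))     # full-row-rank design (almost surely)
y = rng.standard_normal(n)          # arbitrary targets, possibly pure noise

# Minimum-norm solution beta = X^T (X X^T)^{-1} y; np.linalg.pinv handles this.
beta_min_norm = np.linalg.pinv(X) @ y

# The fit interpolates: training residuals vanish up to numerical precision.
print("max |X beta - y| =", np.max(np.abs(X @ beta_min_norm - y)))
print("||beta||_2       =", np.linalg.norm(beta_min_norm))
```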
2. Statistical Efficiency and the Curse of System Size
Interpolating estimators are strongly affected by the available sample size relative to model complexity, as well as by the problem structure. In the context of high-dimensional reweighting schemes, such as importance sampling or path-integral Monte Carlo, the variance of the canonical reweighted estimator,
$$\langle A \rangle_w \;=\; \frac{\sum_{i=1}^{n} w_i\, A(x_i)}{\sum_{i=1}^{n} w_i}, \qquad w_i = e^{h(x_i)},$$
is controlled by the fluctuations of the log-weight $h$. When $h$ is an extensive sum over $N$ weakly correlated subsystems, the variance of $h$ scales as
$$\operatorname{Var}(h) \;\approx\; N\,\sigma^{2}_{\mathrm{loc}},$$
where $\sigma^{2}_{\mathrm{loc}}$ is the local (per-subsystem) variance. Because the effective number of independent samples decays roughly as $n\,e^{-\operatorname{Var}(h)}$, any fixed variance at the subsystem level makes the overall statistical error grow exponentially with $N$, a direct manifestation of the "curse of system size." Interpolating estimators built via such schemes thus become exponentially statistically inefficient as dimensionality increases, making them unsuitable for large or complex systems (Ceriotti et al., 2011).
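The collapse is easy to see in simulation. The sketch below (ours, assuming independent Gaussian per-subsystem log-weight contributions rather than any specific physical model) tracks the Kish effective sample size $n_{\mathrm{eff}} = (\sum_i w_i)^2 / \sum_i w_i^2$ as the number of subsystems $N$ grows.

```python
# Illustrative sketch: reweighting degrades with system size N when the log-weight h
# is a sum of N independent per-subsystem contributions with variance sigma2_loc.
# The effective sample size collapses roughly as n * exp(-Var(h)) = n * exp(-N * sigma2_loc).
import numpy as np

rng = np.random.default_rng(1)
n_samples = 100_000
sigma2_loc = 0.5                     # per-subsystem variance of the log-weight

for N in (1, 5, 10, 20, 40):
    # log-weights: extensive sum over N weakly correlated (here independent) parts
    h = rng.normal(0.0, np.sqrt(sigma2_loc), size=(n_samples, N)).sum(axis=1)
    w = np.exp(h - h.max())          # shift for numerical stability; ratios unchanged
    n_eff = w.sum() ** 2 / (w ** 2).sum()
    print(f"N={N:3d}  Var(h)={N * sigma2_loc:5.1f}  n_eff={n_eff:12.1f}  "
          f"n*exp(-Var(h))={n_samples * np.exp(-N * sigma2_loc):12.3g}")
```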
3. Sample Complexity, Structure, and the Curse of Sample Size
A central question is the sample complexity required for an interpolating estimator to achieve nontrivial performance. In unstructured settings, for example estimating the size $N$ of an unknown finite set from i.i.d. uniform samples, unbiased interpolating estimators require on the order of $\sqrt{N}$ samples to achieve fixed accuracy, as in the birthday problem. More generally, in the absence of structure, all of the information is in the occurrence of data collisions, limiting statistical power.
However, additional structure can greatly reduce this sample complexity. For partially ordered sets, convex bodies, or target function classes of low complexity (e.g., with low VC dimension), the complexity parameters that control estimator error (e.g., cascade-differences in cascading exclusion frameworks) decay much more quickly than in the unstructured antichain case. For instance, in volume estimation of a $d$-dimensional convex set, on the order of $d$ samples suffice for accurate estimation (consistent with the $O(d/n)$ error rate in the table below), rather than the far larger sample sizes required by naive interpolation without structure (Chatterjee et al., 7 Aug 2025).
Table: Sample Complexity for Interpolating Estimators in Key Regimes
| Problem Structure | Required Sample Size | Main Complexity Control |
|---|---|---|
| Unknown finite set (antichain) | $\Theta(\sqrt{N})$ | First collision, variance |
| Totally ordered set | Any $n$ (any rate) | Max-statistics, $1/n$ error |
| Convex body, regression, VC class | $O(d)$ for fixed accuracy | Covering numbers, $O(d/n)$ error |
In all cases, structure enables $O(1/n)$ or $O(d/n)$ convergence of the interpolation error; in its absence, the error is lower-bounded by the "curse of sample size" barrier (Chatterjee et al., 7 Aug 2025).
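To illustrate the collision-based barrier discussed above, the following sketch (ours, a textbook birthday-collision estimator, not the cascading-exclusion estimator of Chatterjee et al.) estimates the size of an unstructured finite set and shows that nothing useful happens until roughly $\sqrt{N}$ samples have been drawn.

```python
# Illustrative sketch: estimating the size N of an unknown finite set from i.i.d.
# uniform draws using pairwise collisions. With k draws, E[#collisions] = k(k-1)/(2N),
# so N_hat = k(k-1) / (2 * #collisions); a useful estimate needs at least one collision,
# i.e. k on the order of sqrt(N).
import numpy as np
from collections import Counter

rng = np.random.default_rng(2)
N_true = 1_000_000                   # sqrt(N_true) = 1000

for k in (200, 1_000, 5_000):
    draws = rng.integers(0, N_true, size=k)
    counts = Counter(draws.tolist())
    collisions = sum(c * (c - 1) // 2 for c in counts.values())
    est = k * (k - 1) / (2 * collisions) if collisions > 0 else float("inf")
    print(f"k={k:6d}  collisions={collisions:4d}  N_hat={est:12.1f}  (true N={N_true})")
```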
4. Overfitting, Generalization, and the Curse of Small Sample Size
Interpolating estimators are classically associated with overfitting, where empirical error is zero but generalization error is uncontrolled. Statistical learning theory bounds the gap between empirical and true risk via uniform convergence; for a hypothesis class $\mathcal{F}$ with VC dimension $d_{\mathrm{VC}}$, a standard bound states that with probability at least $1-\delta$,
$$\sup_{f \in \mathcal{F}} \big| R(f) - \hat{R}_n(f) \big| \;\le\; O\!\left(\sqrt{\frac{d_{\mathrm{VC}}\,\log(n/d_{\mathrm{VC}}) + \log(1/\delta)}{n}}\right).$$
In the small-sample regime, even models that are simple in form but have enough capacity to interpolate may generalize poorly despite the perfect fit. This was empirically demonstrated in COVID-19 forecasting: all models considered (linear regression, MLP, LSTM) could interpolate or nearly interpolate the short-term active-case time series, but beyond short forecasting horizons their test $R^2$ collapsed, sometimes becoming negative. Even with aggressive feature selection and regularization, generalization remained poor except for very short-term forecasts, illustrating the practical limits of interpolation in small-sample settings (Nakıp et al., 2020).
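The qualitative failure mode is easy to reproduce. The sketch below (ours, a generic polynomial-interpolation toy rather than the linear-regression/MLP/LSTM models of Nakıp et al.) fits a short noisy series exactly and then evaluates out-of-sample $R^2$ on a later horizon.

```python
# Illustrative sketch: a high-capacity model interpolates a short noisy series
# perfectly, yet its out-of-sample R^2 collapses on the forecast horizon.
import numpy as np
from numpy.polynomial import Polynomial

rng = np.random.default_rng(3)
t_train = np.arange(15)                       # very small training sample
t_test = np.arange(15, 25)                    # forecast horizon
signal = lambda t: 50 + 10 * np.sin(t / 4.0)  # hypothetical smooth trend
y_train = signal(t_train) + rng.normal(0, 2, t_train.size)
y_test = signal(t_test) + rng.normal(0, 2, t_test.size)

# Degree-14 polynomial: enough capacity to interpolate the 15 training points.
poly = Polynomial.fit(t_train, y_train, deg=14)
pred_train, pred_test = poly(t_train), poly(t_test)

def r2(y, pred):
    return 1 - np.sum((y - pred) ** 2) / np.sum((y - y.mean()) ** 2)

print("train R^2:", round(r2(y_train, pred_train), 4))   # ~1.0 (interpolation)
print("test  R^2:", round(r2(y_test, pred_test), 4))     # typically hugely negative
```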
5. Interpolation in High-Dimensional and Nonparametric Estimation
In function approximation, interpolation is classically infeasible for smooth target functions in high dimensions due to the curse of dimensionality: the number of sample points needed for grid-based interpolation to a fixed accuracy grows exponentially with the dimension $d$ (on the order of $\varepsilon^{-d}$ for Lipschitz functions at accuracy $\varepsilon$). Recent work leverages structural results such as the Kolmogorov Superposition Theorem (KST) to construct interpolating estimators, e.g., deep ReLU networks or spline-based systems, which achieve dimension-independent approximation rates for functions in the Kolmogorov-Lipschitz class, using a number of function evaluations and parameters that grows only polynomially rather than exponentially in $d$. This breaks the classical curse, indicating that "interpolating" is not necessarily synonymous with inefficiency, provided sufficiently strong function structure and architecture are exploited (Lai et al., 2021).
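To make the exponential cost concrete, a back-of-the-envelope count (assuming a unit Lipschitz constant) for grid-based interpolation at accuracy $\varepsilon$ reads
$$\#\{\text{grid points}\} \approx \varepsilon^{-d}: \qquad \varepsilon = 0.1,\ d = 2 \;\Rightarrow\; 10^{2} = 100 \text{ evaluations}, \qquad d = 10 \;\Rightarrow\; 10^{10} \text{ evaluations}.$$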
The matrix cross approximation technique further enables interpolating fits anchored at a small pivotal set of points, rather than requiring all samples, preserving the error rates while keeping the computational cost manageable.
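A minimal sketch of the idea (ours, a generic adaptive cross approximation with partial pivoting, not the specific algorithm of Lai et al.) builds a low-rank skeleton of a kernel matrix from a handful of pivot rows and columns:

```python
# Minimal sketch of adaptive cross approximation (ACA): build a rank-r skeleton of A
# from greedily chosen pivot rows and columns, instead of touching every entry.
import numpy as np

def aca(A, rank):
    """Greedy cross approximation with partial pivoting."""
    m, _ = A.shape
    R = A.astype(float).copy()       # explicit residual, kept only for clarity
    approx = np.zeros_like(R)
    used_rows = set()
    i = 0                            # arbitrary starting row
    for _ in range(rank):
        used_rows.add(i)
        j = int(np.argmax(np.abs(R[i, :])))      # pivot column within the current row
        if R[i, j] == 0.0:
            break
        col = R[:, j].copy()                     # residual column at the pivot
        row = R[i, :] / R[i, j]                  # scaled residual row at the pivot
        approx += np.outer(col, row)             # rank-1 cross update
        R -= np.outer(col, row)                  # pivot row/column of R become zero
        candidates = [k for k in range(m) if k not in used_rows]
        if not candidates:
            break
        i = max(candidates, key=lambda k: abs(col[k]))   # next pivot row
    return approx

# Example: a numerically low-rank kernel matrix (smooth Gaussian kernel).
x = np.linspace(0, 1, 300)
A = np.exp(-((x[:, None] - x[None, :]) ** 2) / 0.1)
A_hat = aca(A, rank=10)
print("relative error:", np.linalg.norm(A - A_hat) / np.linalg.norm(A))
```

The greedy pivot rule mirrors the "pivotal set of points" idea: only the selected rows and columns of $A$ are ever needed, even though the demo forms the full residual for readability.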
6. Alternatives, Pitfalls, and Statistical Safeguards
Interpolating estimators introduce statistical instability in the presence of noise, confounding, or large system size. In importance sampling, the exponential scaling of variance with system size (as in Section 2) renders interpolation-based reweighting impractical for large systems (Ceriotti et al., 2011). In model selection and inference, conditioning on favorable outcomes (e.g., the significance filter in hypothesis testing) leads to systematic overestimation ("winner's curse") and severe undercoverage of confidence intervals, especially for low-powered designs. The bias quantifiably decreases with increasing power but persists unless corrected by explicit shrinkage or Bayesian adjustment; designing for high power or using Bayesian shrinkage estimators is recommended for unbiased effect estimation (Zwet et al., 2020).
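A quick simulation (ours, in the spirit of Zwet et al. (2020) but not their data or estimator) shows both effects: conditioning on significance inflates the estimated effect, and the inflation shrinks as power grows.

```python
# Illustrative sketch: the "significance filter" (winner's curse). Studies with a small
# true effect are simulated; averaging only the significant ones overestimates the effect,
# and the overestimation is worst when power is low.
import numpy as np

rng = np.random.default_rng(4)
true_effect, sigma, n_trials = 0.2, 1.0, 200_000

for n in (10, 50, 400):                       # per-study sample size controls power
    se = sigma / np.sqrt(n)
    est = rng.normal(true_effect, se, size=n_trials)   # study-level effect estimates
    significant = est / se > 1.96                      # one-sided 2.5% significance filter
    print(f"n={n:4d}  power~{significant.mean():.2f}  "
          f"mean estimate (all)={est.mean():.3f}  "
          f"mean estimate | significant={est[significant].mean():.3f}")
```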
Additionally, in machine learning, interpolating nonlinear models may display benign overfitting—zero training error but small test error—under specific data and noise conditions, but this requires alignment between model capacity, implicit regularization, and data structure.
7. Recent Developments and Open Questions
Progress in high-dimensional statistics, computational mathematics, and machine learning continues to refine the boundaries of when interpolation is harmless, beneficial, or fundamentally limited. Methods combining structural decomposition (e.g., KST), localized kernels, adaptive basis selection (cross approximation), and robust regularization (ridge, Lasso, Bayesian priors) offer practical interpolating estimators that can operate efficiently in regimes previously dominated by the curse of dimensionality or small sample size, provided appropriate complexity control.
However, the precise circumstances under which interpolation yields universally efficient, generalizable estimators—rather than overfit or statistically vacuous models—remain active topics of inquiry. For large, weakly structured systems or in the absence of such favorable properties, the severe statistical inefficiencies predicted by classical and modern theory remain operative.
Key references: (Ceriotti et al., 2011; Chatterjee et al., 7 Aug 2025; Nakıp et al., 2020; Lai et al., 2021; Zwet et al., 2020)