Two-Stage Upper Boundary on Expected Risk

Updated 30 December 2025
  • The paper presents a two-stage framework that uses spectral truncation to provide sharp upper bounds on expected risk in kernel regression and wide neural networks.
  • It connects phase transitions in sample complexity with modewise risk contributions, accurately capturing bias and variance in different high-dimensional regimes.
  • The approach offers practical insights into double descent phenomena by quantifying finite-sample performance and guiding model selection through explicit risk envelopes.

A two-stage upper boundary on expected risk refers to a principled framework for tightly bounding or approximating the out-of-sample (generalization) performance of kernel methods—including wide neural networks in the “kernel regime”—by leveraging the spectral decomposition of the kernel and the statistical structure of high-dimensional data. This paradigm connects the learning curve of overparameterized models to phase transitions in sample complexity, using precise asymptotics and “conservation-law” constraints for modewise risk contributions. The approach unites several key analytic results in kernel regression and infinite-width neural networks, revealing how sharp, phase-specific upper envelopes on risk are constructed via multi-stage spectral truncations.

1. Problem Setting and Preliminaries

The model is kernel ridge regression (KRR) or its neural network analog in the kernel regime. Let the training set be $\{x_i, y_i\}_{i=1}^n$, with $x_i\in\mathbb{R}^d$ and $y_i\in\mathbb{R}$, and let $K(x,x')$ be a positive-definite kernel. In the overparameterized, infinite-width "kernel regime," a wide neural network or KRR converges to the kernel interpolator minimizing the RKHS norm:

$$\hat f = \arg\min_{f\in\mathcal{H}_K}\|f\|_{\mathcal{H}_K}\qquad\text{subject to } f(x_i)=y_i,\quad i=1,\dots,n,$$

with possible ridge regularization $\lambda>0$. The test (expected) risk is $R_{\text{test}} = \mathbb{E}_{x,y}\big[(y - \hat{f}(x))^2\big]$. Understanding sharp upper bounds on this risk as a function of dimensionality, sample size, kernel structure, and the target function's expansion is central.
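
For orientation, here is a minimal NumPy sketch of this setup: kernel ridge regression solved in its dual form, with the test risk estimated by Monte Carlo on held-out samples. The RBF kernel, the spherical data distribution, the target function, and all constants are illustrative assumptions, not choices taken from the cited works.

```python
import numpy as np

rng = np.random.default_rng(0)

def rbf_kernel(X, Z, gamma=1.0):
    # K(x, x') = exp(-gamma * ||x - x'||^2), a standard positive-definite kernel.
    sq_dists = ((X[:, None, :] - Z[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * sq_dists)

def krr_fit_predict(X_tr, y_tr, X_te, lam=1e-3, gamma=1.0):
    # Dual KRR solution: alpha = (K + n*lam*I)^{-1} y,  f_hat(x) = k(x, X_tr) @ alpha.
    n = len(X_tr)
    K = rbf_kernel(X_tr, X_tr, gamma)
    alpha = np.linalg.solve(K + n * lam * np.eye(n), y_tr)
    return rbf_kernel(X_te, X_tr, gamma) @ alpha

# Illustrative setup: inputs uniform on the sphere S^{d-1}, noisy low-degree target.
d, n, n_test, sigma = 20, 500, 1000, 0.1

def sample_sphere(m):
    X = rng.standard_normal((m, d))
    return X / np.linalg.norm(X, axis=1, keepdims=True)

f_star = lambda X: X[:, 0] + d * X[:, 0] * X[:, 1]   # degree-1 part plus a degree-2 part

X_tr, X_te = sample_sphere(n), sample_sphere(n_test)
y_tr = f_star(X_tr) + sigma * rng.standard_normal(n)
y_te = f_star(X_te) + sigma * rng.standard_normal(n_test)

pred = krr_fit_predict(X_tr, y_tr, X_te)
print("Monte Carlo estimate of R_test:", np.mean((y_te - pred) ** 2))
```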

2. Kernel Regime and Linearization

In the kernel/lazy regime (Woodworth et al., 2019), model training corresponds to gradient descent remaining close to initialization, yielding dynamics linearized in parameter space. The infinite-width limit fixes the Neural Tangent Kernel (NTK):

$$K_{\text{NTK}}(x,x') = \left\langle \nabla_W f(x;W_0),\ \nabla_W f(x';W_0) \right\rangle$$

As a result, the trained function is precisely an RKHS minimizer, and its generalization error is fully determined by the spectral projections of the target function and by the kernel Gram matrix.
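
For intuition about the NTK definition above, the following sketch computes the empirical NTK Gram matrix of a one-hidden-layer ReLU network at initialization from analytically derived parameter gradients. The architecture, width, and scaling are illustrative assumptions; in the infinite-width limit this random Gram matrix concentrates around its deterministic counterpart.

```python
import numpy as np

rng = np.random.default_rng(1)
d, m = 10, 4096   # input dimension, hidden width

# One-hidden-layer network f(x; W0) = (1/sqrt(m)) * a^T relu(W x), parameters W0 = (W, a).
W = rng.standard_normal((m, d))
a = rng.standard_normal(m)

def param_grads(x):
    # Gradient of f(x; W0) with respect to all parameters, flattened into one vector.
    pre = W @ x                                   # pre-activations, shape (m,)
    grad_a = np.maximum(pre, 0.0) / np.sqrt(m)    # d f / d a_i = relu(w_i . x) / sqrt(m)
    grad_W = ((a * (pre > 0)) / np.sqrt(m))[:, None] * x[None, :]  # d f / d W_ij
    return np.concatenate([grad_W.ravel(), grad_a])

def empirical_ntk(X):
    # K_NTK(x_i, x_j) = < grad f(x_i; W0), grad f(x_j; W0) >
    G = np.stack([param_grads(x) for x in X])
    return G @ G.T

X = rng.standard_normal((5, d)) / np.sqrt(d)
print(np.round(empirical_ntk(X), 3))   # symmetric PSD Gram matrix; concentrates as m grows
```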

3. Two-Stage Upper Boundary: Phase-dependent Spectral Truncation

In high dimensions, the risk landscape is stratified by critical sample thresholds. In the polynomial sample regime, where $n\asymp d^k$, the function space decomposes into a hierarchy of polynomial (or spherical harmonic) degrees, and KRR can only "lock in" features up to a finite critical degree per phase (Hu et al., 2022, Misiakiewicz, 2022, Dubova et al., 2023, Pandit et al., 2 Aug 2024). The expected risk thus admits an explicit two-stage upper boundary:

  1. First stage: Only the modes of degree $k < K$ are fully captured, leaving a residual bias

$$\sum_{k\ge K} a_k^2$$

where $a_k$ are the expansion coefficients of the target function.

  2. Second stage: At the transition $n\asymp d^K$, the $K$-th mode is only partially resolved, with its bias and variance sharply characterized as functions of the dimension-normalized sampling ratio $\delta_K=n/N_K$, where $N_K\sim d^K$ is the multiplicity of the degree-$K$ eigenspace.
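
For concreteness, the multiplicity $N_K$ can be computed exactly when the eigenmodes are spherical harmonics on $S^{d-1}$; the short sketch below uses the standard dimension-count formula (the specific values of $d$, $K$, and $n$ are illustrative).

```python
from math import comb

def harmonic_multiplicity(d, k):
    # Dimension of the space of degree-k spherical harmonics on S^{d-1} (d ambient dims):
    # N(d, k) = C(d+k-1, k) - C(d+k-3, k-2); it grows like d^k / k! for large d.
    return comb(d + k - 1, k) - (comb(d + k - 3, k - 2) if k >= 2 else 0)

d, K, n = 100, 2, 30_000
N_K = harmonic_multiplicity(d, K)   # 5049 here, roughly d^2 / 2
delta_K = n / N_K                   # dimension-normalized sampling ratio
print(N_K, round(delta_K, 3))
```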

Explicitly, the total test error in the two-stage regime is

$$R_{\rm test} \approx \frac{a_K^2}{(1+\mu_K\,\delta_K\,R_+)^{2}} + \sum_{k>K}a_k^2 + \sigma^2\,(1+\mu_K\,\delta_K\,R_+)^{2}\,R_+$$

where $\mu_K = \mu/p_K$ is the ridge regularization normalized by the $K$-th eigenvalue $p_K$, $R_+$ is a phase-dependent quantity determined by a spectral quadratic (self-consistent) equation, and $\sigma^2$ is the label noise variance. This gives a "two-phase" upper boundary, with features in the critical band only partially learned and those above it not learned at all (Hu et al., 2022).
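
The following sketch simply evaluates this envelope for given inputs. Because the defining equation for $R_+$ depends on spectral details not reproduced here, $R_+$ is passed in as an argument rather than computed; all names and numbers are illustrative.

```python
import numpy as np

def two_stage_risk_envelope(a, K, mu_K, delta_K, R_plus, sigma2):
    """Evaluate the two-stage risk envelope at the transition n ~ d^K.

    a       : target expansion coefficients a_0, a_1, ..., indexed by degree
    K       : critical degree at the current transition
    mu_K    : ridge regularization normalized by the K-th eigenvalue (mu / p_K)
    delta_K : sampling ratio n / N_K, with N_K ~ d^K the degree-K multiplicity
    R_plus  : phase-dependent quantity from the spectral self-consistent equation
              (computed elsewhere in the cited analyses; supplied here as an input)
    sigma2  : label noise variance
    """
    a = np.asarray(a, dtype=float)
    factor = (1.0 + mu_K * delta_K * R_plus) ** 2
    bias_K = a[K] ** 2 / factor          # degree-K mode, only partially learned
    bias_tail = np.sum(a[K + 1:] ** 2)   # degrees > K, not learned in this phase
    variance = sigma2 * factor * R_plus  # noise term as written in the envelope above
    return bias_K + bias_tail + variance

# Example call at the K = 2 transition, with made-up coefficients over degrees 0..4.
print(two_stage_risk_envelope(a=[0.0, 1.0, 0.7, 0.3, 0.1],
                              K=2, mu_K=0.05, delta_K=1.5, R_plus=0.8, sigma2=0.01))
```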

4. Conservation Law and Modewise Allocations

The Eigenlearning framework (Simon et al., 2021) refines these insights via a trace conservation law for kernel regression:

$$\sum_i L_i = n$$

where $L_i = \lambda_i/(\lambda_i+\kappa) \in [0,1]$ is the learnability of kernel eigenmode $i$ (with eigenvalue $\lambda_i$), and $\kappa$ solves the self-consistency constraint

$$n = \sum_{i} \frac{\lambda_i}{\lambda_i + \kappa} + \frac{\delta}{\kappa},$$

where $\delta \ge 0$ is the ridge parameter; the conservation law $\sum_i L_i = n$ holds exactly in the ridgeless limit $\delta \to 0$.

The risk then splits modewise into bias (target weight in modes not captured) and variance (noise amplified through each $L_i$). In high-dimensional phases, spectral mass saturates the allowable budget of $n$ learnable modes up to the cut-off, enforcing a two-stage structure: full representation below the cut, strictly upper-bounded partial representation at the transition, and truncation above.
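
To make this bookkeeping concrete, the sketch below solves the self-consistency equation for $\kappa$ by bisection on an assumed power-law spectrum and reports the resulting learnabilities; the spectrum and sample size are illustrative.

```python
import numpy as np

def solve_kappa(eigs, n, ridge=0.0, tol=1e-12):
    # Find kappa > 0 with  n = sum_i eigs_i / (eigs_i + kappa) + ridge / kappa.
    # The right-hand side is strictly decreasing in kappa, so bisection works.
    def rhs(kappa):
        return np.sum(eigs / (eigs + kappa)) + (ridge / kappa if ridge > 0 else 0.0)
    lo, hi = 1e-15, 1e3 * (eigs.max() + ridge + 1.0)
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if rhs(mid) > n:
            lo = mid          # rhs too large -> kappa must grow
        else:
            hi = mid
        if hi - lo < tol * hi:
            break
    return 0.5 * (lo + hi)

# Illustrative power-law spectrum lambda_i ~ i^{-2} with 2000 modes and n = 100 samples.
eigs = 1.0 / np.arange(1, 2001) ** 2
n = 100
kappa = solve_kappa(eigs, n, ridge=0.0)
L = eigs / (eigs + kappa)                # learnability of each eigenmode
print("kappa =", kappa)
print("sum of learnabilities =", L.sum(), "(equals n in the ridgeless case)")
print("modes with L_i > 0.5 :", int((L > 0.5).sum()))
```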

5. Sharp Asymptotic Upper Bounds and Double/Multiple Descent

The two-stage upper boundary framework explains the non-monotonic "double descent" and multiple-descent shapes of risk curves, as each transition point $n\asymp d^K$ opens a new spectral mode to learning. At each critical sample size:

  • Before the transition ($n\ll d^K$), only modes of degree below $K$ are learned, and the risk is upper-bounded by the residual bias of the unlearned higher-degree modes plus noise.
  • At the transition ($n\sim d^K$), the bias contribution of the $K$-th mode decays from $a_K^2$ (unlearned) to zero (fully learned), with an accompanying variance contribution that vanishes once the mode is resolved, as expressed by the two-phase asymptotic envelope, which sharply bounds the test error on either side of the transition.
  • After the transition ($n\gg d^K$), the upper boundary shifts: all modes up to degree $K$ are learned, and the risk comprises only the higher-order tail and residual noise.

These results are nonasymptotic for finite-degree polynomial kernels (where the risk plateaus), but for analytic kernels the two-stage ("multi-stage") upper boundary recurses for all $K$ (Hu et al., 2022, Misiakiewicz, 2022).
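
The staircase/multiple-descent picture can be reproduced qualitatively with the same machinery: sweep $n$ against a block spectrum that mimics degree multiplicities $N_k \sim d^k$ and track how much target weight remains unlearned. The sketch below uses a bias-only proxy $\sum_i (1-L_i)^2 a_i^2$ and ignores the variance/overfitting factor; the spectrum, coefficients, and sample sizes are all illustrative assumptions.

```python
import numpy as np

# Block spectrum mimicking degree multiplicities on the sphere: N_k ~ d^k modes of
# degree k, each with eigenvalue ~ d^{-k}; one unit of target weight per degree.
d, max_degree = 10, 3
eigs, coefs = [], []
for k in range(max_degree + 1):
    N_k = d ** k
    eigs += [float(d) ** (-k)] * N_k
    coefs += [1.0] + [0.0] * (N_k - 1)
eigs, coefs = np.array(eigs), np.array(coefs)

def solve_kappa(eigs, n):
    # Ridgeless self-consistent kappa: n = sum_i eigs_i / (eigs_i + kappa)  (log-scale bisection).
    lo, hi = 1e-15, 1e6
    for _ in range(200):
        mid = np.sqrt(lo * hi)
        lo, hi = (mid, hi) if np.sum(eigs / (eigs + mid)) > n else (lo, mid)
    return np.sqrt(lo * hi)

for n in [3, 10, 30, 100, 300, 1000]:
    L = eigs / (eigs + solve_kappa(eigs, n))
    bias_proxy = np.sum((1.0 - L) ** 2 * coefs ** 2)   # unlearned target weight (bias only)
    print(f"n = {n:4d}   bias proxy = {bias_proxy:.3f}")
# The printout steps down roughly once per block: each degree k is absorbed once n
# exceeds ~ d^k, which is the staircase skeleton of the multiple-descent curve.
```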

6. Implications, Applications, and Limitations

This framework generalizes beyond spheres/hypercubes to all kernels with polynomial (Gegenbauer/Hermite) spectral decompositions and is robust to label noise/regularization (Hu et al., 2022, Misiakiewicz, 2022, Dubova et al., 2023, Pandit et al., 2 Aug 2024). It yields precise finite-sample and large-sample risk upper bounds, predicts interpolation thresholds, and allows explicit risk-envelope prediction for model selection and capacity control.

However, the two-stage boundary is sharp only in the kernel regime. In the "rich" (feature-learning) regime of deep networks with significant parameter movement, the true risk can fall below this boundary, reflecting a data-adaptive inductive bias beyond RKHS constraints (Woodworth et al., 2019, Woodworth et al., 2020).

7. Summary Table: Upper Boundary Phases

| Regime | Sample range | Modes fully learned | Critical transition | Test risk upper boundary |
|---|---|---|---|---|
| Phase-$K$ | $d^{K-1}\ll n \ll d^K$ | degrees $\le K-1$ | $n \asymp d^K$ | $\sum_{k\ge K} a_k^2$ |
| At transition | $n \sim d^K$ | $\le K-1$, plus partial degree $K$ | $K$ | $\frac{a_K^2}{(1+\mu_K\delta_K R_+)^2} + \sum_{k>K} a_k^2 + \sigma^2\cdot\ldots$ |
| After transition | $n \gg d^K$ | degrees $\le K$ | -- | $\sum_{k>K} a_k^2 + \sigma^2/\mu_K$ |

The two-stage upper boundary on expected risk is thus determined by the spectral decomposition, phase-dependent learnability, and mode truncation structure inherent to kernel regime regression in high dimensions (Woodworth et al., 2019, Hu et al., 2022, Misiakiewicz, 2022, Dubova et al., 2023, Simon et al., 2021, Pandit et al., 2 Aug 2024).
