Two-Stage Upper Boundary on Expected Risk

Updated 30 December 2025
  • The paper presents a two-stage framework that uses spectral truncation to provide sharp upper bounds on expected risk in kernel regression and wide neural networks.
  • It connects phase transitions in sample complexity with modewise risk contributions, accurately capturing bias and variance in different high-dimensional regimes.
  • The approach offers practical insights into double descent phenomena by quantifying finite-sample performance and guiding model selection through explicit risk envelopes.

A two-stage upper boundary on expected risk refers to a principled framework for tightly bounding or approximating the out-of-sample (generalization) performance of kernel methods—including wide neural networks in the “kernel regime”—by leveraging the spectral decomposition of the kernel and the statistical structure of high-dimensional data. This paradigm connects the learning curve of overparameterized models to phase transitions in sample complexity, using precise asymptotics and “conservation-law” constraints for modewise risk contributions. The approach unites several key analytic results in kernel regression and infinite-width neural networks, revealing how sharp, phase-specific upper envelopes on risk are constructed via multi-stage spectral truncations.

1. Problem Setting and Preliminaries

The model is kernel ridge regression (KRR) or its neural network analog in the kernel regime. Let the training set be $\{x_i, y_i\}_{i=1}^n$, with $x_i\in\mathbb{R}^d$ and $y_i\in\mathbb{R}$, and let $K(x,x')$ be a positive-definite kernel. In the overparameterized, infinite-width "kernel regime," a wide neural network or KRR converges to the kernel interpolator minimizing the RKHS norm:

$$\hat f = \arg\min_{f\in\mathcal{H}_K}\|f\|_{\mathcal{H}_K}\qquad\text{subject to } f(x_i)=y_i,\quad i=1,\dots,n,$$

with possible ridge regularization $\lambda>0$. The test (expected) risk is $R_{\text{test}} = \mathbb{E}_{x,y}\big[(y - \hat{f}(x))^2\big]$. Understanding sharp upper bounds on this risk as a function of dimensionality, sample size, kernel structure, and the target function's expansion is central.
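
For orientation, here is a minimal NumPy sketch of this setup: kernel ridge regression solved in its dual form, with the test risk estimated by Monte Carlo on held-out samples. The RBF kernel, the spherical data distribution, the target function, and all constants are illustrative assumptions, not choices taken from the cited works.

```python
import numpy as np

rng = np.random.default_rng(0)

def rbf_kernel(X, Z, gamma=1.0):
    # K(x, x') = exp(-gamma * ||x - x'||^2), a standard positive-definite kernel.
    sq_dists = ((X[:, None, :] - Z[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * sq_dists)

def krr_fit_predict(X_tr, y_tr, X_te, lam=1e-3, gamma=1.0):
    # Dual KRR solution: alpha = (K + n*lam*I)^{-1} y,  f_hat(x) = k(x, X_tr) @ alpha.
    n = len(X_tr)
    K = rbf_kernel(X_tr, X_tr, gamma)
    alpha = np.linalg.solve(K + n * lam * np.eye(n), y_tr)
    return rbf_kernel(X_te, X_tr, gamma) @ alpha

# Illustrative setup: inputs uniform on the sphere S^{d-1}, noisy low-degree target.
d, n, n_test, sigma = 20, 500, 1000, 0.1

def sample_sphere(m):
    X = rng.standard_normal((m, d))
    return X / np.linalg.norm(X, axis=1, keepdims=True)

f_star = lambda X: X[:, 0] + d * X[:, 0] * X[:, 1]   # degree-1 part plus a degree-2 part

X_tr, X_te = sample_sphere(n), sample_sphere(n_test)
y_tr = f_star(X_tr) + sigma * rng.standard_normal(n)
y_te = f_star(X_te) + sigma * rng.standard_normal(n_test)

pred = krr_fit_predict(X_tr, y_tr, X_te)
print("Monte Carlo estimate of R_test:", np.mean((y_te - pred) ** 2))
```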

2. Kernel Regime and Linearization

In the kernel/lazy regime (Woodworth et al., 2019), model training corresponds to gradient descent remaining close to initialization, yielding dynamics linearized in parameter space. The infinite-width limit fixes the Neural Tangent Kernel (NTK):

$$K_{\text{NTK}}(x,x') = \left\langle \nabla_W f(x;W_0),\ \nabla_W f(x';W_0) \right\rangle$$

As a result, the trained function is precisely an RKHS minimizer, and its generalization error is fully determined by the spectral projections of the target function and by the kernel Gram matrix.
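
For intuition about the NTK definition above, the following sketch computes the empirical NTK Gram matrix of a one-hidden-layer ReLU network at initialization from analytically derived parameter gradients. The architecture, width, and scaling are illustrative assumptions; in the infinite-width limit this random Gram matrix concentrates around its deterministic counterpart.

```python
import numpy as np

rng = np.random.default_rng(1)
d, m = 10, 4096   # input dimension, hidden width

# One-hidden-layer network f(x; W0) = (1/sqrt(m)) * a^T relu(W x), parameters W0 = (W, a).
W = rng.standard_normal((m, d))
a = rng.standard_normal(m)

def param_grads(x):
    # Gradient of f(x; W0) with respect to all parameters, flattened into one vector.
    pre = W @ x                                   # pre-activations, shape (m,)
    grad_a = np.maximum(pre, 0.0) / np.sqrt(m)    # d f / d a_i = relu(w_i . x) / sqrt(m)
    grad_W = ((a * (pre > 0)) / np.sqrt(m))[:, None] * x[None, :]  # d f / d W_ij
    return np.concatenate([grad_W.ravel(), grad_a])

def empirical_ntk(X):
    # K_NTK(x_i, x_j) = < grad f(x_i; W0), grad f(x_j; W0) >
    G = np.stack([param_grads(x) for x in X])
    return G @ G.T

X = rng.standard_normal((5, d)) / np.sqrt(d)
print(np.round(empirical_ntk(X), 3))   # symmetric PSD Gram matrix; concentrates as m grows
```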

3. Two-Stage Upper Boundary: Phase-dependent Spectral Truncation

In high dimensions, the risk landscape is stratified by critical sample thresholds. In the polynomial sample regime, where $n\asymp d^k$, the function space decomposes into a hierarchy of polynomial (or spherical harmonic) degrees, and KRR can only "lock in" features up to a finite critical degree per phase (Hu et al., 2022, Misiakiewicz, 2022, Dubova et al., 2023, Pandit et al., 2 Aug 2024). The expected risk thus admits an explicit two-stage upper boundary:

  1. First stage: Only the modes of degree $k < K$ are fully captured, leaving a residual bias

$$\sum_{k\ge K} a_k^2$$

where $a_k$ are the expansion coefficients of the target function.

  2. Second stage: At the transition $n\asymp d^K$, the $K$-th mode is only partially resolved, with its bias and variance sharply characterized as functions of the dimension-normalized sampling ratio $\delta_K=n/N_K$, where $N_K\sim d^K$ is the multiplicity of the degree-$K$ eigenspace.
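
For concreteness, the multiplicity $N_K$ can be computed exactly when the eigenmodes are spherical harmonics on $S^{d-1}$; the short sketch below uses the standard dimension-count formula (the specific values of $d$, $K$, and $n$ are illustrative).

```python
from math import comb

def harmonic_multiplicity(d, k):
    # Dimension of the space of degree-k spherical harmonics on S^{d-1} (d ambient dims):
    # N(d, k) = C(d+k-1, k) - C(d+k-3, k-2); it grows like d^k / k! for large d.
    return comb(d + k - 1, k) - (comb(d + k - 3, k - 2) if k >= 2 else 0)

d, K, n = 100, 2, 30_000
N_K = harmonic_multiplicity(d, K)   # 5049 here, roughly d^2 / 2
delta_K = n / N_K                   # dimension-normalized sampling ratio
print(N_K, round(delta_K, 3))
```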

Explicitly, the total test error in the two-stage regime is

$$R_{\rm test} \approx \frac{a_K^2}{(1+\mu_K\,\delta_K\,R_+)^{2}} + \sum_{k>K}a_k^2 + \sigma^2\,(1+\mu_K\,\delta_K\,R_+)^{2}\,R_+$$

where $\mu_K = \mu/p_K$ is the ridge regularization normalized by the $K$-th eigenvalue $p_K$, $R_+$ is a phase-dependent quantity determined by a spectral quadratic (self-consistent) equation, and $\sigma^2$ is the label noise variance. This gives a "two-phase" upper boundary, with features in the critical band only partially learned and those above it not learned at all (Hu et al., 2022).
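
The following sketch simply evaluates this envelope for given inputs. Because the defining equation for $R_+$ depends on spectral details not reproduced here, $R_+$ is passed in as an argument rather than computed; all names and numbers are illustrative.

```python
import numpy as np

def two_stage_risk_envelope(a, K, mu_K, delta_K, R_plus, sigma2):
    """Evaluate the two-stage risk envelope at the transition n ~ d^K.

    a       : target expansion coefficients a_0, a_1, ..., indexed by degree
    K       : critical degree at the current transition
    mu_K    : ridge regularization normalized by the K-th eigenvalue (mu / p_K)
    delta_K : sampling ratio n / N_K, with N_K ~ d^K the degree-K multiplicity
    R_plus  : phase-dependent quantity from the spectral self-consistent equation
              (computed elsewhere in the cited analyses; supplied here as an input)
    sigma2  : label noise variance
    """
    a = np.asarray(a, dtype=float)
    factor = (1.0 + mu_K * delta_K * R_plus) ** 2
    bias_K = a[K] ** 2 / factor          # degree-K mode, only partially learned
    bias_tail = np.sum(a[K + 1:] ** 2)   # degrees > K, not learned in this phase
    variance = sigma2 * factor * R_plus  # noise term as written in the envelope above
    return bias_K + bias_tail + variance

# Example call at the K = 2 transition, with made-up coefficients over degrees 0..4.
print(two_stage_risk_envelope(a=[0.0, 1.0, 0.7, 0.3, 0.1],
                              K=2, mu_K=0.05, delta_K=1.5, R_plus=0.8, sigma2=0.01))
```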

4. Conservation Law and Modewise Allocations

The Eigenlearning framework (Simon et al., 2021) refines these insights via a trace conservation law for kernel regression:

$$\sum_i L_i = n$$

where $L_i = \lambda_i/(\lambda_i+\kappa) \in [0,1]$ is the learnability of kernel eigenmode $i$ (with eigenvalue $\lambda_i$), and $\kappa$ solves the self-consistency constraint

$$n = \sum_{i} \frac{\lambda_i}{\lambda_i + \kappa} + \frac{\delta}{\kappa},$$

where $\delta \ge 0$ is the ridge parameter; the conservation law $\sum_i L_i = n$ holds exactly in the ridgeless limit $\delta \to 0$.

The risk then splits modewise into bias (target weight in modes not captured) and variance (noise amplified through each $L_i$). In high-dimensional phases, spectral mass saturates the allowable budget of $n$ learnable modes up to the cut-off, enforcing a two-stage structure: full representation below the cut, strictly upper-bounded partial representation at the transition, and truncation above.
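
To make this bookkeeping concrete, the sketch below solves the self-consistency equation for $\kappa$ by bisection on an assumed power-law spectrum and reports the resulting learnabilities; the spectrum and sample size are illustrative.

```python
import numpy as np

def solve_kappa(eigs, n, ridge=0.0, tol=1e-12):
    # Find kappa > 0 with  n = sum_i eigs_i / (eigs_i + kappa) + ridge / kappa.
    # The right-hand side is strictly decreasing in kappa, so bisection works.
    def rhs(kappa):
        return np.sum(eigs / (eigs + kappa)) + (ridge / kappa if ridge > 0 else 0.0)
    lo, hi = 1e-15, 1e3 * (eigs.max() + ridge + 1.0)
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if rhs(mid) > n:
            lo = mid          # rhs too large -> kappa must grow
        else:
            hi = mid
        if hi - lo < tol * hi:
            break
    return 0.5 * (lo + hi)

# Illustrative power-law spectrum lambda_i ~ i^{-2} with 2000 modes and n = 100 samples.
eigs = 1.0 / np.arange(1, 2001) ** 2
n = 100
kappa = solve_kappa(eigs, n, ridge=0.0)
L = eigs / (eigs + kappa)                # learnability of each eigenmode
print("kappa =", kappa)
print("sum of learnabilities =", L.sum(), "(equals n in the ridgeless case)")
print("modes with L_i > 0.5 :", int((L > 0.5).sum()))
```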

5. Sharp Asymptotic Upper Bounds and Double/Multiple Descent

The two-stage upper boundary framework explains the non-monotonic "double descent" and multiple-descent shapes of risk curves, as each transition point $n\asymp d^K$ opens a new spectral mode to learning. At each critical sample size:

  • Before the transition ($n\ll d^K$), only modes of degree below $K$ are learned, and the risk is upper-bounded by the residual bias of the unlearned higher-degree modes plus noise.
  • At the transition ($n\sim d^K$), the bias contribution of the $K$-th mode decays from $a_K^2$ (unlearned) to zero (fully learned), with an accompanying variance contribution that vanishes once the mode is resolved, as expressed by the two-phase asymptotic envelope, which sharply bounds the test error on either side of the transition.
  • After the transition ($n\gg d^K$), the upper boundary shifts: all modes up to degree $K$ are learned, and the risk comprises only the higher-order tail and residual noise.

These results are nonasymptotic for finite-degree polynomial kernels (where the risk plateaus), but for analytic kernels the two-stage ("multi-stage") upper boundary recurses for all $K$ (Hu et al., 2022, Misiakiewicz, 2022).
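
The staircase/multiple-descent picture can be reproduced qualitatively with the same machinery: sweep $n$ against a block spectrum that mimics degree multiplicities $N_k \sim d^k$ and track how much target weight remains unlearned. The sketch below uses a bias-only proxy $\sum_i (1-L_i)^2 a_i^2$ and ignores the variance/overfitting factor; the spectrum, coefficients, and sample sizes are all illustrative assumptions.

```python
import numpy as np

# Block spectrum mimicking degree multiplicities on the sphere: N_k ~ d^k modes of
# degree k, each with eigenvalue ~ d^{-k}; one unit of target weight per degree.
d, max_degree = 10, 3
eigs, coefs = [], []
for k in range(max_degree + 1):
    N_k = d ** k
    eigs += [float(d) ** (-k)] * N_k
    coefs += [1.0] + [0.0] * (N_k - 1)
eigs, coefs = np.array(eigs), np.array(coefs)

def solve_kappa(eigs, n):
    # Ridgeless self-consistent kappa: n = sum_i eigs_i / (eigs_i + kappa)  (log-scale bisection).
    lo, hi = 1e-15, 1e6
    for _ in range(200):
        mid = np.sqrt(lo * hi)
        lo, hi = (mid, hi) if np.sum(eigs / (eigs + mid)) > n else (lo, mid)
    return np.sqrt(lo * hi)

for n in [3, 10, 30, 100, 300, 1000]:
    L = eigs / (eigs + solve_kappa(eigs, n))
    bias_proxy = np.sum((1.0 - L) ** 2 * coefs ** 2)   # unlearned target weight (bias only)
    print(f"n = {n:4d}   bias proxy = {bias_proxy:.3f}")
# The printout steps down roughly once per block: each degree k is absorbed once n
# exceeds ~ d^k, which is the staircase skeleton of the multiple-descent curve.
```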

6. Implications, Applications, and Limitations

This framework generalizes beyond spheres/hypercubes to all kernels with polynomial (Gegenbauer/Hermite) spectral decompositions and is robust to label noise/regularization (Hu et al., 2022, Misiakiewicz, 2022, Dubova et al., 2023, Pandit et al., 2 Aug 2024). It yields precise finite-sample and large-sample risk upper bounds, predicts interpolation thresholds, and allows explicit risk-envelope prediction for model selection and capacity control.

However, the two-stage boundary is sharp only in the kernel regime. In the "rich" (feature-learning) regime of deep networks with significant parameter movement, the true risk can fall below this boundary, reflecting a data-adaptive inductive bias beyond RKHS constraints (Woodworth et al., 2019, Woodworth et al., 2020).

7. Summary Table: Upper Boundary Phases

| Regime | Sample range | Modes fully learned | Critical transition | Test risk upper boundary |
|---|---|---|---|---|
| Phase-$K$ | $d^{K-1}\ll n \ll d^K$ | degrees $\le K-1$ | $n \asymp d^K$ | $\sum_{k\ge K} a_k^2$ |
| At transition | $n \sim d^K$ | $\le K-1$, plus partial degree $K$ | $K$ | $\frac{a_K^2}{(1+\mu_K\delta_K R_+)^2} + \sum_{k>K} a_k^2 + \sigma^2\cdot\ldots$ |
| After transition | $n \gg d^K$ | degrees $\le K$ | -- | $\sum_{k>K} a_k^2 + \sigma^2/\mu_K$ |

The two-stage upper boundary on expected risk is thus determined by the spectral decomposition, phase-dependent learnability, and mode truncation structure inherent to kernel regime regression in high dimensions (Woodworth et al., 2019, Hu et al., 2022, Misiakiewicz, 2022, Dubova et al., 2023, Simon et al., 2021, Pandit et al., 2 Aug 2024).
