Commutation of ridgeless and infinite-width limits in kernel ridge regression

Determine whether the ridgeless limit (taking the regularization parameter λ→0) can be interchanged with the infinite-dimensional feature-space limit (N→∞) in kernel ridge regression with Mercer kernels. Specify precise conditions under which these two limits commute and characterize any discrepancies in the resulting generalization error when they do not.
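
One way to state the question precisely is sketched below. The notation is assumed here for illustration and is not taken from the source: \eta_k and \phi_k denote the Mercer eigenvalues and eigenfunctions, K_N is the kernel truncated to its first N modes, and E_g(\lambda, N) is the generalization error of ridge regression with kernel K_N and ridge \lambda.

```latex
% Assumed notation: Mercer decomposition and its rank-N truncation.
K(x, x') = \sum_{k=1}^{\infty} \eta_k \, \phi_k(x) \, \phi_k(x'),
\qquad
K_N(x, x') = \sum_{k=1}^{N} \eta_k \, \phi_k(x) \, \phi_k(x').

% The limits commute if both iterated limits exist and agree:
\lim_{\lambda \to 0^{+}} \, \lim_{N \to \infty} E_g(\lambda, N)
\;\overset{?}{=}\;
\lim_{N \to \infty} \, \lim_{\lambda \to 0^{+}} E_g(\lambda, N).
```

When the two sides differ, the discrepancy to characterize is the gap between these two iterated limits.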

Background

In connecting linear regression results to kernel ridge regression, the source paper (Atanasov et al., 2024) considers a kernel with a Mercer decomposition and a finite-dimensional feature expansion of size N. Under sufficiently fast spectral decay, taking N→∞ at fixed λ can be justified. The ridgeless setting is subtler: many recent analyses focus on vanishing ridge (λ→0), and it is not obvious that sending N→∞ and λ→0 in different orders yields the same asymptotics.
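
The two orders of limits can be probed numerically with a minimal sketch like the one below. It rests on assumptions that are not in the source: Gaussian features stand in for the Mercer eigenfunctions, the spectrum is a power law eta_k = k^(-alpha), the target is a noiseless linear function of the same N features, and the sample size P is held fixed. The function name `error_grid` and all parameter values are hypothetical. The sketch only tabulates test error over a grid of (N, λ); it does not settle whether the limits commute.

```python
import numpy as np


def error_grid(Ns, lams, P=64, alpha=1.5, seed=0):
    """Test error of ridge regression on N Gaussian features, for each (N, lam).

    Assumptions (illustrative, not from the source): features
    psi_k = sqrt(eta_k) * z_k with z_k ~ N(0, 1), eta_k = k**(-alpha),
    and a noiseless target y = psi . w_star truncated to the same N features.
    """
    rng = np.random.default_rng(seed)
    n_max = max(Ns)
    eta_full = np.arange(1, n_max + 1, dtype=float) ** (-alpha)
    w_full = rng.standard_normal(n_max)        # teacher weights
    Z_full = rng.standard_normal((P, n_max))   # whitened training features

    errs = np.zeros((len(Ns), len(lams)))
    for i, N in enumerate(Ns):
        eta, w_star = eta_full[:N], w_full[:N]
        Psi = Z_full[:, :N] * np.sqrt(eta)     # P x N design matrix
        y = Psi @ w_star

        # The SVD gives the ridge solution for every lam >= 0 in one form:
        # w_hat = V diag(s / (s^2 + lam)) U^T y.
        U, s, Vt = np.linalg.svd(Psi, full_matrices=False)
        # Drop numerically zero singular values so that lam = 0 yields the
        # minimum-norm least-squares solution (pseudoinverse).
        cutoff = s.max() * max(P, N) * np.finfo(float).eps
        keep = s > cutoff
        Uty = U.T @ y

        for j, lam in enumerate(lams):
            filt = np.zeros_like(s)
            filt[keep] = s[keep] / (s[keep] ** 2 + lam)
            w_hat = Vt.T @ (filt * Uty)
            diff = w_hat - w_star
            # Population error under feature covariance diag(eta).
            errs[i, j] = np.sum(eta * diff ** 2)
    return errs


if __name__ == "__main__":
    Ns = [16, 64, 256, 1024]
    lams = [1e-1, 1e-3, 1e-6, 0.0]
    table = error_grid(Ns, lams)
    for N, row in zip(Ns, table):
        print(N, " ".join(f"{e:.3e}" for e in row))
```

Reading a row from left to right approximates λ→0 at fixed N, and reading a column from top to bottom approximates N→∞ at fixed λ. Comparing the last column with the last row gives a rough empirical view of the two iterated limits at this fixed P.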

Clarifying whether these limits can be interchanged is important because a large body of work analyzes ridgeless kernel regression and scaling laws, often relying on large-N asymptotics. Establishing conditions under which the limits commute would sharpen theoretical guarantees and identify the practical regimes in which ridgeless training agrees with infinite-width approximations.

References

However, when λ → 0 it is not clear that one can interchange the ridgeless limit with the large N limit.

— Scaling and renormalization in high-dimensional regression (2405.00592 - Atanasov et al., 1 May 2024) in Section 5.2 Connection to Kernel Regression via Gaussian Universality
