
Lipschitz bounds for integral kernels

Published 3 Apr 2026 in stat.ML and cs.LG | (2604.02887v1)

Abstract: Feature maps associated with positive definite kernels play a central role in kernel methods and learning theory, where regularity properties such as Lipschitz continuity are closely related to robustness and stability guarantees. Despite their importance, explicit characterizations of the Lipschitz constant of kernel feature maps are available only in a limited number of cases. In this paper, we study the Lipschitz regularity of feature maps associated with integral kernels under differentiability assumptions. We first provide sufficient conditions ensuring Lipschitz continuity and derive explicit formulas for the corresponding Lipschitz constants. We then identify a condition under which the feature map fails to be Lipschitz continuous and apply these results to several important classes of kernels. For infinite-width two-layer neural networks with isotropic Gaussian weight distributions, we show that the Lipschitz constant of the associated kernel can be expressed as the supremum of a two-dimensional integral, leading to an explicit characterization for the Gaussian kernel and the ReLU random neural network kernel. We also study continuous and shift-invariant kernels such as Gaussian, Laplace, and Matérn kernels, which admit an interpretation as neural networks with a cosine activation function. In this setting, we prove that the feature map is Lipschitz continuous if and only if the weight distribution has a finite second-order moment, and we then derive its Lipschitz constant. Finally, we raise an open question concerning the asymptotic behavior of the convergence of the Lipschitz constant in finite-width neural networks. Numerical experiments are provided to support this behavior.

Summary

  • The paper presents explicit formulas for the Lipschitz constant of kernel feature maps, enabling precise robustness certification in kernel-based learning systems.
  • It details a rigorous theoretical framework that computes bounds for both infinite-width neural network kernels and shift-invariant kernels using closed-form expressions.
  • Numerical experiments confirm that empirical Lipschitz estimates converge to theoretical bounds, validating the robustness guarantees provided by the derived criteria.

Lipschitz Bounds for Integral Kernels: Theory and Implications

Introduction

The quantitative characterization of robustness in kernel-based learning systems is paramount for applications subject to adversarial perturbations or sensitive to operator stability. In this context, the global Lipschitz constant of a model—particularly of the kernel feature map—is directly tied to certified robustness and stability guarantees. The article "Lipschitz bounds for integral kernels" (2604.02887) provides a comprehensive theoretical framework for obtaining explicit, often sharp, Lipschitz constants for a broad class of integral kernels, including those associated with infinite-width random neural networks and shift-invariant kernels commonly used in practice.

Main Theoretical Results

The paper advances the state of the art by addressing the estimation and explicit computation of the Lipschitz constant for the feature map associated with positive definite kernels that admit a representation as an integral over random features. The core contribution is the establishment of precise conditions under which the feature map is Lipschitz continuous, accompanied by closed-form (or tractable) expressions for the optimal Lipschitz constant.

Lipschitz Regularity of Feature Maps

Let $k(x, x') = \int_\Omega \phi(\omega, x)\, \phi(\omega, x')\, dP(\omega)$ denote the integral kernel with associated feature map $\varphi: x \mapsto k(x, \cdot)$ into its RKHS. Under the assumption that $\phi(\omega, \cdot)$ is Fréchet differentiable and $\mathrm{Lip}(\phi(\omega, \cdot))$ is $L^2(P)$-integrable, the feature map's Lipschitz constant $\mathrm{Lip}(\varphi)$ can be exactly characterized by

$$\mathrm{Lip}(\varphi) = \sup_{x \in X,\ z \in S_E} \big\| D_x \phi(\cdot, x)[z] \big\|_{L^2(P)} = \sup_{x \in X,\ z \in S_E} \left( D_x D_y k(x, x)[z, z] \right)^{1/2}$$

where $S_E$ is the unit sphere in the input Banach space $E$.
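As an illustration (not taken from the paper), this characterization can be checked numerically for the Gaussian kernel, whose feature map is classically known to have Lipschitz constant $1/\sigma$. The bandwidth `sigma` and finite-difference step `h` below are assumed placeholder values; a minimal sketch:

```python
import numpy as np

# Finite-difference check of the characterization
#   Lip(phi)^2 = sup_{x, ||z||=1} D_x D_y k(x, x)[z, z]
# for the Gaussian kernel k(x, y) = exp(-||x - y||^2 / (2 sigma^2)),
# whose feature map has Lipschitz constant 1/sigma.
sigma, h = 2.0, 1e-3  # assumed bandwidth and step size

def k(x, y):
    return np.exp(-np.sum((x - y) ** 2) / (2 * sigma**2))

def mixed_directional_derivative(x, z):
    # Central finite difference approximating D_x D_y k(x, x)[z, z].
    return (k(x + h * z, x + h * z) - k(x + h * z, x - h * z)
            - k(x - h * z, x + h * z) + k(x - h * z, x - h * z)) / (4 * h**2)

rng = np.random.default_rng(0)
x = rng.normal(size=3)
z = rng.normal(size=3)
z /= np.linalg.norm(z)

lip_sq = mixed_directional_derivative(x, z)
print(f"sqrt(D_x D_y k(x,x)[z,z]) = {np.sqrt(lip_sq):.4f}, 1/sigma = {1 / sigma}")
```

For the isotropic Gaussian kernel the mixed derivative on the diagonal is the same for every $x$ and unit direction $z$, so no supremum search is needed; for anisotropic kernels one would maximize over directions.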

This formula generalizes prior non-sharp bounds, leading to tighter and sometimes minimal certificates for model robustness. Furthermore, the article identifies a criterion under which no finite Lipschitz bound exists—namely, when the corresponding supremum diverges, as in the case of certain non-smooth kernels.

Classes of Kernels Analyzed

Infinite-Width Neural Network Kernels

For two-layer neural network kernels with random weights $(w, b)$ drawn from an isotropic Gaussian distribution and features $\phi(\omega, x) = \sigma(w^T x + b)$ for an activation function $\sigma$, the Lipschitz constant of the associated feature map admits an explicit expression as the supremum of a two-dimensional integral involving the first derivative of $\sigma$ and the Gaussian measure. In particular:

  • Random Fourier Features (RFF): For the cosine activation with phase uniform on $[0, 2\pi)$, corresponding to the Gaussian kernel, the Lipschitz constant is obtained in closed form in terms of the Gaussian width parameter.
  • ReLU Kernel: For the ReLU activation $\sigma(t) = \max(t, 0)$, the paper derives an explicit Lipschitz constant matching robust upper bounds for neural network kernels.

Figure 1: Random Fourier features of Gaussian kernel.
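As a hedged sketch of the random Fourier feature construction shown in Figure 1 (the dimension, feature count, and bandwidth below are assumed values, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)
d, N, sigma = 5, 4000, 1.5  # input dim, feature count, bandwidth (assumed values)

# Bochner sampling for the Gaussian kernel k(x, y) = exp(-||x - y||^2 / (2 sigma^2)):
# frequencies drawn from N(0, I / sigma^2), phases uniform on [0, 2 pi).
W = rng.normal(scale=1.0 / sigma, size=(N, d))
b = rng.uniform(0.0, 2.0 * np.pi, size=N)

def phi_N(x):
    # Empirical feature map: phi_N(x)^T phi_N(y) approximates k(x, y).
    return np.sqrt(2.0 / N) * np.cos(W @ x + b)

x, y = rng.normal(size=d), rng.normal(size=d)
approx = phi_N(x) @ phi_N(y)
exact = np.exp(-np.sum((x - y) ** 2) / (2 * sigma**2))
print(f"RFF approximation: {approx:.4f}   exact kernel: {exact:.4f}")
```

The approximation error decays at the Monte Carlo rate $O(1/\sqrt{N})$, which is why the feature count is taken large here.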

Shift-Invariant Kernels

For continuous, shift-invariant kernels ($k(x, x') = \kappa(x - x')$) with spectral measure $\mu$ given by Bochner's theorem, the necessary and sufficient condition for the feature map's Lipschitz continuity is that $\mu$ has a finite second-order moment. The Lipschitz constant is

$$\mathrm{Lip}(\varphi) = \sup_{z \in S_E} \left( \int \langle \omega, z \rangle^2 \, d\mu(\omega) \right)^{1/2}$$

This result applies to generalized Gaussian, Matérn, and Laplace kernels. For Matérn kernels, the second moment of the spectral measure is finite if and only if the smoothness parameter $\nu > 1$; otherwise, the feature map is not Lipschitz continuous.
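Specialized to shift-invariant kernels, the Lipschitz constant is the square root of the top eigenvalue of the spectral second-moment matrix, which can be estimated by Monte Carlo. A sketch for the Gaussian kernel, whose spectral measure is $N(0, I/\sigma^2)$ and whose exact constant is $1/\sigma$ (sample size and bandwidth are assumed values):

```python
import numpy as np

rng = np.random.default_rng(1)
d, sigma, M = 4, 2.0, 200_000  # dimension, bandwidth, spectral sample size (assumed)

# Spectral measure of the Gaussian kernel (Bochner): omega ~ N(0, I / sigma^2).
omega = rng.normal(scale=1.0 / sigma, size=(M, d))

# Lip(phi)^2 = sup_{||z||=1} int <omega, z>^2 dmu(omega)
#            = largest eigenvalue of the second-moment matrix int omega omega^T dmu.
second_moment = omega.T @ omega / M
lip = np.sqrt(np.linalg.eigvalsh(second_moment)[-1])
print(f"Monte Carlo Lip(phi) = {lip:.4f}   exact 1/sigma = {1 / sigma}")
```

The same recipe applies to any spectral measure one can sample from; when the second moment diverges, the Monte Carlo estimate grows without bound as $M$ increases instead of stabilizing.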

Numerical Investigation and Open Questions

A persistent open question raised is the asymptotic behavior of the Lipschitz constant for empirical random feature maps $\varphi_N$ (with $N$ features) as $N \to \infty$. The paper empirically investigates whether $\mathrm{Lip}(\varphi_N)$ converges to the analytically computable $\mathrm{Lip}(\varphi)$ derived for the infinite-feature case.

Using Monte Carlo evaluation and quantile analysis for three representative kernels (Gaussian RFF, ReLU neural networks, Matérn RFF), the experimental results support convergence in probability of $\mathrm{Lip}(\varphi_N)$ to $\mathrm{Lip}(\varphi)$ with increasing $N$, suggesting that these explicit formulas accurately govern not only the integral kernel but also its finite approximations in high dimensions.
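The flavor of this experiment can be reproduced in a few lines: estimate the Lipschitz constant of the empirical Gaussian-RFF feature map $\varphi_N$ at a fixed input via the operator norm of its Jacobian, and watch it approach the infinite-width value $1/\sigma$. All numerical choices below are assumptions for illustration, and evaluating at a single point (rather than taking a supremum over inputs) is a simplification that is harmless for the isotropic Gaussian kernel:

```python
import numpy as np

rng = np.random.default_rng(2)
d, sigma = 3, 1.0  # input dimension and bandwidth (assumed values)
x = rng.normal(size=d)  # fixed evaluation point

def lip_phi_N(N):
    # Empirical feature map phi_N(x) = sqrt(2/N) cos(W x + b) for the Gaussian
    # kernel, with rows of W ~ N(0, I / sigma^2) and b ~ Uniform[0, 2 pi).
    W = rng.normal(scale=1.0 / sigma, size=(N, d))
    b = rng.uniform(0.0, 2.0 * np.pi, size=N)
    # Jacobian rows are -sqrt(2/N) sin(W x + b) * W_i; the Lipschitz constant
    # at x is the Jacobian's largest singular value.
    J = -np.sqrt(2.0 / N) * np.sin(W @ x + b)[:, None] * W
    return np.linalg.norm(J, 2)

results = {N: lip_phi_N(N) for N in (100, 1_000, 10_000, 100_000)}
for N, L in results.items():
    print(N, round(L, 3))  # drifts toward the infinite-width value 1/sigma = 1.0
```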

Implications and Future Directions

The formal expressions for the Lipschitz constant derived in this article have both practical and theoretical significance. Practically, they provide rigorous certificates of robustness for kernel-based predictors—directly bounding the adversarial contraction or expansion exerted by the learned representation. Theoretically, the equivalence between moment finiteness (second-order for weight distributions) and Lipschitz continuity of the feature map identifies an intrinsic link between kernel smoothness and robustness properties.

These results further inform the design of robust, scalable kernel machines using random feature approximations, privileging kernel and random projection constructions with analytically tractable and minimal Lipschitz constants. The explicit dependence on covariance and Hessian spectral properties also opens avenues for spectral kernel parameterization.

Future research could pursue sharp finite-sample concentration results for empirical Lipschitz constants, generalizations to deeper neural tangent kernels, and computational approaches to estimating $\mathrm{Lip}(\varphi)$ for non-integral or non-differentiable kernels.

Conclusion

This paper establishes a rigorous and complete theory for computing Lipschitz bounds for a broad family of integral kernels, with explicit formulas covering both neural network and shift-invariant kernels. The results enable precise robustness certification for random feature models and highlight the critical dependence on kernel spectral/curvature properties and the distributional moments of random weights. The findings pave the way for principled robustness analysis and informed architecture design in kernel machines and infinite-width neural networks, with ongoing research needed to extend these guarantees to even richer kernel and neural architectures.
