Non-Stationary Isotropic Kernels
- Non-stationary isotropic kernels are covariance functions on Euclidean space that combine rotational invariance with location-dependent flexibility.
- They generalize stationary, dot product, and infinite-width neural network kernels by capturing even and odd geometric modes via radial functions and Gegenbauer polynomials.
- Their design supports practical applications in Gaussian process regression, spatial statistics, and deep learning while ensuring strict positive definiteness through careful function construction.
Non-stationary isotropic kernels are covariance functions defined on Euclidean space that are invariant under the action of the orthogonal group but whose dependence on location is not restricted to stationary differences (i.e., they are not translation invariant). Such kernels combine the geometric invariance of isotropy—covariance functions that depend only on the “shape” of pairs modulo rotations—with the modeling flexibility of non-stationarity, permitting spatially heterogeneous scaling, amplitude, and regularity. This framework encompasses and unifies classical stationary isotropic kernels, dot product kernels, and their non-stationary generalizations, and includes prominent machine learning objects such as infinite-width neural network kernels.
1. Mathematical Characterization of Non-Stationary Isotropic Kernels
Let $k:\mathbb{R}^d \times \mathbb{R}^d \to \mathbb{R}$ be a continuous kernel satisfying $k(Qx, Qy) = k(x, y)$ for all $Q \in O(d)$ and all $x, y \in \mathbb{R}^d$. The principal result is that any such kernel admits an expansion of the form

$$k(x, y) = \sum_{n=0}^{\infty} k_n\big(\|x\|, \|y\|\big)\, c_n^{\lambda}\!\left(\frac{\langle x, y \rangle}{\|x\|\,\|y\|}\right), \qquad \lambda = \frac{d-2}{2},$$

where
- $k_n : [0, \infty) \times [0, \infty) \to \mathbb{R}$ are continuous positive semi-definite “radial” kernels, uniquely determined by $k$, and must be such that for all $r \ge 0$, $\sum_{n=0}^{\infty} k_n(r, r) < \infty$,
- $c_n^{\lambda}$ is the degree-$n$ Gegenbauer polynomial with index $\lambda = (d-2)/2$, normalized so that $c_n^{\lambda}(1) = 1$.
This structure generalizes Schoenberg’s classical result for inner-product–dependent kernels on the sphere (where the $k_n$ reduce to constants). When $k$ is stationary isotropic, the dependence on $(\|x\|, \|y\|, \langle x, y \rangle)$ collapses to a function of $\|x - y\|$, while restriction to the unit sphere ($\|x\| = \|y\| = 1$) recovers dot product kernels (Benning et al., 27 Jun 2025).
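The expansion can be evaluated directly once the radial kernels are chosen. The sketch below is not from the paper: it truncates the series at a finite degree, uses `scipy.special.eval_gegenbauer` for the Gegenbauer polynomials, and takes the hypothetical radial kernels $k_n(r, s) = \rho^n (1 + r)(1 + s)\, e^{-(r - s)^2}$ purely for illustration.

```python
# Minimal sketch (own illustration, not the paper's construction): evaluate a
# truncated expansion k(x, y) = sum_n k_n(||x||, ||y||) c_n(<x, y>/(||x|| ||y||)).
import numpy as np
from scipy.special import eval_gegenbauer

def normalized_gegenbauer(n, lam, t):
    """Gegenbauer polynomial of degree n and index lam, normalized to equal 1 at t = 1."""
    return eval_gegenbauer(n, lam, t) / eval_gegenbauer(n, lam, 1.0)

def radial_kernel(n, r, s, rho=0.5):
    """Hypothetical PSD radial kernel k_n(r, s); the factor (1 + r)(1 + s) gives a
    norm-dependent amplitude, i.e. non-stationarity."""
    return rho**n * (1.0 + r) * (1.0 + s) * np.exp(-(r - s) ** 2)

def nonstationary_isotropic_kernel(x, y, d=3, n_max=30):
    """Truncated Schoenberg-type expansion with lam = (d - 2) / 2 (requires d >= 3)."""
    r, s = np.linalg.norm(x), np.linalg.norm(y)
    lam = (d - 2) / 2.0
    if r == 0.0 or s == 0.0:
        # Convention used only in this sketch: keep the degree-0 term at the origin.
        return radial_kernel(0, r, s)
    t = np.clip(np.dot(x, y) / (r * s), -1.0, 1.0)
    return sum(radial_kernel(n, r, s) * normalized_gegenbauer(n, lam, t)
               for n in range(n_max + 1))

x, y = np.array([1.0, 0.0, 0.5]), np.array([0.3, 0.8, -0.2])
print(nonstationary_isotropic_kernel(x, y))
```

In this toy choice the weights $\rho^n$ form a summable sequence, so the convergence condition $\sum_n k_n(r, r) < \infty$ holds for every radius $r$.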
2. Strict Positive Definiteness
For $k$ as above to be strictly positive definite, two requirements must be met:
- The “base” kernel must be positive at the origin: $k_0(0, 0) > 0$.
- For any finite set of distinct norm values $r_1, \dots, r_m \in (0, \infty)$ and any nonzero coefficient vector $c \in \mathbb{R}^m$, the quadratic forms $\sum_{i,j=1}^{m} c_i c_j\, k_n(r_i, r_j)$ are nonzero for infinitely many even and infinitely many odd degrees $n$.
This ensures that the kernel captures a sufficiently rich set of geometric “modes” (even and odd degree components) on any set of distinct radial shells. Specializing to dot product kernels (with $\|x\| = \|y\| = 1$, so that the $k_n(1, 1) = b_n$ are constants), one recovers the classical requirement that infinitely many even and infinitely many odd coefficients $b_n$ must be positive for strict positive definiteness (Benning et al., 27 Jun 2025).
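A rough numerical probe of the second condition (our own illustration, reusing the hypothetical radial kernels from the sketch above): evaluate the quadratic forms on a few distinct radii and a fixed nonzero coefficient vector, and check that both parity sectors contribute.

```python
# Probe of the strict positive definiteness criterion (own illustration): for distinct
# radii r_i and a nonzero c, compute Q_n = sum_{i,j} c_i c_j k_n(r_i, r_j) for several n.
import numpy as np

def radial_kernel(n, r, s, rho=0.5):
    # Same hypothetical k_n as in the sketch of Section 1.
    return rho**n * (1.0 + r) * (1.0 + s) * np.exp(-(r - s) ** 2)

radii = np.array([0.3, 1.0, 2.5])        # distinct norm values
c = np.array([1.0, -2.0, 0.5])           # arbitrary nonzero coefficient vector
for n in range(6):
    K_n = radial_kernel(n, radii[:, None], radii[None, :])
    Q_n = c @ K_n @ c
    print(f"n={n} ({'even' if n % 2 == 0 else 'odd'}): Q_n = {Q_n:.4f}")
# Every Q_n is strictly positive here, so both the even and the odd sector are
# nontrivial on these radial shells.
```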
3. Unification of Stationary, Dot Product, and Neural Network Kernels
The general expansion simultaneously captures:
- Stationary Isotropic Kernels: If $k$ is additionally translation invariant, then $k(x, y)$ simplifies to a function of $\|x - y\|$, and the kernel depends on $x$ and $y$ only via their separation.
- Dot Product Kernels: When restricted to the unit sphere $\mathbb{S}^{d-1}$, the expansion becomes $k(x, y) = \sum_{n=0}^{\infty} b_n\, c_n^{\lambda}(\langle x, y \rangle)$ with $b_n \ge 0$ (Schoenberg’s sphere theorem).
- Infinite-Width Neural Network Kernels: For many architectures (e.g., NNGP, NTK), the covariance kernel between network outputs is of the form

$$k(x, y) = \kappa\!\left(\|x\|, \|y\|, \frac{\langle x, y \rangle}{\|x\|\,\|y\|}\right),$$

where $\kappa$ is built from recursively computable functions derived from the activation and initialization in the neural network. These kernels are non-stationary because they depend separately on $\|x\|$ and $\|y\|$, but isotropic because the only geometric dependence is via the normalized inner product $\langle x, y \rangle / (\|x\|\,\|y\|)$ (Benning et al., 27 Jun 2025).
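A concrete instance is the order-1 arc-cosine kernel of Cho and Saul, the NNGP covariance of a single-hidden-layer ReLU network; the short script below (our own code) shows that it factors into the product of the norms and a function of the angle alone, hence non-stationary but isotropic.

```python
# Order-1 arc-cosine kernel (Cho & Saul): NNGP covariance of a one-hidden-layer ReLU
# network. It equals ||x|| ||y|| times a function of the normalized inner product.
import numpy as np

def arccos_kernel_relu(x, y):
    r, s = np.linalg.norm(x), np.linalg.norm(y)
    t = np.clip(np.dot(x, y) / (r * s), -1.0, 1.0)   # normalized inner product
    theta = np.arccos(t)
    return (r * s / np.pi) * (np.sin(theta) + (np.pi - theta) * np.cos(theta))

x, y = np.array([1.0, 2.0, 0.0]), np.array([0.5, -1.0, 1.5])
print(arccos_kernel_relu(x, y))       # depends on ||x||, ||y|| and the angle
print(arccos_kernel_relu(2 * x, y))   # rescaling x changes the value: non-stationary
```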
4. Connection to Spatial Statistics and Machine Learning
This unification brings fundamental insight into the design and analysis of kernels in several domains:
- Gaussian Process Regression and Kernel Methods: The expansion gives a systematic template for constructing kernels able to express locally adaptive variance and geometry, beyond the restrictions of translation invariance (a usage sketch follows this list).
- Spatial Statistics: Non-stationary isotropic kernels are essential for modeling spatially heterogeneous media, for example where correlation length or amplitude varies with location, but where rotational symmetry is natural.
- Neural Architecture Analysis: The realization that wide neural networks induce non-stationary isotropic kernels links deep learning theory with kernel methods, enabling cross-fertilization via harmonic analysis and classical positive-definiteness criteria.
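As a usage sketch (our own construction, assuming the `nonstationary_isotropic_kernel` function from the Section 1 sketch is in scope), such a kernel plugs directly into the standard GP posterior-mean formula:

```python
# GP regression with a non-stationary isotropic kernel (own sketch); assumes
# nonstationary_isotropic_kernel(x, y) from the Section 1 example is defined.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))                      # training inputs in R^3
y = np.sin(np.linalg.norm(X, axis=1))             # toy targets
K = np.array([[nonstationary_isotropic_kernel(a, b) for b in X] for a in X])
alpha = np.linalg.solve(K + 1e-6 * np.eye(len(X)), y)   # small jitter for conditioning

x_star = np.array([0.2, -0.4, 1.0])
k_star = np.array([nonstationary_isotropic_kernel(x_star, b) for b in X])
print("posterior mean at x_star:", k_star @ alpha)
```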
5. High-Dimensional and Infinite-Dimensional Regime
The characterization is stable as $d \to \infty$, since the normalized Gegenbauer polynomials asymptotically become monomials ($c_n^{\lambda}(t) \to t^n$ as $\lambda \to \infty$) and the series expansion remains well-defined. This property is critical for high-dimensional function analysis and for understanding the geometry of function spaces associated with neural tangent kernels (NTK) and related GP priors (Benning et al., 27 Jun 2025).
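A quick numerical check of this limit (our own illustration): the normalized Gegenbauer polynomial approaches the monomial $t^n$ as the index $\lambda = (d-2)/2$ grows.

```python
# Normalized Gegenbauer polynomials tend to monomials as lam -> infinity (own check).
from scipy.special import eval_gegenbauer

t, n = 0.7, 4
for lam in [1.0, 10.0, 100.0, 1000.0]:
    val = eval_gegenbauer(n, lam, t) / eval_gegenbauer(n, lam, 1.0)
    print(f"lam = {lam:7.1f}: normalized C_n(t) = {val:.6f}  vs  t**n = {t**n:.6f}")
```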
6. Practical Guidance for Construction and Use
The expansion provides a concrete recipe:
- One designs or learns the radial functions $k_n$ to encode domain knowledge or fit data, for example adapting to spatial inhomogeneity, encoding input-dependent scaling, or capturing the effects of deep learning feature spaces.
- For strict positive definiteness, care must be taken to ensure that both even and odd degree sectors are nontrivial across all radial slices under consideration.
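As a crude diagnostic (our own, reusing the truncated kernel from the Section 1 sketch), one can verify numerically that the Gram matrix on a sample of points with distinct norms is positive definite; a finite truncation cannot certify strict positive definiteness, but such a check flags gross failures.

```python
# Sanity check (own illustration): smallest eigenvalue of the Gram matrix of the
# truncated kernel on random points; assumes nonstationary_isotropic_kernel from
# the Section 1 sketch is in scope. Expected to be strictly positive.
import numpy as np

rng = np.random.default_rng(1)
Z = rng.normal(size=(15, 3))                      # 15 points with distinct norms a.s.
G = np.array([[nonstationary_isotropic_kernel(a, b) for b in Z] for a in Z])
print("smallest eigenvalue:", np.linalg.eigvalsh(G).min())
```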
In summary, the Schoenberg-type characterization synthesizes stationary, dot product, and non-stationary isotropic kernel classes into a single positive-definite framework. It enables principled generalizations for spatial statistics, kernel learning, and high-dimensional inference, and precisely delineates the interplay between rotational invariance and local adaptability in non-stationary covariance modeling (Benning et al., 27 Jun 2025).