Estimating the Spectral Moments of the Kernel Integral Operator from Finite Sample Matrices (2410.17998v3)
Abstract: Analyzing the structure of sampled features from an input data distribution is challenging when measurements are limited in both the number of inputs and the number of features. Traditional approaches often rely on the eigenvalue spectrum of the sample covariance matrix derived from finite measurement matrices; however, these spectra are sensitive to the size of the measurement matrix, leading to biased insights. In this paper, we introduce a novel algorithm that, from finitely sampled measurement matrices, provides unbiased estimates of the spectral moments of the kernel integral operator in the limit of infinite inputs and features. Our method, based on dynamic programming, estimates these operator spectral moments efficiently. We demonstrate the accuracy of our estimator on radial basis function (RBF) kernels, highlighting its consistency with the theoretical spectra. Furthermore, we showcase the practical utility and robustness of our method for understanding the geometry of learned representations in neural networks.
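The abstract contrasts the biased spectrum of a finite sample matrix with unbiased estimates of the operator's spectral moments. The sketch below is a minimal point of reference, not the paper's algorithm: assuming the standard definition of the p-th spectral moment as tr(T^p) = E[k(x_1,x_2) k(x_2,x_3) ... k(x_p,x_1)], it compares the plug-in estimate tr((K/n)^p) from an RBF Gram matrix with a brute-force U-statistic over distinct sample indices, which is unbiased for tr(T^p) but costs O(n^p). The paper's dynamic-programming estimator plays this unbiasing role efficiently and additionally handles a finite number of features; the function names here (`rbf_kernel`, `plugin_moment`, `ustat_moment`) are illustrative only.

```python
import numpy as np
from itertools import permutations


def rbf_kernel(X, gamma=1.0):
    """Pairwise RBF kernel k(x, y) = exp(-gamma * ||x - y||^2)."""
    sq_norms = (X ** 2).sum(axis=1)
    sq_dist = sq_norms[:, None] + sq_norms[None, :] - 2.0 * X @ X.T
    return np.exp(-gamma * np.maximum(sq_dist, 0.0))


def plugin_moment(K, p):
    """Plug-in estimate: p-th moment of the eigenvalues of K / n,
    i.e. tr((K/n)^p).  Includes tuples with repeated sample indices,
    so it is biased at finite n."""
    n = K.shape[0]
    return np.trace(np.linalg.matrix_power(K / n, p))


def ustat_moment(K, p):
    """Unbiased estimate of tr(T^p): average of
    k(x_{i1},x_{i2}) * ... * k(x_{ip},x_{i1}) over ordered p-tuples of
    *distinct* indices (a cyclic U-statistic).  Brute force, O(n^p);
    for illustration only."""
    n = K.shape[0]
    total, count = 0.0, 0
    for idx in permutations(range(n), p):
        prod = 1.0
        for a in range(p):
            prod *= K[idx[a], idx[(a + 1) % p]]
        total += prod
        count += 1
    return total / count


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(50, 3))       # a small finite sample of inputs
    K = rbf_kernel(X, gamma=0.5)       # exact kernel Gram matrix
    for p in (2, 3):
        print(f"p={p}: plug-in={plugin_moment(K, p):.4f}, "
              f"U-statistic={ustat_moment(K, p):.4f}")
```

For an RBF kernel the repeated-index terms (for example the diagonal entries k(x, x) = 1) tend to inflate the plug-in moments at small n, while the U-statistic is unbiased by construction; the gap between the two printed values illustrates the finite-sample bias the paper's estimator is designed to remove.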