Convolutional Deep Kernel Machines (2309.09814v3)

Published 18 Sep 2023 in stat.ML and cs.LG

Abstract: Standard infinite-width limits of neural networks sacrifice the ability for intermediate layers to learn representations from data. Recent work (A theory of representation learning gives a deep generalisation of kernel methods, Yang et al. 2023) modified the Neural Network Gaussian Process (NNGP) limit of Bayesian neural networks so that representation learning is retained. Furthermore, they found that applying this modified limit to a deep Gaussian process gives a practical learning algorithm which they dubbed the deep kernel machine (DKM). However, they only considered the simplest possible setting: regression in small, fully connected networks with e.g. 10 input features. Here, we introduce convolutional deep kernel machines. This required us to develop a novel inter-domain inducing point approximation, as well as introducing and experimentally assessing a number of techniques not previously seen in DKMs, including analogues to batch normalisation, different likelihoods, and different types of top-layer. The resulting model trains in roughly 77 GPU hours, achieving around 99% test accuracy on MNIST, 72% on CIFAR-100, and 92.7% on CIFAR-10, which is SOTA for kernel methods.
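For intuition about the method the paper builds on, the sketch below illustrates the flavour of the deep kernel machine (DKM) objective described in Yang et al. (2023), which the abstract cites: each layer's learned Gram matrix is tied to the kernel of the layer below through a KL term, while the top layer feeds a Gaussian-process likelihood. This is a minimal fully connected regression sketch under stated assumptions, not the paper's convolutional, inducing-point formulation; the arc-cosine kernel choice and all function names (arccos_kernel, gaussian_kl, dkm_objective) are illustrative assumptions rather than the authors' code.

```python
import numpy as np
from scipy.linalg import cholesky, solve

def arccos_kernel(G):
    """First-order arc-cosine kernel applied to a Gram matrix G (one common
    choice of kernelised nonlinearity in NNGP-style limits; an assumption here)."""
    d = np.sqrt(np.diag(G))
    C = np.clip(G / np.outer(d, d), -1.0, 1.0)            # correlations
    theta = np.arccos(C)
    return np.outer(d, d) * (np.sin(theta) + (np.pi - theta) * np.cos(theta)) / np.pi

def gaussian_kl(G, K, jitter=1e-6):
    """KL( N(0, G) || N(0, K) ) between zero-mean Gaussians with P x P covariances."""
    P = G.shape[0]
    K = K + jitter * np.eye(P)
    return 0.5 * (np.trace(solve(K, G, assume_a='pos')) - P
                  + np.linalg.slogdet(K)[1] - np.linalg.slogdet(G)[1])

def dkm_objective(Gs, X, y, nus, noise=0.1):
    """Sketch of a DKM-style objective: a GP log marginal likelihood at the top,
    minus nu-weighted KL terms tying each learned Gram matrix G_l to the kernel
    of the layer below, K(G_{l-1})."""
    G_prev = X @ X.T / X.shape[1]                          # input Gram matrix
    kl_sum = 0.0
    for G, nu in zip(Gs, nus):                             # one PSD P x P matrix per layer
        kl_sum += nu * gaussian_kl(G, arccos_kernel(G_prev))
        G_prev = G
    Ky = arccos_kernel(G_prev) + noise * np.eye(len(y))    # top-layer kernel + noise
    L = cholesky(Ky, lower=True)
    alpha = solve(Ky, y, assume_a='pos')
    loglik = (-0.5 * y @ alpha
              - np.log(np.diag(L)).sum()
              - 0.5 * len(y) * np.log(2 * np.pi))
    return loglik - kl_sum                                 # maximise over the Gs
```

In this view, very large values in nus pin each Gram matrix to the NNGP kernel of the layer below (no representation learning), while finite values let the Gram matrices adapt to the data. The paper's convolutional version does not optimise full Gram matrices directly; per the abstract, it introduces a novel inter-domain inducing-point approximation (plus batch-norm analogues, alternative likelihoods, and different top layers) to make this tractable at image scale.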

References (61)
  1. Semi-supervised classification with graph convolutional kernel machines. arXiv preprint arXiv:2301.13764, 2023.
  2. Kernel regression with infinite-width neural networks on millions of examples, 2023.
  3. Wide neural networks with bottlenecks are deep Gaussian processes. J. Mach. Learn. Res., 21(1), January 2020. ISSN 1532-4435.
  4. Laurence Aitchison. Why bigger is not always better: on finite and infinite neural networks. In ICML, 2020.
  5. Deep kernel processes. In ICML, 2021.
  6. Joseph M Antognini. Finite size corrections for neural network Gaussian processes. In ICML Workshop on Theoretical Physics for Deep Learning, 2019.
  7. On exact computation with an infinitely wide neural net. Advances in Neural Information Processing Systems, 32, 2019.
  8. Representation learning: A review and new perspectives. IEEE Transactions on Pattern Analysis and Machine Intelligence, 35(8):1798–1828, 2013.
  9. Deep convolutional Gaussian processes, 2018. URL https://arxiv.org/abs/1810.03052.
  10. A representer theorem for deep kernel learning. The Journal of Machine Learning Research, 20(1):2302–2333, 2019.
  11. Adversarial examples, uncertainty, and transfer testing robustness in Gaussian process hybrid deep networks. arXiv preprint arXiv:1707.02476, 2017.
  12. Manifold Gaussian processes for regression. In 2016 International Joint Conference on Neural Networks (IJCNN), pp.  3338–3345. IEEE, 2016.
  13. Recurrent kernel networks. Advances in Neural Information Processing Systems, 32, 2019.
  14. Kernel methods for deep learning. In NeurIPS, 2009.
  15. ImageNet: A large-scale hierarchical image database. In 2009 IEEE Conference on Computer Vision and Pattern Recognition, pp.  248–255, 2009. doi: 10.1109/CVPR.2009.5206848.
  16. Bayesian image classification with deep convolutional Gaussian processes. In Silvia Chiappa and Roberto Calandra (eds.), Proceedings of the Twenty Third International Conference on Artificial Intelligence and Statistics (AISTATS), volume 108 of Proceedings of Machine Learning Research. PMLR, 2020.
  17. Deep neural networks as point estimates for deep Gaussian processes. In M. Ranzato, A. Beygelzimer, Y. Dauphin, P.S. Liang, and J. Wortman Vaughan (eds.), Advances in Neural Information Processing Systems, volume 34, pp.  9443–9455. Curran Associates, Inc., 2021. URL https://proceedings.neurips.cc/paper_files/paper/2021/file/4e6cd95227cb0c280e99a195be5f6615-Paper.pdf.
  18. Asymptotics of wide networks from Feynman diagrams. arXiv preprint arXiv:1909.11304, 2019.
  19. Deep convolutional networks as shallow Gaussian processes. arXiv preprint arXiv:1808.05587, 2018.
  20. Neural networks and quantum field theory. Machine Learning: Science and Technology, 2(3):035002, 2021.
  21. Finite depth and width corrections to the neural tangent kernel. arXiv preprint arXiv:1909.05989, 2019.
  22. Deep Residual Learning for Image Recognition. In Proceedings of 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR ’16. IEEE, 2016.
  23. Variational Fourier features for Gaussian processes. J. Mach. Learn. Res., 18(1):5537–5588, 2017.
  24. Neural tangent kernel: Convergence and generalization in neural networks. In NeurIPS, pp.  8580–8589, 2018.
  25. Learning multiple layers of features from tiny images. Technical Report 0, University of Toronto, Toronto, Ontario, 2009.
  26. Inter-domain Gaussian processes for sparse inference using inducing features. Advances in Neural Information Processing Systems, 22, 2009.
  27. Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11):2278–2324, 1998.
  28. Deep learning. Nature, 521(7553):436–444, 2015.
  29. Deep neural networks as Gaussian processes. arXiv preprint arXiv:1711.00165, 2017.
  30. Finite versus infinite neural networks: an empirical study. Advances in Neural Information Processing Systems, 33:15156–15172, 2020.
  31. Statistical mechanics of deep linear neural networks: The back-propagating renormalization group. arXiv preprint arXiv:2012.04030, 2020.
  32. Enhanced convolutional neural tangent kernels. arXiv preprint arXiv:1911.00809, 2019.
  33. David John Cameron MacKay. Introduction to Gaussian processes. 1998. URL https://api.semanticscholar.org/CorpusID:116281095.
  34. Low-precision arithmetic for fast Gaussian processes. In James Cussens and Kun Zhang (eds.), Proceedings of the Thirty-Eighth Conference on Uncertainty in Artificial Intelligence, volume 180 of Proceedings of Machine Learning Research, pp.  1306–1316. PMLR, 01–05 Aug 2022. URL https://proceedings.mlr.press/v180/maddox22a.html.
  35. Julien Mairal. End-to-end kernel learning with supervised convolutional kernel networks. Advances in neural information processing systems, 29, 2016.
  36. Convolutional kernel networks. Advances in neural information processing systems, 27, 2014.
  37. Gaussian process behaviour in wide deep neural networks. arXiv preprint arXiv:1804.11271, 2018.
  38. A self consistent theory of Gaussian processes captures feature learning effects in finite CNNs. arXiv preprint arXiv:2106.04110, 2021.
  39. Predicting the outputs of finite networks trained with noisy gradients. arXiv preprint arXiv:2004.01190, 2020.
  40. Radford M. Neal. Bayesian Learning for Neural Networks. PhD thesis, University of Toronto, 1995.
  41. Bayesian deep convolutional networks with many channels are Gaussian processes. arXiv preprint arXiv:1810.05148, 2018.
  42. The promises and pitfalls of deep kernel learning. arXiv preprint arXiv:2102.12108, 2021.
  43. PyTorch: An imperative style, high-performance deep learning library. In Advances in Neural Information Processing Systems 32, pp. 8024–8035. Curran Associates, Inc., 2019.
  44. The limitations of large width in neural networks: A deep Gaussian process perspective. Advances in Neural Information Processing Systems, 34:3349–3363, 2021.
  45. The principles of deep learning theory. arXiv preprint arXiv:2106.10165, 2021.
  46. Inter-domain deep Gaussian processes. In International Conference on Machine Learning, pp. 8286–8294. PMLR, 2020.
  47. Natural gradients in practice: Non-conjugate variational inference in Gaussian process models. In AISTATS, 2018.
  48. Separation of scales and a thermodynamic description of feature learning in some CNNs. Nature Communications, 14(1):908, 2023.
  49. Neural kernels without tangents. In International Conference on Machine Learning, pp. 8614–8623. PMLR, 2020.
  50. Sparse orthogonal variational inference for Gaussian processes. In International Conference on Artificial Intelligence and Statistics, pp.  1932–1942. PMLR, 2020.
  51. Johan AK Suykens. Deep restricted kernel machines using conjugate feature duality. Neural computation, 29(8):2123–2163, 2017.
  52. Deep kernel principal component analysis for multi-level feature learning. arXiv preprint arXiv:2302.11220, 2023.
  53. Convolutional Gaussian processes, 2017. URL https://arxiv.org/abs/1709.01894.
  54. Stochastic variational deep kernel learning. Advances in Neural Information Processing Systems, 29, 2016a.
  55. Deep kernel learning. In Artificial intelligence and statistics, pp.  370–378. PMLR, 2016b.
  56. Sho Yaida. Non-Gaussian processes and neural networks at finite widths. In Mathematical and Scientific Machine Learning, pp. 165–192. PMLR, 2020.
  57. A theory of representation learning gives a deep generalisation of kernel methods. ICML, 2023.
  58. Feature learning in infinite-width neural networks. In International Conference on Machine Learning, 2021.
  59. Efficient computation of deep nonlinear infinite-width neural networks that learn features. In International Conference on Learning Representations, 2022.
  60. Exact marginal prior distributions of finite Bayesian neural networks. Advances in Neural Information Processing Systems, 34, 2021.
  61. Asymptotics of representation learning in finite Bayesian neural networks. Advances in Neural Information Processing Systems, 34:24765–24777, 2021.
Authors (3)
  1. Edward Milsom (6 papers)
  2. Ben Anson (7 papers)
  3. Laurence Aitchison (66 papers)
Citations (5)
