Learning Layer-wise Equivariances Automatically using Gradients (2310.06131v1)

Published 9 Oct 2023 in cs.LG, cs.AI, and stat.ML

Abstract: Convolutions encode equivariance symmetries into neural networks, leading to better generalisation performance. However, symmetries provide fixed hard constraints on the functions a network can represent, need to be specified in advance, and cannot be adapted. Our goal is to allow flexible symmetry constraints that can automatically be learned from data using gradients. Learning symmetry and associated weight connectivity structures from scratch is difficult for two reasons. First, it requires efficient and flexible parameterisations of layer-wise equivariances. Second, symmetries act as constraints and are therefore not encouraged by training losses measuring data fit. To overcome these challenges, we improve parameterisations of soft equivariance and learn the amount of equivariance in layers by optimising the marginal likelihood, estimated using differentiable Laplace approximations. The objective balances data fit and model complexity, enabling layer-wise symmetry discovery in deep networks. We demonstrate the ability to automatically learn layer-wise equivariances on image classification tasks, achieving equivalent or improved performance over baselines with hard-coded symmetry.
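The abstract combines two ideas: a relaxed ("soft") parameterisation of layer-wise equivariance, and a marginal-likelihood objective that decides how much symmetry each layer should keep. For intuition only, below is a minimal PyTorch sketch of one way such a relaxation can look. Everything in it is an assumption for illustration: the class name SoftEquivariantConv2d, the scalar gate alpha, and the crude weight-norm complexity penalty are ours, not the paper's implementation. In particular, the paper's complexity term comes from a differentiable Laplace approximation to the log marginal likelihood, log p(D|η) ≈ log p(D|θ*, η) + log p(θ*|η) − ½ log det(H/2π), where θ* is the posterior mode and H the Hessian of the negative log joint at θ*; the simple penalty below merely stands in for it.

# Illustrative sketch (not the paper's code): a layer that interpolates
# between a translation-equivariant convolution and a symmetry-breaking
# pathway, with a learnable gate controlling the amount of equivariance.
import torch
import torch.nn as nn

class SoftEquivariantConv2d(nn.Module):
    def __init__(self, in_ch, out_ch, kernel_size, img_size):
        super().__init__()
        pad = kernel_size // 2
        # Equivariant pathway: an ordinary convolution (weights shared
        # across positions, hence translation equivariance).
        self.conv = nn.Conv2d(in_ch, out_ch, kernel_size, padding=pad)
        # Symmetry-breaking pathway: a second convolution plus a learned
        # per-position bias, a cheap stand-in for a fully locally
        # connected (unshared-weight) layer.
        self.free = nn.Conv2d(in_ch, out_ch, kernel_size, padding=pad)
        self.pos_bias = nn.Parameter(torch.zeros(out_ch, img_size, img_size))
        # Scalar gate: sigmoid(alpha_logit) -> 1 recovers a hard-coded
        # convolution, -> 0 removes the symmetry constraint entirely.
        self.alpha_logit = nn.Parameter(torch.zeros(()))

    def forward(self, x):
        alpha = torch.sigmoid(self.alpha_logit)
        return alpha * self.conv(x) + (1 - alpha) * (self.free(x) + self.pos_bias)

    def complexity_penalty(self):
        # Crude stand-in for the marginal-likelihood complexity term:
        # penalise capacity spent on breaking the symmetry.
        alpha = torch.sigmoid(self.alpha_logit)
        return (1 - alpha) * (self.free.weight.pow(2).sum()
                              + self.pos_bias.pow(2).sum())

if __name__ == "__main__":
    layer = SoftEquivariantConv2d(3, 16, 3, img_size=32)
    y = layer(torch.randn(2, 3, 32, 32))
    assert y.shape == (2, 16, 32, 32)

A training loss would then combine data fit with the per-layer complexity terms, e.g. F.cross_entropy(model(x), y) plus a weighted sum of complexity_penalty() over all such layers. Because the gate is an ordinary parameter, gradient descent can push each layer towards a hard convolution or towards an unconstrained layer, which is the sense in which the amount of equivariance is "learned automatically using gradients".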

Authors (3)
  1. Tycho F. A. van der Ouderaa (12 papers)
  2. Alexander Immer (26 papers)
  3. Mark van der Wilk (61 papers)
Citations (10)
