Graph Metanetworks for Processing Diverse Neural Architectures (2312.04501v2)
Abstract: Neural networks efficiently encode learned information within their parameters. Consequently, many tasks can be unified by treating neural networks themselves as input data. Recent studies taking this view have demonstrated the importance of accounting for the symmetries and geometry of parameter spaces. However, those works developed architectures tailored to specific networks, such as MLPs and CNNs without normalization layers, and generalizing such architectures to other types of networks can be challenging. In this work, we overcome these challenges by building new metanetworks - neural networks that take weights from other neural networks as input. Put simply, we carefully build graphs representing the input neural networks and process the graphs using graph neural networks. Our approach, Graph Metanetworks (GMNs), generalizes to neural architectures where competing methods struggle, such as multi-head attention layers, normalization layers, convolutional layers, ResNet blocks, and group-equivariant linear layers. We prove that GMNs are expressive and equivariant to parameter permutation symmetries that leave the input neural network functions unchanged. We validate the effectiveness of our method on several metanetwork tasks over diverse neural network architectures.
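The core recipe in the abstract (represent the input network as a graph, then process that graph with a graph neural network) can be sketched in a few lines. Below is a minimal, hypothetical illustration, not the authors' released implementation: an MLP's parameters are converted into a graph whose nodes are neurons (with biases as node features) and whose edges carry the weights as edge features, and a small message-passing GNN with a mean readout produces a graph-level prediction, e.g. an estimate of the input network's test accuracy. The names (`mlp_to_graph`, `SimpleEdgeGNN`) and the specific featurization and message-passing choices are assumptions for illustration only.

```python
import torch
import torch.nn as nn


def mlp_to_graph(weights, biases):
    """Encode MLP parameters as (node_features, edge_index, edge_features).

    weights: list of [out_dim, in_dim] tensors; biases: list of [out_dim] tensors.
    Nodes are neurons; each scalar weight W[j, i] becomes a directed edge from
    neuron i in layer l to neuron j in layer l+1, with the weight as its edge
    feature. Biases become node features (zero for input neurons).
    """
    layer_sizes = [weights[0].shape[1]] + [w.shape[0] for w in weights]
    offsets = [0]
    for size in layer_sizes[:-1]:
        offsets.append(offsets[-1] + size)          # first node index of each layer

    node_feat = torch.zeros(sum(layer_sizes), 1)
    src, dst, edge_feat = [], [], []
    for l, (W, b) in enumerate(zip(weights, biases)):
        node_feat[offsets[l + 1]: offsets[l + 1] + W.shape[0], 0] = b
        for j in range(W.shape[0]):
            for i in range(W.shape[1]):
                src.append(offsets[l] + i)
                dst.append(offsets[l + 1] + j)
                edge_feat.append(W[j, i].item())
    return node_feat, torch.tensor([src, dst]), torch.tensor(edge_feat).unsqueeze(-1)


class SimpleEdgeGNN(nn.Module):
    """Two rounds of message passing with edge features, then a mean readout."""

    def __init__(self, hidden=32):
        super().__init__()
        self.embed = nn.Linear(1, hidden)
        self.msg = nn.ModuleList(nn.Linear(2 * hidden + 1, hidden) for _ in range(2))
        self.upd = nn.ModuleList(nn.Linear(2 * hidden, hidden) for _ in range(2))
        self.head = nn.Linear(hidden, 1)            # e.g. predicted test accuracy

    def forward(self, node_feat, edge_index, edge_feat):
        h = torch.relu(self.embed(node_feat))
        src, dst = edge_index
        for msg, upd in zip(self.msg, self.upd):
            m = torch.relu(msg(torch.cat([h[src], h[dst], edge_feat], dim=-1)))
            agg = torch.zeros_like(h).index_add_(0, dst, m)   # sum incoming messages
            h = torch.relu(upd(torch.cat([h, agg], dim=-1)))
        return self.head(h.mean(dim=0))             # graph-level prediction


# Usage: featurize a toy 2-layer MLP (4 -> 8 -> 3) and score it.
weights = [torch.randn(8, 4), torch.randn(3, 8)]
biases = [torch.randn(8), torch.randn(3)]
print(SimpleEdgeGNN()(*mlp_to_graph(weights, biases)))
```

Permuting the hidden neurons of the input MLP only relabels nodes of this graph, and a permutation-equivariant GNN with a symmetric readout returns the same graph-level output, which is the parameter permutation symmetry the abstract refers to. The sketch passes messages only along the forward direction of the MLP for brevity; richer graph constructions and bidirectional message passing are natural extensions.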