Graph Metanetworks for Processing Diverse Neural Architectures (2312.04501v2)

Published 7 Dec 2023 in cs.LG, cs.AI, and stat.ML

Abstract: Neural networks efficiently encode learned information within their parameters. Consequently, many tasks can be unified by treating neural networks themselves as input data. When doing so, recent studies demonstrated the importance of accounting for the symmetries and geometry of parameter spaces. However, those works developed architectures tailored to specific networks such as MLPs and CNNs without normalization layers, and generalizing such architectures to other types of networks can be challenging. In this work, we overcome these challenges by building new metanetworks - neural networks that take weights from other neural networks as input. Put simply, we carefully build graphs representing the input neural networks and process the graphs using graph neural networks. Our approach, Graph Metanetworks (GMNs), generalizes to neural architectures where competing methods struggle, such as multi-head attention layers, normalization layers, convolutional layers, ResNet blocks, and group-equivariant linear layers. We prove that GMNs are expressive and equivariant to parameter permutation symmetries that leave the input neural network functions unchanged. We validate the effectiveness of our method on several metanetwork tasks over diverse neural network architectures.


Summary

  • The paper introduces Graph Metanetworks (GMNs), which extend weight-space processing beyond MLPs and simple CNNs to diverse architectures by representing input networks as graphs.
  • It proves that GMNs are expressive and equivariant to the parameter permutation symmetries that leave the input network's function unchanged, ensuring consistent predictions across equivalent parameterizations.
  • Empirical tests on diverse image classifiers show GMNs outperform existing metanetwork methods, enabling unified and scalable analysis of neural architectures.

Overview of Graph Metanetworks for Processing Diverse Neural Architectures

The paper "Graph Metanetworks for Processing Diverse Neural Architectures" by Lim et al. introduces an approach called Graph Metanetworks (GMNs) for processing diverse neural architectures. The approach leverages graph neural networks (GNNs) to handle neural architectures as data, addressing challenges in representing diverse neural network parameters equitably. This research offers significant theoretical and empirical insights into metanetwork design and performance.

Key Contributions

  1. Generalization to Diverse Architectures: GMNs generalize beyond simple architectures such as MLPs and CNNs, handling more complex structures including multi-head attention layers, normalization layers, convolutional layers, ResNet blocks, and group-equivariant linear layers. The authors carefully construct parameter graphs for these architectures and process them with GNNs; a minimal message-passing sketch follows this list.
  2. Equivariance and Expressivity: A major theoretical contribution is the proof that GMNs are expressive and equivariant to the parameter permutation symmetries that leave the input network's function unchanged. The authors use neural DAG automorphisms to show that GMNs respect the inherent symmetries of neural network parameter spaces, generalizing known results for simpler architectures; this is pivotal for metanetwork predictions and transformations that are consistent across equivalent parameterizations.
  3. Empirical Validation: The paper provides a comprehensive evaluation of GMNs on various metanetwork tasks using datasets of image classifiers from architectures including CNNs, DeepSets, ResNets, and Vision Transformers. The GMNs outperform existing metanetwork methods, showing robust generalization in predicting network accuracy across disparate architectures and layer configurations.
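
To illustrate the processing side, the sketch below (ours; the paper's actual GNN uses richer node and edge features and a more capable backbone) runs message passing with edge features over a parameter graph and mean-pools node states into a single prediction such as test accuracy. Because the sum aggregation and mean pooling are order-insensitive, relabeling neurons, i.e., applying a parameter permutation symmetry, leaves the output unchanged.

```python
import torch
import torch.nn as nn

class TinyGraphMetanet(nn.Module):
    """Edge-feature message passing over a parameter graph, followed by mean
    pooling to predict a scalar (e.g., test accuracy). A sketch only."""
    def __init__(self, edge_dim=1, hidden=32, num_layers=3):
        super().__init__()
        self.edge_enc = nn.Linear(edge_dim, hidden)   # embed edge (weight) features
        self.msg = nn.ModuleList([
            nn.Sequential(nn.Linear(2 * hidden, hidden), nn.ReLU())
            for _ in range(num_layers)])
        self.readout = nn.Linear(hidden, 1)

    def forward(self, edge_index, edge_attr, num_nodes):
        src, dst = edge_index                          # each of shape (num_edges,)
        h = torch.zeros(num_nodes, self.edge_enc.out_features)
        e = self.edge_enc(edge_attr)
        for layer in self.msg:
            m = layer(torch.cat([h[src], e], dim=-1))  # messages along edges
            h = h + torch.zeros_like(h).index_add_(0, dst, m)  # sum into target nodes
        return self.readout(h.mean(dim=0))             # node-permutation-invariant pooling

# Toy usage on a random parameter graph with 9 nodes and 20 edges.
edge_index = torch.randint(0, 9, (2, 20))
edge_attr = torch.randn(20, 1)
pred = TinyGraphMetanet()(edge_index, edge_attr, num_nodes=9)
```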

Implications and Future Directions

This work has several practical and theoretical implications:

  • Unified Framework for Neural Architecture Processing: By converting neural network architectures into graphs, the proposed framework offers a unified methodology for analyzing and manipulating networks, paving the way for more generalized and adaptable metanetwork applications.
  • Scalability and Complexity Handling: GMNs handle parameter-sharing layers (e.g., convolutions and attention mechanisms) without substantial computational overhead, which suggests potential scalability to larger networks; a compact convolution encoding is sketched after this list. However, extending the approach to billion-parameter networks remains an open challenge.
  • Expanding Theory to Parameter Graphs: The diversity in parameter sharing techniques across modern architectures suggests further theoretical exploration in parameter graph design. This could extend the existing theory of neural DAG automorphisms beyond computation graphs, solidifying the general applicability of GMNs.
  • Applications in Neural Network Analysis and Optimization: GMNs could be instrumental in applications such as neural architecture search, federated learning with heterogeneous architectures, and other neural architecture optimization tasks.
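
As a concrete illustration of the parameter-sharing point above (a sketch under our own assumptions, not necessarily the paper's exact construction), a convolutional layer can be encoded with one node per channel and one edge per input-output channel pair, carrying the flattened kernel as that edge's feature vector, so graph size scales with channel counts rather than with spatial resolution.

```python
import torch

def conv_to_graph(conv_weight):
    """conv_weight: (out_channels, in_channels, k, k) kernel tensor.
    One node per channel; each (in_channel -> out_channel) pair gets a single
    edge whose feature is the flattened k*k kernel slice."""
    out_c, in_c, kh, kw = conv_weight.shape
    edge_index, edge_attr = [], []
    for o in range(out_c):
        for i in range(in_c):
            edge_index.append([i, in_c + o])                 # input node -> output node
            edge_attr.append(conv_weight[o, i].reshape(-1))  # k*k weights as one edge feature
    return torch.tensor(edge_index).t(), torch.stack(edge_attr)

# Example: a 3x3 conv with 8 input and 16 output channels -> 24 nodes, 128 edges.
ei, ea = conv_to_graph(torch.randn(16, 8, 3, 3))
```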

Overall, this paper presents a significant advance in metanetwork research, providing a method for neural network analysis and manipulation that is both theoretically grounded and empirically validated. The adoption of graph-based representations of neural networks could lead to new insights and capabilities in neural network design and efficiency.