Learning Useful Representations of Recurrent Neural Network Weight Matrices (2403.11998v2)

Published 18 Mar 2024 in cs.LG

Abstract: Recurrent Neural Networks (RNNs) are general-purpose parallel-sequential computers. The program of an RNN is its weight matrix. How can we learn useful representations of RNN weights that facilitate RNN analysis as well as downstream tasks? While the mechanistic approach directly looks at some RNN's weights to predict its behavior, the functionalist approach analyzes its overall functionality, specifically its input-output mapping. We consider several mechanistic approaches for RNN weights and adapt the permutation equivariant Deep Weight Space layer for RNNs. Our two novel functionalist approaches extract information from RNN weights by 'interrogating' the RNN through probing inputs. We develop a theoretical framework that demonstrates conditions under which the functionalist approach can generate rich representations that help determine RNN behavior. We release the first two 'model zoo' datasets for RNN weight representation learning. One consists of generative models of a class of formal languages, and the other of classifiers of sequentially processed MNIST digits. With the help of an emulation-based self-supervised learning technique, we compare and evaluate the different RNN weight encoding techniques on multiple downstream applications. On the most challenging one, namely predicting which exact task the RNN was trained on, functionalist approaches show clear superiority.
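
To make the functionalist "interrogation" idea concrete, the sketch below represents a frozen RNN by its responses to a small set of learnable probing sequences, which are then mapped to a fixed-size embedding. This is a rough illustration of the general approach, not the authors' implementation; the class and parameter names (ProbingEncoder, n_probes, emb_dim, etc.) are assumptions made for the example.

```python
# Illustrative sketch (not the paper's code): encode an RNN's "program"
# (its weights) by the outputs it produces on learnable probing inputs,
# rather than by inspecting the weight matrices directly.
import torch
import torch.nn as nn

class ProbingEncoder(nn.Module):
    """Maps an RNN to a fixed-size vector via its input-output behaviour."""

    def __init__(self, n_probes: int, seq_len: int, input_dim: int,
                 hidden_dim: int, emb_dim: int = 128):
        super().__init__()
        # Learnable probing sequences used to 'interrogate' any given RNN.
        self.probes = nn.Parameter(torch.randn(n_probes, seq_len, input_dim))
        # Small head mapping the concatenated RNN responses to an embedding.
        self.head = nn.Linear(n_probes * hidden_dim, emb_dim)

    def forward(self, rnn: nn.RNNBase) -> torch.Tensor:
        # The RNN being encoded is treated as data: freeze its weights
        # so only the probes and the head are trainable.
        for p in rnn.parameters():
            p.requires_grad_(False)
        _, h_n = rnn(self.probes)        # h_n: (num_layers, n_probes, hidden_dim)
        responses = h_n[-1].flatten()    # final hidden states are the RNN's 'answers'
        return self.head(responses)      # fixed-size representation of the RNN

# Usage: embed two pre-trained RNNs from a hypothetical 'model zoo'.
encoder = ProbingEncoder(n_probes=16, seq_len=20, input_dim=8, hidden_dim=32)
zoo = [nn.RNN(8, 32, batch_first=True) for _ in range(2)]
embeddings = torch.stack([encoder(m) for m in zoo])  # shape: (2, 128)
```

In a self-supervised setup like the one described above, such embeddings could be trained so that they support downstream tasks such as predicting which task an RNN was trained on.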
