Survival of the Fittest Representation: A Case Study with Modular Addition (2405.17420v1)
Abstract: When a neural network can learn multiple distinct algorithms to solve a task, how does it "choose" between them during training? To approach this question, we take inspiration from ecology: when multiple species coexist, they eventually reach an equilibrium in which some survive and others die out. Analogously, we suggest that a neural network at initialization contains many candidate solutions (representations and algorithms) that compete with one another under pressure from resource constraints, with the "fittest" ultimately prevailing. To investigate this Survival of the Fittest hypothesis, we conduct a case study on neural networks trained on modular addition, and find that these networks' multiple circular representations at different Fourier frequencies undergo such competitive dynamics, with only a few circles surviving at the end. We find that the frequencies with high initial signals and gradients, the "fittest," are more likely to survive. We also observe that increasing the embedding dimension allows more frequencies to survive. Inspired by the Lotka-Volterra equations describing interactions between species, we find that the dynamics of the circles are well characterized by a set of linear differential equations. Our results on modular addition show that complicated representations can be decomposed into simpler components, along with their basic interactions, to offer insight into the training dynamics of representations.
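To make the notion of a per-frequency "signal" concrete, here is a minimal sketch that projects an embedding matrix onto the cosine/sine pair for each Fourier frequency and reports the fraction of embedding energy that frequency captures. This is the standard diagnostic used in mechanistic studies of modular addition, not necessarily the paper's exact measurement; the shape `(p, d)` of the embedding, the modulus `p = 59`, and the function name `frequency_signals` are illustrative assumptions.

```python
import numpy as np

def frequency_signals(E, p):
    """Fraction of embedding energy captured by each Fourier frequency.

    E: array of shape (p, d), one embedding row per residue 0..p-1.
    Returns an array of length (p - 1) // 2 (assumes p is odd).
    """
    n = np.arange(p)
    energies = []
    for k in range(1, (p - 1) // 2 + 1):
        # Unit-norm cosine/sine basis vectors for the frequency-k circle.
        cos_k = np.cos(2 * np.pi * k * n / p)
        sin_k = np.sin(2 * np.pi * k * n / p)
        cos_k /= np.linalg.norm(cos_k)
        sin_k /= np.linalg.norm(sin_k)
        # Energy of the embedding's projection onto this circle.
        energy_k = np.sum((cos_k @ E) ** 2) + np.sum((sin_k @ E) ** 2)
        energies.append(energy_k)
    return np.array(energies) / np.sum(E ** 2)

# Example: a random (untrained) embedding spreads energy roughly uniformly
# across frequencies; after training on modular addition, only a few
# "surviving" frequencies are expected to dominate.
p, d = 59, 128
E_init = np.random.randn(p, d) / np.sqrt(d)
print(frequency_signals(E_init, p).round(3))
```

Under this reading, the abstract's claim is that training concentrates this initially spread-out energy onto a few surviving circles, and that frequencies starting with larger signal (and larger gradients) are the likely survivors.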