Block-Operations: Using Modular Routing to Improve Compositional Generalization (2408.00508v1)
Abstract: We explore the hypothesis that poor compositional generalization in neural networks is caused by difficulties in learning effective routing. To address this problem, we propose the concept of block-operations, which is based on splitting all activation tensors in the network into uniformly sized blocks and using an inductive bias to encourage modular routing and modification of these blocks. Based on this concept, we introduce the Multiplexer, a new architectural component that enhances the feed-forward neural network (FNN). We experimentally confirm that Multiplexers exhibit strong compositional generalization. On both a synthetic and a realistic task, our model learned the underlying process behind the task, whereas both FNNs and Transformers learned only heuristic approximations. As future work, we propose applying the principles of block-operations to other existing architectures.
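The abstract describes the idea only at a high level (uniformly sized activation blocks plus an inductive bias toward routing whole blocks), and does not specify the Multiplexer's internals. The sketch below is therefore one plausible reading rather than the paper's actual method: it assumes a PyTorch module, a learned softmax router, and output blocks formed as convex combinations of input blocks. All names (`Multiplexer`, `num_in_blocks`, `block_size`) and design choices here are illustrative assumptions.

```python
# Hypothetical sketch of a block-based routing ("Multiplexer") layer.
# Assumption: routing = per-output-slot soft attention over input blocks.
import torch
import torch.nn as nn
import torch.nn.functional as F


class Multiplexer(nn.Module):
    """Splits the activation vector into uniformly sized blocks and routes
    them to output slots via learned soft routing weights.

    The router parameterization and the convex-combination readout are
    assumptions for illustration; the abstract does not give these details.
    """

    def __init__(self, num_in_blocks: int, num_out_blocks: int, block_size: int):
        super().__init__()
        self.num_in_blocks = num_in_blocks
        self.num_out_blocks = num_out_blocks
        self.block_size = block_size
        # A small router scores every (output slot, input block) pair
        # from the flattened input activations.
        self.router = nn.Linear(num_in_blocks * block_size,
                                num_out_blocks * num_in_blocks)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, num_in_blocks * block_size)
        batch = x.shape[0]
        blocks = x.view(batch, self.num_in_blocks, self.block_size)

        # Routing weights: for each output slot, a distribution over input blocks.
        scores = self.router(x).view(batch, self.num_out_blocks, self.num_in_blocks)
        weights = F.softmax(scores, dim=-1)

        # Each output block is a weighted combination of input blocks, i.e.
        # (soft) copying of whole blocks rather than arbitrary feature mixing.
        out_blocks = torch.einsum('boi,bid->bod', weights, blocks)
        return out_blocks.reshape(batch, self.num_out_blocks * self.block_size)


if __name__ == "__main__":
    layer = Multiplexer(num_in_blocks=4, num_out_blocks=2, block_size=8)
    y = layer(torch.randn(3, 4 * 8))
    print(y.shape)  # torch.Size([3, 16])
```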