On Logical Extrapolation for Mazes with Recurrent and Implicit Networks
Abstract: Recent work has suggested that certain neural network architectures, particularly recurrent neural networks (RNNs) and implicit neural networks (INNs), are capable of logical extrapolation. That is, one may train such a network on easy instances of a specific task and then apply it successfully to more difficult instances of the same task. In this paper, we revisit this idea and show that (i) the capacity for extrapolation is less robust than previously suggested. Specifically, in the context of a maze-solving task, we show that while INNs (and some RNNs) are capable of generalizing to larger maze instances, they fail to generalize along axes of difficulty other than maze size. (ii) Models that are explicitly trained to converge to a fixed point (e.g. the INN we test) are likely to do so when extrapolating, while models that are not (e.g. the RNN we test) may exhibit more exotic limiting behaviour, such as limit cycles, even when they correctly solve the problem. Our results suggest that (i) further study is needed into why such networks extrapolate easily along certain axes of difficulty yet struggle along others, and (ii) analyzing the dynamics of extrapolation may yield insights into designing more efficient and interpretable logical extrapolators.
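To make the fixed-point versus limit-cycle dichotomy in (ii) concrete, here is a minimal Python sketch (not the paper's code; `classify_dynamics`, `step`, and the toy maps below are illustrative assumptions) showing how one might label the limiting behaviour of an iterated network block by comparing each new iterate against earlier ones:

```python
# A minimal sketch of classifying the limiting behaviour of an iterated map:
# fixed point vs. limit cycle. `step` stands in for one application of a
# recurrent/implicit block; the toy maps below are chosen purely for
# illustration, not taken from the paper.

import numpy as np

def classify_dynamics(step, z0, n_iters=500, tol=1e-6):
    """Iterate `step` from `z0` and label the trajectory.

    Returns "fixed point" if successive iterates stop moving,
    "limit cycle (period k)" if the state revisits an earlier iterate,
    and "no convergence detected" otherwise.
    """
    trajectory = [np.asarray(z0, dtype=float)]
    for _ in range(n_iters):
        z_next = step(trajectory[-1])
        # Fixed point: the update no longer changes the state.
        if np.linalg.norm(z_next - trajectory[-1]) < tol:
            return "fixed point"
        # Limit cycle: the state returns (approximately) to a past iterate.
        for k, z_past in enumerate(reversed(trajectory[:-1]), start=2):
            if np.linalg.norm(z_next - z_past) < tol:
                return f"limit cycle (period {k})"
        trajectory.append(z_next)
    return "no convergence detected"

# Toy examples: a contraction converges to a fixed point, while negation
# (eigenvalues on the unit circle) yields a period-2 cycle.
contraction = lambda z: 0.5 * z + 1.0
flip = lambda z: -z

print(classify_dynamics(contraction, np.zeros(2)))    # fixed point
print(classify_dynamics(flip, np.array([1.0, 0.5])))  # limit cycle (period 2)
```

An INN trained with a fixed-point objective should land in the first case even on out-of-distribution inputs, whereas an unconstrained RNN may land in the second while still producing a correct solution.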