
A Hybrid System for Systematic Generalization in Simple Arithmetic Problems (2306.17249v1)

Published 29 Jun 2023 in cs.NE and cs.AI

Abstract: Solving symbolic reasoning problems that require compositionality and systematicity is considered one of the key ingredients of human intelligence. However, symbolic reasoning is still a great challenge for deep learning models, which often cannot generalize the reasoning pattern to out-of-distribution test cases. In this work, we propose a hybrid system capable of solving arithmetic problems that require compositional and systematic reasoning over sequences of symbols. The model acquires such a skill by learning appropriate substitution rules, which are applied iteratively to the input string until the expression is completely resolved. We show that the proposed system can accurately solve nested arithmetical expressions even when trained only on a subset including the simplest cases, significantly outperforming both a sequence-to-sequence model trained end-to-end and a state-of-the-art LLM.
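The abstract's core mechanism, iteratively applying substitution rules to the input string until the expression is fully resolved, can be made concrete with a short sketch. Note that the paper's hybrid system learns which rewrite to apply, whereas the hand-coded rules and helper names below (reduce_flat, solve) are our own illustrative assumptions, not the authors' implementation.

```python
import re

# Minimal sketch of iterative substitution-rule solving. Each loop
# iteration performs exactly one local rewrite on the string; the
# process repeats until a single number remains.

INNER = re.compile(r"\(([^()]*)\)")                 # innermost (...) span
MUL = re.compile(r"(-?\d+)\s*\*\s*(-?\d+)")         # one multiplication
ADDSUB = re.compile(r"(-?\d+)\s*([+-])\s*(-?\d+)")  # one addition/subtraction

def reduce_flat(expr: str) -> str:
    """Rewrite a parenthesis-free expression to a single number, one rule at a time."""
    while (m := MUL.search(expr)):                  # multiplication binds tighter
        expr = expr[:m.start()] + str(int(m.group(1)) * int(m.group(2))) + expr[m.end():]
    while (m := ADDSUB.search(expr)):
        a, b = int(m.group(1)), int(m.group(3))
        expr = expr[:m.start()] + str(a + b if m.group(2) == "+" else a - b) + expr[m.end():]
    return expr

def solve(expr: str) -> str:
    """Apply substitutions iteratively until the expression is completely resolved."""
    while (m := INNER.search(expr)):                # resolve innermost parentheses first
        expr = expr[:m.start()] + reduce_flat(m.group(1)) + expr[m.end():]
    return reduce_flat(expr)

print(solve("(2+(3*4))-(1+2)"))  # -> 11
```

Because every step is a local rewrite that is independent of total expression depth, the same loop handles arbitrarily nested inputs, which mirrors the step-by-step resolution strategy the abstract credits for generalizing beyond the simple training cases.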
