Enhancing Length Extrapolation in Sequential Models with Pointer-Augmented Neural Memory (2404.11870v1)

Published 18 Apr 2024 in cs.LG and cs.CL

Abstract: We propose Pointer-Augmented Neural Memory (PANM) to help neural networks understand and apply symbol processing to new, longer sequences of data. PANM integrates an external neural memory that uses novel physical addresses and pointer manipulation techniques to mimic human and computer symbol processing abilities. PANM facilitates pointer assignment, dereference, and arithmetic by explicitly using physical pointers to access memory content. Remarkably, it can learn to perform these operations through end-to-end training on sequence data, powering various sequential models. Our experiments demonstrate PANM's exceptional length extrapolation capabilities and improved performance on tasks that require symbol processing, such as algorithmic reasoning and Dyck language recognition. PANM helps Transformers achieve up to 100% generalization accuracy in compositional learning tasks and significantly better results in mathematical reasoning, question answering, and machine translation tasks.
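
The abstract gives no code, but the mechanism it describes (content stored at fixed physical addresses, with pointers that can be assigned, advanced by pointer arithmetic, and dereferenced via address-based attention) can be illustrated with a minimal sketch. The following PyTorch toy is an assumption-laden illustration, not the authors' PANM implementation; every class, method, and parameter name here (ToyPointerMemory, write, dereference, increment, slot_dim, max_slots) is invented for this example.

```python
# Illustrative sketch only: a toy pointer-augmented memory in the spirit of the
# abstract (fixed physical addresses, pointer arithmetic, dereference). All names
# and design choices here are assumptions, not the authors' PANM implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F


class ToyPointerMemory(nn.Module):
    def __init__(self, slot_dim: int, max_slots: int):
        super().__init__()
        # Fixed "physical" address codes, one per memory slot, independent of content.
        self.addresses = nn.Parameter(torch.randn(max_slots, slot_dim),
                                      requires_grad=False)
        self.content = None

    def write(self, sequence: torch.Tensor) -> torch.Tensor:
        # Store an input sequence of shape (seq_len, slot_dim) as memory content
        # and return the physical addresses of the occupied slots.
        self.content = sequence
        return self.addresses[: sequence.size(0)]

    def dereference(self, pointer: torch.Tensor) -> torch.Tensor:
        # A pointer is a vector compared against the address table; dereferencing
        # reads content by attending over slots with address-based weights.
        n = self.content.size(0)
        weights = F.softmax(pointer @ self.addresses[:n].t(), dim=-1)
        return weights @ self.content

    def increment(self, pointer: torch.Tensor, step: int = 1) -> torch.Tensor:
        # Toy pointer arithmetic: find the slot the pointer currently selects and
        # return the physical address `step` slots further along.
        n = self.content.size(0)
        idx = (pointer @ self.addresses[:n].t()).argmax(dim=-1)
        idx = torch.clamp(idx + step, max=n - 1)
        return self.addresses[idx]


# Usage sketch: copy a sequence by walking a pointer from the first slot onward.
mem = ToyPointerMemory(slot_dim=16, max_slots=64)
seq = torch.randn(10, 16)
addr = mem.write(seq)
ptr = addr[0:1]                           # pointer assignment: point at slot 0
outputs = []
for _ in range(seq.size(0)):
    outputs.append(mem.dereference(ptr))  # pointer dereference
    ptr = mem.increment(ptr, step=1)      # pointer arithmetic: advance one slot
copied = torch.cat(outputs, dim=0)        # approximately reproduces `seq`
```

The copy loop at the end is only meant to suggest why address-driven access could help length extrapolation: the access pattern follows the physical addresses rather than the stored content, so the same pointer-walking procedure applies regardless of how long the sequence is.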

