
MemoryPrompt: A Light Wrapper to Improve Context Tracking in Pre-trained Language Models

Published 23 Feb 2024 in cs.CL, cs.AI, and cs.LG | arXiv:2402.15268v1

Abstract: Transformer-based LMs track contextual information through large, hard-coded input windows. We introduce MemoryPrompt, a leaner approach in which the LM is complemented by a small auxiliary recurrent network that passes information to the LM by prefixing its regular input with a sequence of vectors, akin to soft prompts, without requiring LM finetuning. Tested on a task designed to probe a LM's ability to keep track of multiple fact updates, a MemoryPrompt-augmented LM outperforms much larger LMs that have access to the full input history. We also test MemoryPrompt on a long-distance dialogue dataset, where its performance is comparable to that of a model conditioned on the entire conversation history. In both experiments we also observe that, unlike full-finetuning approaches, MemoryPrompt does not suffer from catastrophic forgetting when adapted to new tasks, thus not disrupting the generalist capabilities of the underlying LM.
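The following is a minimal sketch of the idea described in the abstract, not the authors' implementation: a small trainable GRU reads each text segment, carries a recurrent state across segments, and emits a handful of soft-prompt-like vectors that are prepended to the frozen LM's input embeddings for the next segment, so only the memory module is trained. The exact interface between the memory network and the LM, the hyperparameters, and the choice of facebook/opt-125m as the base model are assumptions made for illustration only.

```python
# Hedged sketch of a MemoryPrompt-style wrapper (assumptions noted in comments).
import torch
import torch.nn as nn
from transformers import AutoModelForCausalLM, AutoTokenizer

class MemoryPrompt(nn.Module):
    """Auxiliary recurrent network that turns a running memory state
    into a short sequence of prefix ("soft prompt") vectors."""
    def __init__(self, d_model, n_prefix=5, d_mem=512):
        super().__init__()
        self.n_prefix = n_prefix
        self.d_model = d_model
        self.rnn = nn.GRU(d_model, d_mem, batch_first=True)    # small recurrent memory
        self.to_prefix = nn.Linear(d_mem, n_prefix * d_model)  # memory state -> prefix vectors

    def forward(self, segment_embeds, h=None):
        # segment_embeds: (batch, seq_len, d_model) embeddings of the current segment
        _, h = self.rnn(segment_embeds, h)                      # update memory across segments
        prefix = self.to_prefix(h[-1]).view(-1, self.n_prefix, self.d_model)
        return prefix, h

# Frozen base LM; opt-125m is an illustrative choice, not necessarily the paper's setup.
lm = AutoModelForCausalLM.from_pretrained("facebook/opt-125m")
tok = AutoTokenizer.from_pretrained("facebook/opt-125m")
for p in lm.parameters():
    p.requires_grad_(False)

embed = lm.get_input_embeddings()
mem = MemoryPrompt(d_model=embed.embedding_dim)

h = None
segments = ["Alice moved to Paris.", "Alice then moved to Rome. Where does Alice live?"]
for segment in segments:
    ids = tok(segment, return_tensors="pt").input_ids
    seg_embeds = embed(ids)
    prefix, h = mem(seg_embeds, h)                    # memory-derived prefix vectors
    inputs_embeds = torch.cat([prefix, seg_embeds], dim=1)
    out = lm(inputs_embeds=inputs_embeds)             # frozen LM conditioned on the prefix
    # In training, a LM loss on the target tokens would backpropagate into `mem` only,
    # leaving the base LM untouched (hence no catastrophic forgetting of the LM itself).
```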
