
Memoria: Resolving Fateful Forgetting Problem through Human-Inspired Memory Architecture (2310.03052v3)

Published 4 Oct 2023 in cs.LG, cs.AI, and cs.NE

Abstract: Making neural networks remember over the long term has been a longstanding issue. Although several external memory techniques have been introduced, most focus on retaining recent information in the short term. Regardless of its importance, information tends to be fatefully forgotten over time. We present Memoria, a memory system for artificial neural networks, drawing inspiration from humans and applying various neuroscientific and psychological theories. The experimental results prove the effectiveness of Memoria in the diverse tasks of sorting, language modeling, and classification, surpassing conventional techniques. Engram analysis reveals that Memoria exhibits the primacy, recency, and temporal contiguity effects which are characteristics of human memory.


Summary

  • The paper introduces Memoria, a novel memory network that integrates Hebbian theory to overcome Transformers' limitations in handling long input sequences.
  • It employs a multi-level memory system—working, short-term, and long-term memory—to selectively store and recall information more effectively.
  • Experimental results show enhanced accuracy and efficiency in tasks like language modeling and text classification, bridging cognitive theory and AI.

Memoria: Enhancing Transformers with a Hebbian Memory Architecture

Introduction to Memoria

The paper introduces Memoria, a memory network that embeds Hebbian principles, a foundational account of how human memory forms associations, into neural network architectures. It targets two limitations of Transformers: the computational cost of processing long input sequences and the inability to selectively retain the informative parts of the input. Memoria organizes information into a multi-level memory of working, short-term, and long-term stores, enabling more human-like processing of sequential data.
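As a concrete, simplified illustration of this idea, the Python sketch below shows one way a three-level store could be organized, with engrams promoted from working to short-term to long-term memory as they are recalled and discarded when their lifespan runs out. The class names, thresholds, and lifespan bookkeeping are hypothetical and are not taken from the paper's implementation, which drives promotion and forgetting with learned engram weights rather than simple counters.

```python
from dataclasses import dataclass

@dataclass
class Engram:
    data: object        # e.g. a hidden-state vector summarizing one segment
    recalls: int = 0    # how many times this engram has been retrieved
    lifespan: int = 3   # remaining steps before it is forgotten

class ThreeLevelMemory:
    """Hypothetical sketch of a working / short-term / long-term store."""

    def __init__(self, stm_threshold=2, ltm_threshold=4):
        self.working, self.short_term, self.long_term = [], [], []
        self.stm_threshold, self.ltm_threshold = stm_threshold, ltm_threshold

    def add(self, data):
        # New information always enters working memory first.
        self.working.append(Engram(data))

    def recall(self, top_k=4):
        # Return the most frequently used engrams; recalling reinforces them.
        pool = self.working + self.short_term + self.long_term
        chosen = sorted(pool, key=lambda e: e.recalls, reverse=True)[:top_k]
        for e in chosen:
            e.recalls += 1
            e.lifespan += 1   # reinforcement extends an engram's life
        return [e.data for e in chosen]

    def step(self):
        # Promote well-used engrams and forget expired ones.
        for e in list(self.working):
            if e.recalls >= self.stm_threshold:
                self.working.remove(e)
                self.short_term.append(e)
        for e in list(self.short_term):
            if e.recalls >= self.ltm_threshold:
                self.short_term.remove(e)
                self.long_term.append(e)
        for level in (self.working, self.short_term):
            for e in list(level):
                e.lifespan -= 1
                if e.lifespan <= 0:
                    level.remove(e)
```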

Hebbian Theory and Memory Processing

Hebbian theory, often summarized as "neurons that fire together wire together," is the theoretical backbone of Memoria. Computationally, this means that associations between pieces of stored information (engrams) strengthen whenever they are activated together. Memoria applies this rule by adjusting connection weights between engrams across its memory levels, so frequently co-recalled information forms stable, long-lived memories, much as in human memory.
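The classic Hebbian update can be written as Δw_ij = η·x_i·x_j, where x_i and x_j are the activations of two engrams and η is a learning rate. The snippet below is a minimal sketch of that rule with an added decay term, assuming a plain NumPy weight matrix over engrams; it illustrates the principle rather than reproducing the paper's exact update.

```python
import numpy as np

def hebbian_update(weights, activations, lr=0.1, decay=0.01):
    """Strengthen associations between co-activated engrams and slightly
    decay all weights so unused links fade over time.
    `activations` is a vector: nonzero entries mark engrams that fired."""
    weights += lr * np.outer(activations, activations)  # fire together, wire together
    weights *= (1.0 - decay)                            # gradual forgetting
    np.fill_diagonal(weights, 0.0)                      # no self-association
    return weights

# Example: three engrams, the first two fire together and become associated.
w = np.zeros((3, 3))
w = hebbian_update(w, np.array([1.0, 1.0, 0.0]))
```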

Enhancements Over Existing Models

Memoria is evaluated against Transformer-based baselines on tasks that require long-term dependencies: sorting, language modeling, and long-text classification. The reported results show consistent gains over existing models, with higher accuracy and efficiency on long-sequence data. By adopting Hebbian learning, Memoria both eases the computational constraints of conventional Transformers and offers a more selective approach to what is retained and recalled.

Practical Implications and Theoretical Contributions

Memoria offers both practical and theoretical contributions. Practically, it integrates with popular Transformer models such as BERT and GPT, extending them to tasks previously constrained by sequence length limits. Theoretically, it shows how cognitive theories such as Hebbian learning can be applied within machine learning frameworks, bridging artificial intelligence and neuroscience. Together these contributions motivate further work on memory-augmented neural networks and more human-like AI systems.
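To make the integration concrete, the sketch below shows how a memory module of this kind could wrap segment-by-segment processing of a long document with a Transformer encoder. The `encoder`, `memory.recall`, `memory.store`, and `memory.step` interfaces are assumptions made for illustration, not Memoria's published API.

```python
import torch

def process_long_document(segments, encoder, memory, top_k=8):
    """Illustrative loop: encode a long document segment by segment,
    letting each segment attend to engrams recalled from memory.
    `segments` is a list of (1, seq_len) token-id tensors; `encoder` and
    `memory` are assumed interfaces, not Memoria's actual implementation."""
    outputs = []
    for ids in segments:
        recalled = memory.recall(top_k=top_k)   # engrams tied to recent context
        hidden = encoder(ids, memory_states=recalled)
        memory.store(hidden.detach())           # keep new states as candidate engrams
        memory.step()                           # Hebbian reinforcement + forgetting
        outputs.append(hidden)
    return torch.cat(outputs, dim=1)
```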

Future Perspectives in AI Research

Looking forward, Memoria suggests several avenues for future research. One direction is a more continuous memory structure, inspired by the Levels of Processing theory, to mirror the human memory system more closely. Another is richer forgetting mechanisms, informed by interference theory, which could improve both the realism and the efficiency of such models. More broadly, integrating these human-like processes into machine learning models may prove important for building more general AI systems.

Conclusion

Memoria marks a meaningful step toward neural networks with human-like memory. By using Hebbian theory to implement a hierarchical memory system, the work addresses known limitations of Transformer models and points toward further advances in memory-augmented AI. Its empirical results, together with its theoretical framing, underscore the potential of cognitive principles to improve machine learning systems.
