Memoria: Resolving Fateful Forgetting Problem through Human-Inspired Memory Architecture (2310.03052v3)
Abstract: Making neural networks remember over the long term has been a longstanding issue. Although several external memory techniques have been introduced, most focus on retaining recent information in the short term. Regardless of its importance, information tends to be fatefully forgotten over time. We present Memoria, a memory system for artificial neural networks, drawing inspiration from humans and applying various neuroscientific and psychological theories. The experimental results prove the effectiveness of Memoria in the diverse tasks of sorting, language modeling, and classification, surpassing conventional techniques. Engram analysis reveals that Memoria exhibits the primacy, recency, and temporal contiguity effects, which are characteristic of human memory.