Sparse and Structured Hopfield Networks (2402.13725v2)
Published 21 Feb 2024 in cs.LG
Abstract: Modern Hopfield networks have enjoyed recent interest due to their connection to attention in transformers. Our paper provides a unified framework for sparse Hopfield networks by establishing a link with Fenchel-Young losses. The result is a new family of Hopfield-Fenchel-Young energies whose update rules are end-to-end differentiable sparse transformations. We reveal a connection between loss margins, sparsity, and exact memory retrieval. We further extend this framework to structured Hopfield networks via the SparseMAP transformation, which can retrieve pattern associations instead of a single pattern. Experiments on multiple instance learning and text rationalization demonstrate the usefulness of our approach.
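As a concrete illustration of the update rule the abstract describes, below is a minimal NumPy sketch of one sparse retrieval step: the standard modern-Hopfield update with softmax replaced by sparsemax, which corresponds to the α = 2 member of the paper's Hopfield-Fenchel-Young family. The function names, β value, and toy data here are illustrative assumptions, not the paper's code.

```python
import numpy as np

def sparsemax(z):
    """Project z onto the probability simplex (Martins & Astudillo, 2016).

    Unlike softmax, the output can contain exact zeros, so the
    attention over stored patterns becomes sparse.
    """
    z_sorted = np.sort(z)[::-1]                 # scores in descending order
    cumsum = np.cumsum(z_sorted)
    k = np.arange(1, z.size + 1)
    support = 1.0 + k * z_sorted > cumsum       # coordinates kept in the support
    k_max = k[support][-1]                      # size of the support
    tau = (cumsum[k_max - 1] - 1.0) / k_max     # threshold shifting z onto the simplex
    return np.maximum(z - tau, 0.0)

def sparse_hopfield_update(X, q, beta=1.0):
    """One retrieval step: xi <- X^T sparsemax(beta * X q).

    X holds one stored pattern per row; q is the query/state vector.
    This is the usual modern-Hopfield update with softmax swapped
    for sparsemax.
    """
    p = sparsemax(beta * (X @ q))               # sparse weights over memories
    return X.T @ p                              # convex combination of patterns

# Toy retrieval: a noisy query snaps back toward the stored pattern.
rng = np.random.default_rng(0)
X = rng.standard_normal((5, 16))                # 5 stored patterns, 16 dimensions
q = X[2] + 0.1 * rng.standard_normal(16)        # corrupted copy of pattern 2
xi = sparse_hopfield_update(X, q, beta=4.0)
```

Because sparsemax can return an exactly one-hot weight vector at finite β, a single step can retrieve a stored pattern exactly rather than a blend of memories, which is the connection between sparsity, margins, and exact retrieval that the abstract highlights.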