Nonparametric Modern Hopfield Models (2404.03900v2)
Abstract: We present a nonparametric interpretation of deep learning-compatible modern Hopfield models and use this new perspective to debut efficient variants. Our key contribution stems from interpreting the memory storage and retrieval processes in modern Hopfield models as a nonparametric regression problem subject to a set of query-memory pairs. Interestingly, our framework not only recovers the known results of the original dense modern Hopfield model but also fills a void in the literature regarding efficient modern Hopfield models by introducing \textit{sparse-structured} modern Hopfield models with sub-quadratic complexity. We establish that the sparse model inherits the appealing theoretical properties of its dense analogue: the connection with transformer attention, fixed-point convergence, and exponential memory capacity. Additionally, we showcase the versatility of our framework by constructing a family of modern Hopfield models as extensions, including linear, random-masked, top-$K$, and positive random feature modern Hopfield models. Empirically, we validate our framework in both synthetic and realistic settings for memory retrieval and learning tasks.
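To make the retrieval mechanism concrete, the following is a minimal sketch (not the authors' code) of the standard dense modern Hopfield update, $\xi \leftarrow X\,\mathrm{softmax}(\beta X^\top \xi)$, together with an illustrative top-$K$ sparse-structured variant that normalizes only over the $K$ largest query-memory similarity scores, so each retrieval touches $K \ll M$ stored patterns. Function names, the memory matrix `X`, query `xi`, and parameters `beta`, `k` are illustrative assumptions, not the paper's implementation.

```python
# Minimal sketch (assumed names, not the authors' code): dense modern Hopfield
# retrieval and a top-K sparse-structured variant.
import numpy as np

def softmax(z):
    z = z - np.max(z)                 # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()

def dense_retrieve(X, xi, beta=1.0):
    """Dense modern Hopfield update: attend over all M stored patterns.

    X  : (d, M) memory matrix, one stored pattern per column
    xi : (d,)   query pattern
    """
    scores = beta * (X.T @ xi)        # (M,) similarity of query to each memory
    p = softmax(scores)               # dense attention over all memories
    return X @ p                      # retrieved (denoised) pattern

def topk_retrieve(X, xi, beta=1.0, k=4):
    """Sparse-structured variant: normalize only over the top-K scores,
    so the update touches K << M memories per query."""
    scores = beta * (X.T @ xi)
    idx = np.argpartition(scores, -k)[-k:]   # indices of the K largest scores
    p = softmax(scores[idx])                 # sparse attention weights
    return X[:, idx] @ p

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d, M = 32, 256
    X = rng.standard_normal((d, M))              # stored patterns
    xi = X[:, 7] + 0.1 * rng.standard_normal(d)  # noisy query near pattern 7
    print(np.linalg.norm(dense_retrieve(X, xi, beta=4.0) - X[:, 7]))
    print(np.linalg.norm(topk_retrieve(X, xi, beta=4.0, k=8) - X[:, 7]))
```

In this toy setting both updates recover the stored pattern from a noisy query in one step; the sparse variant only differs in restricting the normalization to a small candidate set, which is the source of the sub-quadratic cost claimed for the sparse-structured models.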