On Computational Limits of Modern Hopfield Models: A Fine-Grained Complexity Analysis (2402.04520v5)

Published 7 Feb 2024 in cs.LG, cs.AI, and stat.ML

Abstract: We investigate the computational limits of the memory retrieval dynamics of modern Hopfield models through the lens of fine-grained complexity analysis. Our key contribution is the characterization of a phase transition in the efficiency of all possible modern Hopfield models, governed by the norm of the patterns. Specifically, we establish an upper bound criterion on the norm of the input query patterns and memory patterns. Only below this criterion do sub-quadratic (efficient) variants of the modern Hopfield model exist, assuming the Strong Exponential Time Hypothesis (SETH). To showcase our theory, we provide a formal example of an efficient construction of modern Hopfield models using low-rank approximation when the efficiency criterion holds. This includes a derivation of a lower bound on the computational time, scaling linearly with $\max\{\text{\# of stored memory patterns},\ \text{length of input query sequence}\}$. In addition, we prove a memory retrieval error bound and exponential memory capacity for this construction.
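As a concrete illustration (not taken from the paper), the sketch below implements the standard modern Hopfield retrieval update $Z = \Xi\,\mathrm{softmax}(\beta\,\Xi^\top X)$ over stored memories $\Xi$ and queries $X$, alongside one possible sub-quadratic low-rank approximation of the exponential kernel via positive random features. The inverse temperature `beta`, the rank `r`, and the feature map `phi` are illustrative assumptions, not the paper's construction or notation; the approximation is only meaningful in the small-norm regime the abstract's criterion describes.

```python
# Hedged sketch: exact modern Hopfield retrieval vs. a rank-r random-feature
# approximation of exp(beta * <xi, x>) (Performer-style positive features).
import numpy as np

def softmax(v, axis=-1):
    v = v - v.max(axis=axis, keepdims=True)
    e = np.exp(v)
    return e / e.sum(axis=axis, keepdims=True)

def hopfield_retrieve_exact(Xi, X, beta=1.0):
    """Exact retrieval: Xi is (d, M) memories, X is (d, n) queries; cost O(M * n * d)."""
    A = softmax(beta * (Xi.T @ X), axis=0)   # (M, n) attention weights over memories
    return Xi @ A                            # (d, n) retrieved patterns

def hopfield_retrieve_lowrank(Xi, X, beta=1.0, r=256, seed=0):
    """Sub-quadratic sketch: rank-r random-feature approximation, cost O((M + n) * d * r).

    Accurate only when pattern norms are small enough that the exponential kernel
    is well approximated by low-rank features (the paper's efficiency regime).
    """
    rng = np.random.default_rng(seed)
    W = rng.normal(size=(r, Xi.shape[0])) * np.sqrt(beta)   # scaled random projections

    def phi(Y):
        # Positive random features: E[phi(q)^T phi(k)] = exp(beta * <q, k>)
        return np.exp(W @ Y - 0.5 * beta * (Y * Y).sum(axis=0, keepdims=True)) / np.sqrt(r)

    PXi, PX = phi(Xi), phi(X)                        # (r, M), (r, n)
    numer = (Xi @ PXi.T) @ PX                        # (d, n), never forms the (M, n) matrix
    denom = PXi.sum(axis=1, keepdims=True).T @ PX    # (1, n) softmax normalization
    return numer / denom

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    Xi = rng.normal(size=(16, 200)) * 0.1   # small-norm memories (efficient regime)
    X = rng.normal(size=(16, 5)) * 0.1      # small-norm queries
    exact = hopfield_retrieve_exact(Xi, X)
    approx = hopfield_retrieve_lowrank(Xi, X)
    print(np.abs(exact - approx).max())     # small when pattern norms are small
```

The design point mirrors the abstract: the exact update touches all $M \times n$ memory-query pairs, whereas the factored form costs $O((M + n)\,d\,r)$, i.e. linear in $\max\{M, n\}$, which is only a faithful approximation below the norm criterion.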
