On Sparse Modern Hopfield Model (2309.12673v2)

Published 22 Sep 2023 in cs.LG, cs.AI, cs.CV, and stat.ML

Abstract: We introduce the sparse modern Hopfield model as a sparse extension of the modern Hopfield model. Like its dense counterpart, the sparse modern Hopfield model is equipped with memory-retrieval dynamics whose one-step approximation corresponds to the sparse attention mechanism. Theoretically, our key contribution is a principled derivation of a closed-form sparse Hopfield energy using the convex conjugate of the sparse entropic regularizer. Building on this, we derive the sparse memory-retrieval dynamics from the sparse energy function and show that its one-step approximation is equivalent to sparse-structured attention. Importantly, we provide a sparsity-dependent memory-retrieval error bound that is provably tighter than its dense analog; the conditions under which the benefits of sparsity arise are therefore identified and discussed. In addition, we show that the sparse modern Hopfield model maintains the robust theoretical properties of its dense counterpart, including rapid fixed-point convergence and exponential memory capacity. Empirically, we use both synthetic and real-world datasets to demonstrate that the sparse Hopfield model outperforms its dense counterpart in many situations.
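To make the retrieval dynamics concrete: in the dense modern Hopfield model one update step is xi <- X softmax(beta * X^T xi), and the abstract's sparse variant replaces softmax with sparsemax, the map induced by the convex conjugate of a sparse entropic regularizer, so that one step coincides with sparse-structured attention. The NumPy sketch below illustrates this update; the function names and the toy retrieval demo are illustrative assumptions, not code from the paper.

```python
import numpy as np

def sparsemax(z):
    """Euclidean projection of z onto the probability simplex (Martins & Astudillo, 2016)."""
    z_sorted = np.sort(z)[::-1]                # scores in descending order
    cssv = np.cumsum(z_sorted) - 1.0
    k = np.arange(1, z.size + 1)
    support = z_sorted * k > cssv              # coordinates kept in the sparse support
    tau = cssv[support][-1] / k[support][-1]   # threshold that normalizes the support
    return np.maximum(z - tau, 0.0)

def sparse_hopfield_retrieve(X, xi, beta=1.0, steps=1):
    """Sparse retrieval dynamics: xi <- X @ sparsemax(beta * X.T @ xi).

    X  : (d, M) matrix whose columns are the M stored patterns.
    xi : (d,) query (initial state).
    """
    for _ in range(steps):
        xi = X @ sparsemax(beta * (X.T @ xi))
    return xi

# Toy usage: retrieve a stored pattern from a noisy query.
rng = np.random.default_rng(0)
X = rng.standard_normal((16, 5))                  # five random memories in R^16
query = X[:, 2] + 0.1 * rng.standard_normal(16)   # corrupted copy of memory 2
retrieved = sparse_hopfield_retrieve(X, query, beta=2.0)
print(int(np.argmax(X.T @ retrieved)))            # typically prints 2
```

With well-separated patterns and a moderate beta, the sparsemax weights collapse onto a few (often one) stored patterns, so a single update already lands close to the target memory, which is the rapid fixed-point convergence and sparsity benefit the abstract refers to.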
