STanHop: Sparse Tandem Hopfield Model for Memory-Enhanced Time Series Prediction (2312.17346v1)
Abstract: We present STanHop-Net (Sparse Tandem Hopfield Network) for multivariate time series prediction with memory-enhanced capabilities. At the heart of our approach is STanHop, a novel Hopfield-based neural network block that sparsely learns and stores both temporal and cross-series representations in a data-dependent fashion. In essence, STanHop sequentially learns temporal and cross-series representations using two tandem sparse Hopfield layers. In addition, STanHop incorporates two external memory modules, a Plug-and-Play module and a Tune-and-Play module, for training-free and task-aware memory enhancement, respectively; these modules allow STanHop-Net to respond swiftly to sudden events. Methodologically, we construct STanHop-Net by stacking STanHop blocks in a hierarchical fashion, enabling multi-resolution feature extraction with resolution-specific sparsity. Theoretically, we introduce a sparse extension of the modern Hopfield model (the Generalized Sparse Modern Hopfield Model) and show that it yields a tighter memory retrieval error bound than its dense counterpart without sacrificing memory capacity. Empirically, we validate the efficacy of our framework in both synthetic and real-world settings.
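To make the block structure concrete, below is a minimal, hypothetical PyTorch sketch, not the authors' implementation. It assumes the sparse Hopfield layer acts like attention-style Hopfield retrieval in which softmax is replaced by a sparse normalization such as sparsemax (roughly $x^{\mathrm{new}} = \Xi\,\mathrm{Sparsemax}(\beta\,\Xi^{\top} x)$), and it applies two such layers in tandem: first over the temporal axis, then over the series axis. All names here (`sparsemax`, `SparseHopfieldLayer`, `STanHopBlockSketch`, `external_memory`) are illustrative assumptions; the hierarchical stacking and the Plug-and-Play/Tune-and-Play memory modules are only hinted at.

```python
import math
import torch
import torch.nn as nn


def sparsemax(z: torch.Tensor, dim: int = -1) -> torch.Tensor:
    """Sparsemax (Martins & Astudillo, 2016): Euclidean projection onto the simplex."""
    z_sorted, _ = torch.sort(z, dim=dim, descending=True)
    cumsum = z_sorted.cumsum(dim)
    k = torch.arange(1, z.size(dim) + 1, device=z.device, dtype=z.dtype)
    shape = [1] * z.dim()
    shape[dim] = -1
    k = k.view(shape)
    support = (1 + k * z_sorted) > cumsum                     # which entries stay nonzero
    k_z = support.sum(dim=dim, keepdim=True).clamp(min=1)     # support size
    tau = (cumsum.gather(dim, k_z - 1) - 1) / k_z.to(z.dtype) # threshold
    return torch.clamp(z - tau, min=0)


class SparseHopfieldLayer(nn.Module):
    """Attention-style Hopfield retrieval with a sparse normalization (sketch)."""

    def __init__(self, dim: int, beta: float | None = None):
        super().__init__()
        self.q = nn.Linear(dim, dim)
        self.k = nn.Linear(dim, dim)
        self.v = nn.Linear(dim, dim)
        self.beta = beta if beta is not None else 1.0 / math.sqrt(dim)

    def forward(self, queries: torch.Tensor, memories: torch.Tensor) -> torch.Tensor:
        # queries: (..., L_q, d), memories: (..., L_m, d)
        scores = self.beta * self.q(queries) @ self.k(memories).transpose(-2, -1)
        weights = sparsemax(scores, dim=-1)   # sparse retrieval weights
        return weights @ self.v(memories)


class STanHopBlockSketch(nn.Module):
    """Hypothetical STanHop-style block: tandem sparse Hopfield layers over time, then series."""

    def __init__(self, dim: int):
        super().__init__()
        self.temporal = SparseHopfieldLayer(dim)
        self.cross_series = SparseHopfieldLayer(dim)
        self.norm1 = nn.LayerNorm(dim)
        self.norm2 = nn.LayerNorm(dim)

    def forward(self, x: torch.Tensor, external_memory: torch.Tensor | None = None) -> torch.Tensor:
        # x: (batch, time, series, dim)
        b, t, s, d = x.shape
        # 1) temporal representation: retrieve over time within each series
        xt = x.permute(0, 2, 1, 3).reshape(b * s, t, d)
        xt = self.norm1(xt + self.temporal(xt, xt))
        # 2) cross-series representation: retrieve over series at each time step
        xs = xt.reshape(b, s, t, d).permute(0, 2, 1, 3).reshape(b * t, s, d)
        # A Plug-and-Play-style module could supply extra stored patterns here (assumption).
        mem = xs if external_memory is None else external_memory
        xs = self.norm2(xs + self.cross_series(xs, mem))
        return xs.reshape(b, t, s, d)


if __name__ == "__main__":
    x = torch.randn(8, 24, 7, 32)       # (batch, time steps, series, feature dim)
    block = STanHopBlockSketch(32)
    print(block(x).shape)                # torch.Size([8, 24, 7, 32])
```

In the actual model, such blocks would be stacked hierarchically so that coarser resolutions are processed by higher blocks; the sketch keeps a single resolution for brevity.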