Hybrid State Space-based Learning for Sequential Data Prediction with Joint Optimization (2309.10553v1)
Abstract: We investigate nonlinear prediction/regression in an online setting and introduce a hybrid model that, via a joint mechanism through a state space formulation, mitigates the domain-specific feature engineering requirements of conventional nonlinear prediction models and achieves an efficient mix of nonlinear and linear components. In particular, we use recursive structures to extract features from raw sequential data and a traditional linear time series model to handle the intricacies of sequential data, e.g., seasonality and trends. State-of-the-art ensemble or hybrid models typically train their base models in a disjoint manner, which is not only time consuming but also sub-optimal due to the separation of modeling and independent training. In contrast, for the first time in the literature, we jointly optimize an enhanced recurrent neural network (LSTM) for automatic feature extraction from raw data and an ARMA-family time series model (SARIMAX) for effectively addressing the peculiarities of time series data. We achieve this by introducing novel state space representations for the base models, which are then combined to provide the full state space representation of the hybrid model, or the ensemble. Hence, we are able to jointly optimize both models in a single pass via particle filtering, for which we also provide the update equations. The introduced architecture is generic: one can use other recurrent architectures, e.g., GRUs; traditional time series-specific models, e.g., ETS; or other optimization methods, e.g., the EKF or UKF. Thanks to this novel combination and joint optimization, we demonstrate significant improvements on widely publicized real-life competition datasets. We also openly share our code for further research and for the replicability of our results.
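The core idea of the abstract — jointly estimating the parameters of a hybrid (nonlinear + linear) predictor in a single pass via particle filtering over a state space representation — can be sketched at toy scale. The snippet below is a minimal bootstrap particle filter that treats the parameters of a tiny two-term hybrid model as the hidden state; it is our own illustrative toy under simplifying assumptions, not the paper's LSTM/SARIMAX architecture, and all names and model choices here are hypothetical.

```python
# Hedged sketch: joint parameter estimation of a tiny hybrid
# (linear AR term + nonlinear term) predictor via a bootstrap
# particle filter. Illustrative toy only, not the paper's model.
import numpy as np

rng = np.random.default_rng(0)

# Synthetic series: linear AR(1) component plus a mild nonlinearity.
T = 200
y = np.zeros(T)
for t in range(1, T):
    y[t] = 0.6 * y[t - 1] + 0.3 * np.tanh(y[t - 1]) + 0.1 * rng.standard_normal()

def predict(theta, y_prev):
    """Hybrid one-step prediction: linear AR term + nonlinear term."""
    a, b = theta
    return a * y_prev + b * np.tanh(y_prev)

N = 500                                        # number of particles
particles = rng.normal(0.0, 0.5, size=(N, 2))  # each particle = (a, b)
weights = np.full(N, 1.0 / N)
sigma_v, sigma_w = 0.01, 0.1                   # state / observation noise std

for t in range(1, T):
    # State transition: parameters follow a small random walk.
    particles += sigma_v * rng.standard_normal(particles.shape)
    # Weight update: likelihood of the new observation under each particle.
    pred = predict(particles.T, y[t - 1])
    loglik = -0.5 * ((y[t] - pred) / sigma_w) ** 2
    weights *= np.exp(loglik - loglik.max())
    weights /= weights.sum()
    # Resample when the effective sample size drops below N/2.
    if 1.0 / np.sum(weights ** 2) < N / 2:
        idx = rng.choice(N, size=N, p=weights)
        particles, weights = particles[idx], np.full(N, 1.0 / N)

theta_hat = weights @ particles                # posterior-mean estimate of (a, b)
print(theta_hat)
```

In the paper's setting the hidden state also carries the LSTM and SARIMAX internal states rather than only parameters, but the update-weight-resample cycle above is the same single-pass mechanism the abstract describes.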
- J. Li, S. Wei, and W. Dai, “Combination of manifold learning and deep learning algorithms for mid-term electrical load forecasting,” IEEE Transactions on Neural Networks and Learning Systems, vol. 34, no. 5, pp. 2584–2593, 2023.
- A. Sayeed, Y. Choi, J. Jung, Y. Lops, E. Eslami, and A. K. Salman, “A deep convolutional neural network model for improving WRF simulations,” IEEE Transactions on Neural Networks and Learning Systems, vol. 34, no. 2, pp. 750–760, 2023.
- E. Hwang, Y.-S. Park, J.-Y. Kim, S.-H. Park, J. Kim, and S.-H. Kim, “Intraoperative hypotension prediction based on features automatically generated within an interpretable deep learning model,” IEEE Transactions on Neural Networks and Learning Systems, pp. 1–15, 2023.
- S. Hochreiter and J. Schmidhuber, “Long short-term memory,” Neural Computation, vol. 9, no. 8, pp. 1735–1780, 1997.
- K. Greff, R. K. Srivastava, J. Koutník, B. R. Steunebrink, and J. Schmidhuber, “LSTM: A search space odyssey,” IEEE Transactions on Neural Networks and Learning Systems, vol. 28, no. 10, pp. 2222–2232, 2017.
- M. E. Aydin and S. S. Kozat, “A hybrid framework for sequential data prediction with end-to-end optimization,” Digital Signal Processing, vol. 129, p. 103687, 2022. [Online]. Available: https://www.sciencedirect.com/science/article/pii/S1051200422003049
- A. Singer, G. Wornell, and A. Oppenheim, “Nonlinear autoregressive modeling and estimation in the presence of noise,” Digital Signal Processing: A Review Journal, vol. 4, no. 4, pp. 207–221, Oct. 1994.
- T. Hill, M. O’Connor, and W. Remus, “Neural network models for time series forecasts,” Management Science, vol. 42, no. 7, pp. 1082–1092, 1996. [Online]. Available: https://doi.org/10.1287/mnsc.42.7.1082
- I. Kaastra and M. Boyd, “Designing a neural network for forecasting financial and economic time series,” Neurocomputing, vol. 10, no. 3, pp. 215–236, 1996, financial Applications, Part II. [Online]. Available: https://www.sciencedirect.com/science/article/pii/0925231295000399
- C. Fan, J. Wang, W. Gang, and S. Li, “Assessment of deep recurrent neural network-based strategies for short-term building energy predictions,” Applied Energy, vol. 236, pp. 700–710, 2019. [Online]. Available: https://www.sciencedirect.com/science/article/pii/S0306261918318221
- M. N. Fekri, H. Patel, K. Grolinger, and V. Sharma, “Deep learning for load forecasting with smart meter data: Online adaptive recurrent neural network,” Applied Energy, vol. 282, p. 116177, 2021. [Online]. Available: https://www.sciencedirect.com/science/article/pii/S0306261920315804
- A. Sagheer and M. Kotb, “Time series forecasting of petroleum production using deep LSTM recurrent networks,” Neurocomputing, vol. 323, pp. 203–213, 2019. [Online]. Available: https://www.sciencedirect.com/science/article/pii/S0925231218311639
- A. Kisvari, Z. Lin, and X. Liu, “Wind power forecasting – a data-driven method along with gated recurrent neural network,” Renewable Energy, vol. 163, pp. 1895–1909, 2021. [Online]. Available: https://www.sciencedirect.com/science/article/pii/S0960148120316918
- M. Du, “Improving LSTM neural networks for better short-term wind power predictions,” CoRR, vol. abs/1907.00489, 2019.
- R. M. Cruz, R. Sabourin, G. D. Cavalcanti, and T. Ing Ren, “META-DES: A dynamic ensemble selection framework using meta-learning,” Pattern Recognition, vol. 48, no. 5, pp. 1925–1935, 2015.
- H. Nazaripouya, B. Wang, Y. Wang, P. Chu, H. R. Pota, and R. Gadh, “Univariate time series prediction of solar power using a hybrid wavelet-ARMA-NARX prediction method,” in 2016 IEEE/PES Transmission and Distribution Conference and Exposition (T&D), 2016, pp. 1–5.
- Y. Huang, Y. He, R. Lu, X. Li, and X. Yang, “Thermal infrared object tracking via unsupervised deep correlation filters,” Digital Signal Processing, vol. 123, p. 103432, 2022.
- L. Deng and J. Chen, “Sequence classification using the high-level features extracted from deep neural networks,” in 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2014, pp. 6844–6848.
- Y. Jeong and S. Lee, “Recurrent neural network-adapted nonlinear ARMA-GARCH model with application to S&P 500 index data,” Journal of the Korean Data and Information Science Society, vol. 30, pp. 1187–1195, 2019.
- K. Cho, B. van Merriënboer, C. Gulcehre, D. Bahdanau, F. Bougares, H. Schwenk, and Y. Bengio, “Learning phrase representations using RNN encoder–decoder for statistical machine translation,” in Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP). Doha, Qatar: Association for Computational Linguistics, Oct. 2014, pp. 1724–1734.
- S. Bai, J. Z. Kolter, and V. Koltun, “An empirical evaluation of generic convolutional and recurrent networks for sequence modeling,” CoRR, vol. abs/1803.01271, 2018.
- E. S. Gardner Jr., “Exponential smoothing: The state of the art,” Journal of Forecasting, vol. 4, no. 1, pp. 1–28, 1985. [Online]. Available: https://onlinelibrary.wiley.com/doi/abs/10.1002/for.3980040103
- P. Djuric, J. Kotecha, J. Zhang, Y. Huang, T. Ghirmai, M. Bugallo, and J. Miguez, “Particle filtering,” IEEE Signal Processing Magazine, vol. 20, no. 5, pp. 19–38, 2003.
- C. Yang, H. Wang, J. Tang, C. Shi, M. Sun, G. Cui, and Z. Liu, “Full-scale information diffusion prediction with reinforced recurrent networks,” IEEE Transactions on Neural Networks and Learning Systems, vol. 34, no. 5, pp. 2271–2283, 2023.
- S. Xiao, J. Yan, M. Farajtabar, L. Song, X. Yang, and H. Zha, “Learning time series associated event sequences with recurrent point process networks,” IEEE Transactions on Neural Networks and Learning Systems, vol. 30, no. 10, pp. 3124–3136, 2019.
- S. Makridakis, E. Spiliotis, and V. Assimakopoulos, “The M4 competition: 100,000 time series and 61 forecasting methods,” International Journal of Forecasting, vol. 36, no. 1, pp. 54–74, 2020.
- ——, “The M5 competition: Background, organization, and implementation,” International Journal of Forecasting, 2021.
- F. A. Gers, J. Schmidhuber, and F. Cummins, “Learning to forget: Continual prediction with LSTM,” Neural Computation, vol. 12, no. 10, pp. 2451–2471, 2000.
- G. E. P. Box, G. M. Jenkins, G. C. Reinsel, and G. M. Ljung, Time Series Analysis: Forecasting and Control. San Francisco: Holden-Day, 1976.
- N. S. Arunraj, D. Ahrens, and M. Fernandes, “Application of SARIMAX model to forecast daily sales in food retail industry,” International Journal of Operations Research and Information Systems (IJORIS), vol. 7, no. 2, pp. 1–21, 2016.
- Y. Grenier, “Time-dependent arma modeling of nonstationary signals,” IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. 31, no. 4, pp. 899–911, 1983.
- E. J. Hannan and J. Rissanen, “Recursive estimation of mixed autoregressive-moving average order,” Biometrika, vol. 69, no. 1, pp. 81–94, 1982. [Online]. Available: http://www.jstor.org/stable/2335856
- M. M. Olama, S. M. Djouadi, I. G. Papageorgiou, and C. D. Charalambous, “Position and velocity tracking in mobile networks using particle and Kalman filtering with comparison,” IEEE Transactions on Vehicular Technology, vol. 57, no. 2, pp. 1001–1010, 2008.
- T. Ergen and S. S. Kozat, “Online training of lstm networks in distributed systems for variable length data sequences,” IEEE Transactions on Neural Networks and Learning Systems, vol. 29, no. 10, pp. 5159–5165, 2018.
- J. D. Hamilton, “State-space models,” Handbook of econometrics, vol. 4, pp. 3039–3080, 1994.
- R. E. Kalman, “A new approach to linear filtering and prediction problems,” Journal of Basic Engineering, vol. 82, no. 1, pp. 35–45, 1960. [Online]. Available: http://dx.doi.org/10.1115/1.3662552
- A. Doucet, S. Godsill, and C. Andrieu, “On sequential Monte Carlo sampling methods for Bayesian filtering,” Statistics and Computing, vol. 10, pp. 197–208, 2000.
- J. S. Liu, “Metropolized independent sampling with comparisons to rejection sampling and importance sampling,” Statistics and Computing, vol. 6, pp. 113–119, 1996.
- S. J. Julier and J. K. Uhlmann, “New extension of the Kalman filter to nonlinear systems,” in Signal Processing, Sensor Fusion, and Target Recognition VI, vol. 3068. SPIE, 1997, pp. 182–193.
- E. A. Wan and R. Van Der Merwe, “The unscented Kalman filter for nonlinear estimation,” in Proceedings of the IEEE 2000 Adaptive Systems for Signal Processing, Communications, and Control Symposium (Cat. No. 00EX373). IEEE, 2000, pp. 153–158.
- N. Chopin, “A sequential particle filter method for static models,” Biometrika, vol. 89, no. 3, pp. 539–552, 2002.
- L. Torgo, “Regression data sets.” [Online]. Available: https://www.dcc.fc.up.pt/~ltorgo/Regression/DataSets.html
- J. Alcalá-Fdez, A. Fernández, J. Luengo, J. Derrac, S. García, L. Sánchez, and F. Herrera, “KEEL data-mining software tool: Data set repository, integration of algorithms and experimental analysis framework,” Journal of Multiple-Valued Logic and Soft Computing, vol. 17, pp. 255–287, 2010.
- C. E. Rasmussen, R. M. Neal, G. Hinton, D. Camp, M. Revow, Z. Ghahramani, R. Kustra, and R. Tibshirani, “Delve data sets.”
- Alcoa Inc., “Common stock.” [Online]. Available: https://finance.yahoo.com/quote/AA
- R. J. Hyndman and Y. Khandakar, “Automatic time series forecasting: The forecast package for R,” Journal of Statistical Software, vol. 27, no. 3, pp. 1–22, 2008.
- V. Nair and G. E. Hinton, “Rectified linear units improve restricted Boltzmann machines,” in Proceedings of the 27th International Conference on International Conference on Machine Learning, ser. ICML’10. Madison, WI, USA: Omnipress, 2010, pp. 807–814.
- G. Ke, Q. Meng, T. Finley, T. Wang, W. Chen, W. Ma, Q. Ye, and T.-Y. Liu, “LightGBM: A highly efficient gradient boosting decision tree,” in Advances in Neural Information Processing Systems, I. Guyon, U. V. Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, and R. Garnett, Eds., vol. 30. Curran Associates, Inc., 2017.
- R. Pascanu, T. Mikolov, and Y. Bengio, “On the difficulty of training recurrent neural networks,” in Proceedings of the 30th International Conference on International Conference on Machine Learning - Volume 28, ser. ICML’13. JMLR.org, 2013, pp. III-1310–III-1318.