The Bayesian Context Trees State Space Model for time series modelling and forecasting (2308.00913v2)
Abstract: A hierarchical Bayesian framework is introduced for developing rich mixture models for real-valued time series, partly motivated by important applications in financial time series analysis. At the top level, meaningful discrete states are identified as appropriately quantised values of some of the most recent samples. These observable states are described by a discrete context-tree model. At the bottom level, a different, arbitrary model for real-valued time series (a base model) is associated with each state. This defines a very general framework that can be used in conjunction with any existing model class to build flexible and interpretable mixture models. We call this the Bayesian Context Trees State Space Model, or the BCT-X framework. Efficient algorithms are introduced that allow for effective, exact Bayesian inference and learning in this setting; in particular, the maximum a posteriori probability (MAP) context-tree model can be identified. These algorithms can be updated sequentially, facilitating efficient online forecasting. The utility of the general framework is illustrated in two particular instances: when autoregressive (AR) models are used as base models, resulting in a nonlinear AR mixture model, and when autoregressive conditional heteroscedastic (ARCH) models are used, resulting in a mixture model that offers a powerful and systematic way of modelling the well-known volatility asymmetries in financial data. In forecasting, the BCT-X methods are found to outperform state-of-the-art techniques on simulated and real-world data, both in terms of accuracy and computational requirements. In modelling, the BCT-X framework identifies natural structure present in the data. In particular, the BCT-ARCH model reveals a novel, important feature of stock market index data, in the form of an enhanced leverage effect.
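The two-level structure described in the abstract can be illustrated with a minimal Python sketch. It is not the paper's algorithm: the context tree is fixed by hand rather than found as the MAP tree, and the per-state AR coefficients are fitted by ordinary least squares instead of exact Bayesian inference. The binary quantiser threshold, the leaf set `LEAVES`, the AR order, and all function names are assumptions made for illustration only.

```python
import numpy as np

# Hypothetical, hand-picked context tree (NOT learned from data). Each leaf is a
# context read from the most recent quantised sample backwards.
LEAVES = ("1", "01", "00")
THRESHOLD = 0.0  # assumed quantiser threshold
AR_ORDER = 1     # assumed order of every AR base model


def quantise(x):
    """Map a real-valued sample to a binary symbol."""
    return "1" if x > THRESHOLD else "0"


def find_state(history):
    """Return the leaf (state) determined by the most recent samples."""
    symbols = ""
    for x in reversed(history):
        symbols += quantise(x)
        if symbols in LEAVES:
            return symbols
    raise ValueError("history too short to determine a state")


def fit_ar_per_state(series):
    """Fit one AR model per state by least squares (stand-in for Bayesian inference)."""
    start = max(max(len(s) for s in LEAVES), AR_ORDER)
    rows = {s: [] for s in LEAVES}
    targets = {s: [] for s in LEAVES}
    for t in range(start, len(series)):
        state = find_state(series[:t])
        lags = series[t - AR_ORDER:t][::-1]  # most recent lag first
        rows[state].append(np.concatenate(([1.0], lags)))
        targets[state].append(series[t])
    coefs = {}
    for s in LEAVES:
        if not rows[s]:  # state never visited: no data to fit
            coefs[s] = np.zeros(AR_ORDER + 1)
            continue
        X, y = np.asarray(rows[s]), np.asarray(targets[s])
        coefs[s] = np.linalg.lstsq(X, y, rcond=None)[0]
    return coefs


def forecast(history, coefs):
    """One-step-ahead point forecast using the base model of the current state."""
    state = find_state(history)
    lags = np.asarray(history[-AR_ORDER:], dtype=float)[::-1]
    return state, float(coefs[state][0] + coefs[state][1:] @ lags)


def simulate(n, seed=0):
    """Simulate from the toy model with made-up (intercept, slope) per state."""
    rng = np.random.default_rng(seed)
    true = {"1": (0.5, 0.3), "01": (-0.5, 0.6), "00": (0.2, -0.4)}
    x = list(rng.normal(size=2))
    for _ in range(n):
        c, phi = true[find_state(x)]
        x.append(c + phi * x[-1] + rng.normal())
    return np.asarray(x), true


series, true_params = simulate(20000)
coefs = fit_ar_per_state(series)
for leaf in LEAVES:
    print(leaf, "true:", true_params[leaf], "fitted:", np.round(coefs[leaf], 2))
print("next state and forecast:", forecast(series, coefs))
```

Because the states are deterministic functions of the observed samples, each observation can be assigned to exactly one base model, which is why the fit splits into independent per-state regressions in this simplified setting.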