
Cross-validatory model selection for Bayesian autoregressions with exogenous regressors (2301.08276v3)

Published 19 Jan 2023 in stat.ME

Abstract: Bayesian cross-validation (CV) is a popular method for predictive model assessment that is simple to implement and broadly applicable. A wide range of CV schemes is available for time series applications, including generic leave-one-out (LOO) and K-fold methods, as well as specialized approaches intended to deal with serial dependence such as leave-future-out (LFO), h-block, and hv-block. Existing large-sample results show that both specialized and generic methods are applicable to models of serially-dependent data. However, large sample consistency results overlook the impact of sampling variability on accuracy in finite samples. Moreover, the accuracy of a CV scheme depends on many aspects of the procedure. We show that poor design choices can lead to elevated rates of adverse selection. In this paper, we consider the problem of identifying the regression component of an important class of models of data with serial dependence, autoregressions of order p with q exogenous regressors (ARX(p,q)), under the logarithmic scoring rule. We show that when serial dependence is present, scores computed using the joint (multivariate) density have lower variance and better model selection accuracy than the popular pointwise estimator. In addition, we present a detailed case study of the special case of ARX models with fixed autoregressive structure and variance. For this class, we derive the finite-sample distribution of the CV estimators and the model selection statistic. We conclude with recommendations for practitioners.
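The abstract's central claim, that log scores computed from the joint (multivariate) predictive density have lower variance than the popular pointwise estimator when serial dependence is present, can be illustrated with a small simulation. The sketch below is not the paper's procedure; it is a minimal hedged example assuming a stationary Gaussian AR(1) with known, illustrative parameters (`phi`, `sigma`), comparing the variance of a joint log score against a pointwise score built from marginal densities that ignore the dependence.

```python
import numpy as np
from scipy.stats import multivariate_normal, norm

rng = np.random.default_rng(0)
phi, sigma, n = 0.8, 1.0, 5  # illustrative AR(1) parameters and block length

# Stationary AR(1) covariance: Cov(y_s, y_t) = sigma^2 * phi^|s-t| / (1 - phi^2)
idx = np.arange(n)
cov = sigma**2 * phi ** np.abs(idx[:, None] - idx[None, :]) / (1 - phi**2)

joint_scores, pointwise_scores = [], []
for _ in range(2000):
    y = rng.multivariate_normal(np.zeros(n), cov)
    # Joint log score: one multivariate density over the whole held-out block
    joint_scores.append(multivariate_normal(np.zeros(n), cov).logpdf(y))
    # Pointwise log score: sum of marginal log densities, ignoring dependence
    pointwise_scores.append(norm(0, np.sqrt(cov[0, 0])).logpdf(y).sum())

var_joint = np.var(joint_scores)
var_pointwise = np.var(pointwise_scores)
print(f"joint score variance:     {var_joint:.3f}")
print(f"pointwise score variance: {var_pointwise:.3f}")
```

Under this Gaussian setup the joint score is a quadratic form whose variance is n/2 regardless of the correlation, while the pointwise variance grows with the off-diagonal covariance terms, so the gap widens as `phi` approaches 1. This is only a toy analogue of the effect the paper quantifies for ARX(p,q) models.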
