
High-dimensional forecasting with known knowns and known unknowns (2401.14582v2)

Published 26 Jan 2024 in econ.EM

Abstract: Forecasts play a central role in decision making under uncertainty. After a brief review of the general issues, this paper considers ways of using high-dimensional data in forecasting. We consider selecting variables from a known active set, known knowns, using Lasso and OCMT, and approximating unobserved latent factors, known unknowns, by various means. This combines both sparse and dense approaches. We demonstrate the various issues involved in variable selection in a high-dimensional setting with an application to forecasting UK inflation at different horizons over the period 2020q1-2023q1. This application shows both the power of parsimonious models and the importance of allowing for global variables.


Summary

  • The paper considers two variable selection methods, Lasso and OCMT, for forecasting with high-dimensional economic data.
  • It combines sparse and dense modeling techniques to capture both observed predictors and latent factors.
  • An application to forecasting UK inflation shows the power of parsimonious models and the importance of allowing for global variables.

High-Dimensional Forecasting with Known Knowns and Known Unknowns

This paper, authored by Pesaran and Smith, addresses the challenges and methods of using high-dimensional data in forecasting, with a particular focus on economic applications such as predicting UK inflation. It combines sparse and dense modeling approaches with the aim of improving forecast accuracy when many candidate predictors and global economic factors are available.

The paper distinguishes between "known knowns" and "known unknowns" to frame the high-dimensional forecasting problem. In this context, "known knowns" refer to a pre-defined set of variables from which relevant predictors are selected through techniques like Lasso and OCMT, whereas "known unknowns" involve the challenge of approximating latent factors not directly observed but inferred using dense methods such as principal components.
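
The dense step described above, approximating unobserved factors by principal components, can be sketched as follows. This is a minimal illustration on simulated data, not the authors' code: the function name `extract_factors`, the one-factor panel, and all parameter values are assumptions made for the example.

```python
import numpy as np

def extract_factors(X, n_factors):
    """Estimate latent factors as the leading principal components of a T x N panel."""
    # Standardize each column so that no single series dominates the components.
    Z = (X - X.mean(axis=0)) / X.std(axis=0)
    # Singular value decomposition of the standardized panel: Z = U S Vt.
    U, S, Vt = np.linalg.svd(Z, full_matrices=False)
    # Factor estimates: leading left singular vectors, scaled by sqrt(T).
    T = Z.shape[0]
    factors = np.sqrt(T) * U[:, :n_factors]
    # Share of total variance captured by each retained component.
    explained = (S[:n_factors] ** 2) / np.sum(S ** 2)
    return factors, explained

# Simulated panel (hypothetical): 50 series driven by one unobserved factor.
rng = np.random.default_rng(0)
T, N = 200, 50
f = rng.standard_normal(T)
loadings = rng.standard_normal(N)
X = np.outer(f, loadings) + 0.5 * rng.standard_normal((T, N))

factors, explained = extract_factors(X, n_factors=1)
# The estimated factor is identified only up to sign, hence the absolute value.
corr = abs(np.corrcoef(factors[:, 0], f)[0, 1])
print(f"abs. correlation with true factor: {corr:.3f}")
print(f"variance share of first component: {explained[0]:.3f}")
```

The estimated factor can then enter a forecasting regression alongside any selected observed predictors.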

The paper considers two methods for selecting variables from the active set: Lasso and OCMT. Lasso adds an L1 penalty to the least-squares objective, which shrinks some coefficients exactly to zero and thereby discards less-informative predictors. Despite its simplicity, Lasso's performance depends heavily on the choice of tuning parameter and on assumptions about the correlation structure of the covariates, as reflected in its reliance on the Irrepresentable Condition (IRC).
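
To make the shrink-to-zero mechanism concrete, here is a small coordinate-descent sketch of Lasso for the objective (1/(2T))·||y − Xb||² + λ·||b||₁. It is an illustration under stated assumptions rather than the paper's implementation: the penalty `lam=0.3` is fixed by hand instead of being tuned (for example by cross-validation), and the data are simulated.

```python
import numpy as np

def soft_threshold(z, lam):
    """Shrink z toward zero by lam; values within [-lam, lam] become exactly zero."""
    return np.sign(z) * max(abs(z) - lam, 0.0)

def lasso_cd(X, y, lam, n_iter=200):
    """Coordinate descent for (1/(2T)) * ||y - X b||^2 + lam * ||b||_1.

    Assumes the columns of X and the vector y have been centered.
    """
    T, N = X.shape
    beta = np.zeros(N)
    col_ss = (X ** 2).sum(axis=0) / T      # (1/T) * x_j' x_j for each column
    resid = y.astype(float).copy()         # residual y - X beta, with beta = 0
    for _ in range(n_iter):
        for j in range(N):
            resid += X[:, j] * beta[j]     # add back covariate j's contribution
            rho = X[:, j] @ resid / T      # partial covariance with covariate j
            beta[j] = soft_threshold(rho, lam) / col_ss[j]
            resid -= X[:, j] * beta[j]     # subtract the updated contribution
    return beta

# Simulated example (hypothetical): 30 candidates, only the first 3 matter.
rng = np.random.default_rng(1)
T, N = 400, 30
X = rng.standard_normal((T, N))
beta_true = np.zeros(N)
beta_true[:3] = [2.0, -1.5, 1.5]
y = X @ beta_true + rng.standard_normal(T)
X = X - X.mean(axis=0)
y = y - y.mean()

beta_hat = lasso_cd(X, y, lam=0.3)
selected = np.flatnonzero(beta_hat)
print("selected covariates:", selected.tolist())
print("estimates on selected:", np.round(beta_hat[selected], 2).tolist())
```

Note that the retained coefficients are biased toward zero by roughly the size of the penalty, which is one reason the choice of tuning parameter matters.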

OCMT (One Covariate at a time, Multiple Testing) complements Lasso with an inferential approach to variable selection. It avoids reliance on the IRC by testing the significance of each covariate individually, using critical values adjusted for the number of tests. This multiple-testing adjustment keeps false discoveries in check when the number of potential predictors is large, and OCMT outperforms Lasso particularly when multicollinearity is pronounced.
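
The one-at-a-time testing idea can be sketched as below. This is a simplified, single-stage illustration and not the full published procedure, which (as far as this sketch assumes) re-tests the remaining covariates in later stages conditional on those already selected. The critical value form Φ⁻¹(1 − p/(2n^δ)) and the defaults `p=0.05`, `delta=1.0` are assumptions chosen for the example.

```python
import numpy as np
from statistics import NormalDist

def ocmt_critical_value(n, p=0.05, delta=1.0):
    """Critical value that grows with the number of candidate covariates n."""
    return NormalDist().inv_cdf(1.0 - p / (2.0 * n ** delta))

def ocmt_first_stage(X, y, p=0.05, delta=1.0):
    """Regress y on each covariate separately and keep those with a large |t|."""
    T, n = X.shape
    cv = ocmt_critical_value(n, p, delta)
    yc = y - y.mean()
    selected = []
    for j in range(n):
        xc = X[:, j] - X[:, j].mean()
        b = xc @ yc / (xc @ xc)                    # bivariate OLS slope
        resid = yc - b * xc
        se = np.sqrt((resid @ resid) / (T - 2) / (xc @ xc))
        if abs(b / se) > cv:                       # multiple-testing threshold
            selected.append(j)
    return selected, cv

# Simulated example (hypothetical): 30 candidates, only the first 3 matter.
rng = np.random.default_rng(2)
T, n = 400, 30
X = rng.standard_normal((T, n))
y = 2.0 * X[:, 0] - 1.5 * X[:, 1] + 1.5 * X[:, 2] + rng.standard_normal(T)

selected, cv = ocmt_first_stage(X, y)
print(f"critical value: {cv:.2f}")
print("selected covariates:", selected)
```

Compared with a conventional 5% threshold of about 1.96, the adjusted critical value is noticeably stricter, which is what limits spurious selections among many candidates.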

The authors also explore GOCMT, an extension of OCMT, which enhances selection procedures by integrating principal components to account for underlying latent structures, thereby marrying the strengths of sparse and dense techniques.
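
One plausible reading of this combination is sketched below: estimate common factors from the covariates by principal components, remove them from both the target and the covariates, and then apply one-at-a-time testing to what remains. This is an assumption-laden illustration, not the published GOCMT algorithm; the function name `gocmt_sketch`, the single-stage testing, and the simulated design are all hypothetical.

```python
import numpy as np
from statistics import NormalDist

def gocmt_sketch(X, y, n_factors=1, p=0.05, delta=1.0):
    """Partial out principal-component factors, then test covariates one at a time."""
    T, n = X.shape
    # Dense step: leading principal components of the standardized covariates.
    Z = (X - X.mean(axis=0)) / X.std(axis=0)
    U, _, _ = np.linalg.svd(Z, full_matrices=False)
    F = np.column_stack([np.ones(T), U[:, :n_factors]])

    def partial_out(v):
        # Residual from regressing v (vector or matrix) on a constant and the factors.
        coef, *_ = np.linalg.lstsq(F, v, rcond=None)
        return v - F @ coef

    y_res, X_res = partial_out(y), partial_out(X)
    # Sparse step: one-at-a-time t-tests on the factor-adjusted data.
    cv = NormalDist().inv_cdf(1.0 - p / (2.0 * n ** delta))
    dof = T - F.shape[1] - 1
    selected = []
    for j in range(n):
        x = X_res[:, j]
        b = x @ y_res / (x @ x)
        e = y_res - b * x
        t_stat = b / np.sqrt((e @ e) / dof / (x @ x))
        if abs(t_stat) > cv:
            selected.append(j)
    return selected

# Simulated example (hypothetical): covariates share one common factor,
# and the target depends on covariate 0 and on the factor itself.
rng = np.random.default_rng(3)
T, n = 300, 40
f = rng.standard_normal(T)
lam = 1.0 + 0.5 * rng.standard_normal(n)
X = np.outer(f, lam) + rng.standard_normal((T, n))
y = 1.5 * X[:, 0] + f + rng.standard_normal(T)

selected = gocmt_sketch(X, y)
print("selected after factor adjustment:", selected)
```

Without the factor adjustment, every covariate in this design is correlated with the target through the common factor, so unadjusted one-at-a-time tests would tend to select far too many of them.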

Empirically, the paper applies these methods to forecasting quarterly UK inflation at several forecast horizons. The analysis underscores the value of including global variables, such as non-UK inflation rates, which substantially affect domestic economic conditions. Across methods and periods, ARX models that include both UK and foreign inflation achieve a lower root mean squared forecast error (RMSFE) than simpler autoregressive models.
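
The kind of comparison described here can be illustrated with a small recursive (expanding-window) forecasting exercise. The sketch below uses simulated series, not UK inflation data, and one-step-ahead forecasts only; the variable names and the data-generating process are assumptions made for the example.

```python
import numpy as np

def rmsfe(errors):
    """Root mean squared forecast error."""
    return float(np.sqrt(np.mean(np.square(errors))))

def recursive_forecast_errors(y, regressors, first_forecast):
    """One-step-ahead forecast errors from OLS re-estimated on an expanding window.

    regressors[t] must contain only information available before y[t] is observed.
    """
    errors = []
    for t in range(first_forecast, len(y)):
        W = np.column_stack([np.ones(t), regressors[:t]])
        coef, *_ = np.linalg.lstsq(W, y[:t], rcond=None)
        forecast = coef[0] + regressors[t] @ coef[1:]
        errors.append(y[t] - forecast)
    return np.array(errors)

# Simulated data (hypothetical): a "domestic" series that depends on its own lag
# and on the lag of a persistent "foreign" series.
rng = np.random.default_rng(4)
T = 200
x = np.zeros(T)
y = np.zeros(T)
for t in range(1, T):
    x[t] = 0.8 * x[t - 1] + rng.standard_normal()
    y[t] = 0.3 * y[t - 1] + 0.8 * x[t - 1] + rng.standard_normal()

target = y[1:]
lag_y = y[:-1]
lag_x = x[:-1]
start = len(target) - 60                        # evaluate on the last 60 periods

ar_errors = recursive_forecast_errors(target, lag_y[:, None], start)
arx_errors = recursive_forecast_errors(target, np.column_stack([lag_y, lag_x]), start)
print(f"AR  RMSFE: {rmsfe(ar_errors):.3f}")
print(f"ARX RMSFE: {rmsfe(arx_errors):.3f}")
```

Because the foreign series genuinely drives the simulated target, the ARX specification should forecast better here; with real data the ranking is an empirical question.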

The paper highlights the pitfalls of overly complex models and the overfitting that can occur with Lasso, particularly when it is applied conditional on a set of pre-selected variables. The results emphasize that while machine learning techniques are powerful, their usefulness depends on the economic structure and the forecasting context in question.

The findings have practical implications for policymakers, suggesting they accommodate international economic interactions and latent global factors in their decision-support tools. Theoretically, this work calls for a nuanced understanding of model specification in high-dimensional environments, advocating for adaptive approaches that balance the strengths of sparse and dense modeling techniques.

In conclusion, the paper provides a framework for high-dimensional forecasting in interconnected economic systems, combining traditional econometric methods with machine-learning techniques for variable selection. As computational power grows and data become more abundant, future research may focus on refining these approaches, particularly in the presence of parameter instability and structural economic shifts.