DeepVARwT: Joint Deep Trend & VAR Estimation
- DeepVARwT is a framework that jointly estimates deterministic trends and VAR coefficients from multivariate time series, integrating deep learning with classical VAR concepts.
- It employs LSTM networks to capture evolving trends while enforcing VAR stability via the Ansley–Kohn transformation, ensuring valid multi-step forecasts.
- Empirical studies show superior trend recovery, reduced estimation bias, and improved predictive performance compared to traditional two-stage methods.
DeepVARwT is a deep-learning-based framework for multivariate time series analysis that jointly estimates deterministic trends and vector autoregressive (VAR) dependence structures. It extends the classical stationary VAR($p$) model to accommodate time-varying deterministic means, utilizing a Long Short-Term Memory (LSTM) architecture to estimate trends and VAR parameters simultaneously under a Gaussian likelihood. DeepVARwT enforces stability (causality) of the VAR process through the Ansley–Kohn transformation, ensuring well-behaved multi-step forecasting and valid interpretability of the autoregressive dynamics (Li et al., 2022).
1. Mathematical Formulation of VAR with Trend
Let $\mathbf{y}_t = (y_{1,t},\dots,y_{m,t})'$ denote an $m$-dimensional vector time series of length $T$ whose mean evolves deterministically. The model for $\mathbf{y}_t$ under DeepVARwT is

$$\mathbf{y}_t = \boldsymbol{\mu}_t + \sum_{i=1}^{p} \mathbf{A}_i \left(\mathbf{y}_{t-i} - \boldsymbol{\mu}_{t-i}\right) + \boldsymbol{\varepsilon}_t, \qquad \boldsymbol{\varepsilon}_t \sim \mathcal{N}(\mathbf{0}, \boldsymbol{\Sigma}),$$

or, centering by the trend with $\mathbf{x}_t = \mathbf{y}_t - \boldsymbol{\mu}_t$,

$$\mathbf{x}_t = \sum_{i=1}^{p} \mathbf{A}_i \mathbf{x}_{t-i} + \boldsymbol{\varepsilon}_t,$$

where $\boldsymbol{\mu}_t$ is an unknown, time-varying $m$-vector, $\mathbf{A}_1,\dots,\mathbf{A}_p$ are $m \times m$ autoregressive coefficient matrices, and $\boldsymbol{\varepsilon}_t$ are i.i.d. Gaussian innovations.
Whereas classical approaches require pre-detrending ($\boldsymbol{\mu}_t$ specified a priori, typically as a polynomial or spline, then fitted by OLS), DeepVARwT treats $\boldsymbol{\mu}_t$, $\mathbf{A}_1,\dots,\mathbf{A}_p$, and $\boldsymbol{\Sigma}$ as jointly unknown and estimates all parameters in a single maximum likelihood framework.
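To make the data-generating process concrete, the following numpy sketch simulates a bivariate VAR(1) around a deterministic trend. All dimensions, trend shapes, and coefficient values here are illustrative assumptions, not values from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)
m, T = 2, 300  # series dimension and length (illustrative)

# Illustrative smooth deterministic trend, a function of scaled time t/T.
t = np.arange(T) / T
mu = np.stack([2.0 + np.sin(2 * np.pi * t), 0.5 * t], axis=1)  # (T, m)

# A stable lag-1 coefficient matrix (eigenvalues 0.5 and 0.4, inside the unit circle).
A1 = np.array([[0.5, 0.1],
               [0.0, 0.4]])
Sigma = np.array([[0.20, 0.05],
                  [0.05, 0.10]])
L = np.linalg.cholesky(Sigma)

# Simulate the centered process x_t = A1 x_{t-1} + eps_t, then add the trend back.
x = np.zeros((T, m))
for s in range(1, T):
    x[s] = A1 @ x[s - 1] + L @ rng.standard_normal(m)
y = mu + x  # observed series y_t = mu_t + x_t
```

Because the trend enters additively and the VAR operates on the centered process, the same simulated `y` can be used to compare joint estimation against a detrend-then-fit pipeline.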
2. LSTM-based Trend and VAR Parameter Estimation
The DeepVARwT approach parameterizes the trend and candidate (unconstrained) VAR parameters using an LSTM network. At each time $t$, the LSTM consumes an input $\mathbf{u}_t$, typically a vector of monomials and reciprocals in $t$, mapped to a hidden state $\mathbf{h}_t$. The LSTM update equations are

$$\begin{aligned}
\mathbf{f}_t &= \sigma(\mathbf{W}_f \mathbf{u}_t + \mathbf{U}_f \mathbf{h}_{t-1} + \mathbf{b}_f), \\
\mathbf{i}_t &= \sigma(\mathbf{W}_i \mathbf{u}_t + \mathbf{U}_i \mathbf{h}_{t-1} + \mathbf{b}_i), \\
\mathbf{o}_t &= \sigma(\mathbf{W}_o \mathbf{u}_t + \mathbf{U}_o \mathbf{h}_{t-1} + \mathbf{b}_o), \\
\tilde{\mathbf{c}}_t &= \tanh(\mathbf{W}_c \mathbf{u}_t + \mathbf{U}_c \mathbf{h}_{t-1} + \mathbf{b}_c), \\
\mathbf{c}_t &= \mathbf{f}_t \odot \mathbf{c}_{t-1} + \mathbf{i}_t \odot \tilde{\mathbf{c}}_t, \\
\mathbf{h}_t &= \mathbf{o}_t \odot \tanh(\mathbf{c}_t),
\end{aligned}$$

with $\sigma$ the logistic sigmoid and $\odot$ the Hadamard product.
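The gate equations above can be written out directly as a single-step function. This is a plain-numpy sketch of a standard LSTM cell (in practice one would use `torch.nn.LSTM`); the stacked parameter layout and dimensions are illustrative:

```python
import numpy as np

def lstm_step(u, h_prev, c_prev, W, U, b):
    """One LSTM update following the gate equations above.
    W, U, b stack the parameters for the (f, i, o, c-tilde) gates."""
    sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
    z = W @ u + U @ h_prev + b              # (4 * hidden,)
    f, i, o, g = np.split(z, 4)
    f, i, o = sigmoid(f), sigmoid(i), sigmoid(o)
    g = np.tanh(g)                          # candidate cell state c-tilde
    c = f * c_prev + i * g                  # Hadamard products
    h = o * np.tanh(c)
    return h, c

# Tiny usage with illustrative sizes: input dimension 3, hidden size 4.
rng = np.random.default_rng(0)
W = rng.standard_normal((16, 3))
U = rng.standard_normal((16, 4))
b = np.zeros(16)
h, c = lstm_step(rng.standard_normal(3), np.zeros(4), np.zeros(4), W, U, b)
```

Since $\mathbf{h}_t = \mathbf{o}_t \odot \tanh(\mathbf{c}_t)$ with $\mathbf{o}_t \in (0,1)$, every hidden-state coordinate stays strictly inside $(-1, 1)$.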
The trend is mapped linearly from the LSTM's hidden state: $\boldsymbol{\mu}_t = \mathbf{W}_{\mu} \mathbf{h}_t + \mathbf{b}_{\mu}$.
In parallel, DeepVARwT includes trainable parameters $\tilde{\mathbf{A}}_1,\dots,\tilde{\mathbf{A}}_p$ (raw, unconstrained matrices, one per lag) and a lower-triangular matrix $\mathbf{L}$ such that $\boldsymbol{\Sigma} = \mathbf{L}\mathbf{L}'$. To guarantee causality, the $\tilde{\mathbf{A}}_i$ are transformed into stable $\mathbf{A}_i$ at every iteration.
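A minimal sketch of this parameterization is below: a linear trend read-out from the hidden state, raw per-lag coefficient matrices, and a Cholesky factor with a positive diagonal (here enforced via `exp`, a common trick that is an assumption of this sketch, not necessarily the paper's exact choice) so that $\boldsymbol{\Sigma} = \mathbf{L}\mathbf{L}'$ is positive definite by construction:

```python
import numpy as np

rng = np.random.default_rng(1)
m, p, hidden = 2, 2, 10  # illustrative dimension, lag order, hidden size

# Linear read-out of the trend from an LSTM hidden state h_t.
W_mu = 0.1 * rng.standard_normal((m, hidden))
b_mu = np.zeros(m)
h_t = rng.standard_normal(hidden)
mu_t = W_mu @ h_t + b_mu                   # trend value at time t

# Raw (unconstrained) VAR coefficient matrices, one per lag.
A_raw = rng.standard_normal((p, m, m))

# Lower-triangular L with a strictly positive diagonal, so Sigma = L L' is PD.
theta = rng.standard_normal((m, m))
L = np.tril(theta, k=-1) + np.diag(np.exp(np.diag(theta)))
Sigma = L @ L.T
```

Optimizing over `theta` rather than `Sigma` directly keeps the innovation covariance valid throughout training without any projection step.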
3. Likelihood-based Training and Stable VAR Enforcement
Training proceeds via direct optimization of the exact Gaussian log-likelihood

$$\ell(\boldsymbol{\theta}) = \log f(\mathbf{x}_1,\dots,\mathbf{x}_p) + \sum_{t=p+1}^{T} \log f(\mathbf{x}_t \mid \mathbf{x}_{t-1},\dots,\mathbf{x}_{t-p}),$$

where the first term is the Gaussian density of the initial block $(\mathbf{x}_1',\dots,\mathbf{x}_p')'$ under its marginal covariance, the conditional terms are Gaussian densities of the innovations $\boldsymbol{\varepsilon}_t = \mathbf{x}_t - \sum_{i=1}^{p} \mathbf{A}_i \mathbf{x}_{t-i}$ with covariance $\boldsymbol{\Sigma}$, and $\mathbf{x}_t = \mathbf{y}_t - \boldsymbol{\mu}_t$.
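The dominant conditional part of this objective is simple to write down. The sketch below computes the conditional Gaussian log-likelihood only; the exact likelihood would add the marginal term for the first $p$ observations:

```python
import numpy as np

def var_conditional_loglik(y, mu, A, Sigma):
    """Conditional Gaussian log-likelihood of a VAR(p) with trend.

    Sums log N(eps_t; 0, Sigma) over t = p, ..., T-1, where
    eps_t = x_t - sum_i A[i] @ x_{t-1-i} and x_t = y_t - mu_t.
    (The exact likelihood adds the marginal density of the first p block.)
    """
    p, m = len(A), y.shape[1]
    x = y - mu
    eps = np.stack([x[t] - sum(A[i] @ x[t - 1 - i] for i in range(p))
                    for t in range(p, len(y))])
    Sinv = np.linalg.inv(Sigma)
    _, logdet = np.linalg.slogdet(Sigma)
    quad = np.einsum('ti,ij,tj->', eps, Sinv, eps)  # sum_t eps_t' Sinv eps_t
    n = len(eps)
    return -0.5 * (n * m * np.log(2.0 * np.pi) + n * logdet + quad)
```

Implemented with differentiable tensor operations (as in PyTorch), the negative of this quantity serves directly as the training loss, so trend and VAR parameters receive gradients from the same objective.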
All model parameters are updated by backpropagation through time, implemented in PyTorch with the AdaGrad optimizer, which maintains per-parameter adaptive learning rates.
Stability of the estimated VAR process is guaranteed by the Ansley–Kohn transformation: the unconstrained $\tilde{\mathbf{A}}_i$ are mapped to partial autocorrelation matrices by a Cholesky-based normalization, and then recursively to stable, genuine VAR($p$) parameters $\mathbf{A}_1,\dots,\mathbf{A}_p$, ensuring all roots of $\det(\mathbf{I} - \mathbf{A}_1 z - \cdots - \mathbf{A}_p z^p) = 0$ lie outside the unit circle.
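For the VAR(1) case the construction reduces to a single normalization step, sketched below (for $p > 1$ the full Ansley–Kohn recursion over partial autocorrelation matrices is required; this sketch shows only the first-step mapping):

```python
import numpy as np

def stabilize_var1(A_raw):
    """Map an arbitrary matrix to a stable VAR(1) coefficient matrix via
    A = (I + A_raw A_raw')^{-1/2} A_raw.

    Then A A' = I - (I + A_raw A_raw')^{-1}, so every singular value of A
    is strictly below 1; spectral norm < 1 implies all eigenvalues lie
    inside the unit circle, hence y_t = A y_{t-1} + e_t is stable.
    """
    m = A_raw.shape[0]
    B = np.eye(m) + A_raw @ A_raw.T          # symmetric positive definite
    w, V = np.linalg.eigh(B)
    B_inv_sqrt = V @ np.diag(w ** -0.5) @ V.T
    return B_inv_sqrt @ A_raw

# Usage: even a wildly scaled raw matrix maps to a stable coefficient.
A = stabilize_var1(5.0 * np.random.default_rng(3).standard_normal((3, 3)))
```

Because the mapping is smooth and surjective onto the stable region (for $p=1$), gradients can flow through it, and no candidate parameter value ever produces an explosive process.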
4. Implementation Details and Training Protocol
Training and inference are performed as follows:
- PyTorch 1.x is used for network definition and optimization.
- Initial values for the trend are obtained by fitting a global OLS polynomial, and the detrended data are used to fit an initial VAR by OLS, seeding the trend network and the VAR coefficient parameters.
- Configurations commonly used: a single LSTM layer with 10–20 hidden units; inputs are monomials and reciprocals in the time index $t$, range-scaled.
- Separate learning rates are used for the trend/LSTM and VAR parameter blocks; training runs for 300–600 epochs or until the log-likelihood improvement falls below a preset tolerance.
- For forecast evaluation, models are refit in rolling windows (e.g., 166 quarters for US macroeconomic series), and multi-step forecasts are generated within each window.
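The rolling-window protocol itself is model-agnostic and can be sketched as follows; `fit` and `forecast` are hypothetical callables standing in for model refitting and $h$-step prediction, and the toy usage below just carries the last observation forward:

```python
import numpy as np

def rolling_forecasts(y, window, horizon, fit, forecast):
    """Refit a model on each rolling window and collect h-step forecasts.

    `fit(train)` returns a fitted model; `forecast(model, h)` returns an
    (h, m) array of point forecasts. Both are user-supplied callables.
    """
    preds = []
    for start in range(0, len(y) - window - horizon + 1):
        train = y[start:start + window]
        model = fit(train)
        preds.append(forecast(model, horizon))
    return np.stack(preds)                  # (n_windows, horizon, m)

# Toy usage: a "model" that simply repeats the last observed value.
y = np.cumsum(np.random.default_rng(2).standard_normal((60, 3)), axis=0)
preds = rolling_forecasts(
    y, window=40, horizon=4,
    fit=lambda train: train[-1],
    forecast=lambda last, h: np.tile(last, (h, 1)),
)
```

Each window's forecasts are then compared against the held-out observations immediately following that window, which is how the per-horizon error metrics in the next section are accumulated.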
5. Simulation Study: Trend and Parameter Recovery
A simulation study investigates estimation accuracy for a semi-parametric VAR(2), where the trend components are constructed by kernel-smoothing real stock price data and the true VAR coefficients are specified. Over 100 Monte Carlo replicates, DeepVARwT's performance is assessed against classical two-stage polynomial detrending followed by OLS VAR fitting, using mean absolute deviation (MAD) for trend accuracy and empirical bias, standard deviation (SD), and MSE for VAR parameter estimation. DeepVARwT achieves markedly lower MAD and substantially reduced bias and MSE in coefficient estimation, indicating effective joint recovery of both trend and VAR parameters (Li et al., 2022).
6. Empirical Evaluations on Real Data
DeepVARwT is benchmarked on several real-world datasets:
- US Macroeconomics (GDP gap, inflation, Fed funds): With a rolling window of 166 quarters and multiple forecast horizons, DeepVARwT demonstrates lower average percentage error (APE) and better-calibrated, tighter predictive intervals (measured by the scaled interval score, SIS) than VARwT (a VAR(4) with a 9th-order polynomial trend), DeepAR, and DeepState, particularly for Fed funds forecasts.
- Global Temperature Anomalies (Northern/Southern Hemisphere, Tropics): Across 19 sliding windows and multiple forecast horizons, DeepVARwT produces the most accurate multi-step forecasts and the narrowest valid prediction intervals, outperforming VARwT (polynomial trend) and state-of-the-art deep learning baselines.
- US Macroeconomics II (inflation, unemployment, T-bill rate): Across rolling-window experiments for horizons up to 8 steps, DeepVARwT surpasses VARwT, DeepAR, and DeepState on point and interval forecasts.
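The interval-quality metric used throughout these comparisons is based on the interval score. A minimal sketch is below; the paper's SIS additionally scales this score by a naive benchmark's in-sample error (analogous to MASE), which is omitted here:

```python
import numpy as np

def interval_score(y, lower, upper, alpha=0.05):
    """Interval score of a central (1 - alpha) prediction interval [lower, upper]:
    the interval width, plus a penalty of (2/alpha) times the distance by
    which the observation y falls outside the interval. Lower is better."""
    width = upper - lower
    below = (2.0 / alpha) * np.maximum(lower - y, 0.0)
    above = (2.0 / alpha) * np.maximum(y - upper, 0.0)
    return width + below + above
```

An observation inside the interval is charged only the width, so narrow intervals are rewarded; an observation outside pays a penalty that grows as `alpha` shrinks, so under-coverage is punished heavily at high nominal levels.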
7. Advantages, Limitations, and Potential Extensions
DeepVARwT offers several notable properties:
- Advantages: Joint, simultaneous estimation of trend and VAR parameters avoids the errors induced by pre-detrending, allows highly flexible trend specification via the LSTM, preserves the interpretability of Gaussian VAR modeling, enables exact MLE fitting, and enforces strict stability constraints.
- Limitations: The Gaussian innovation assumption is restrictive for discrete or heavy-tailed data; hyperparameter tuning (choice of LSTM inputs and hidden dimensionality) is non-trivial; and the computational cost of the stability mapping scales cubically with the series dimension $m$.
- Potential Extensions: Non-Gaussian innovations (e.g., Student-$t$, count models) via likelihood replacement, structured regularization of VAR parameters for high-dimensional series, tensor-structured VAR for multiway data, and normalizing flows for more general innovation distributions are plausible areas of extension.
In summary, DeepVARwT synergistically combines flexible deep trend estimation and autoregressive dependence within a rigorously stable, interpretable Gaussian likelihood framework, achieving superior fit and forecast performance in both controlled and applied settings (Li et al., 2022).