DeepVARwT: Adaptive Shrinkage in VAR Models
- The paper introduces a joint estimation framework that integrates trend and VAR coefficient estimation into a single maximum likelihood optimization to propagate uncertainty.
- DeepVARwT employs an LSTM network to derive both trend components and preliminary VAR parameters, with causality enforced through the Ansley–Kohn transform.
- Empirical results demonstrate that the method reduces bias, mean squared error, and forecast errors compared to traditional two-stage approaches.
A locally adaptive shrinkage technique in time series analysis aims to estimate model components—such as trends or dependence structure—while adaptively regularizing or controlling model complexity over time, typically to accommodate nonstationarity, local regime changes, or persistent uncertainties. The DeepVARwT framework ("Deep Learning for a VAR Model with Trend") realizes this philosophy by leveraging deep recurrent architectures to jointly estimate both deterministic trends and the inter-series dependence structure in a vector autoregressive (VAR) system, with explicit enforcement of causality and maximum likelihood as the estimation backbone (Li et al., 2022).
1. Model Structure: VAR with Deterministic Trend
DeepVARwT models a multivariate time series $\{y_t\}$ as a VAR($p$) process around a time-dependent mean:

$$y_t = m_t + \Phi_1 (y_{t-1} - m_{t-1}) + \cdots + \Phi_p (y_{t-p} - m_{t-p}) + \varepsilon_t, \qquad \varepsilon_t \sim N(0, \Sigma),$$

where $\Phi_1, \dots, \Phi_p$ are $k \times k$ coefficient matrices and $m_t$ is a deterministic, possibly nonlinear trend.
Traditional approaches isolate trend estimation (e.g., via polynomials or kernel smoothing) and then fit the VAR, but this two-stage procedure neglects the uncertainty in trend estimation, which propagates bias into coefficient inference and forecasting, especially in the latter part of the series. DeepVARwT instead poses a joint estimation problem for the trend $m_t$ and the VAR parameters $(\Phi_1, \dots, \Phi_p, \Sigma)$ together, propagating uncertainty across components in a single maximum likelihood optimization (Li et al., 2022).
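As a minimal sketch of this data-generating process, the following simulates a trivariate VAR(2) around a smooth trend. The coefficient matrices, noise covariance, and trend shape are illustrative choices, not those used in the paper:

```python
import numpy as np

rng = np.random.default_rng(0)
k, p, T = 3, 2, 300                       # series dimension, VAR order, length

# Illustrative stable coefficient matrices and noise covariance.
Phi = [0.4 * np.eye(k), 0.2 * np.eye(k)]
Sigma = 0.1 * np.eye(k)
L = np.linalg.cholesky(Sigma)

# A smooth deterministic trend m_t (a sine/ramp mix, purely for illustration).
s = np.arange(T) / T
m = np.stack([np.sin(2 * np.pi * s), 0.5 * s, np.cos(np.pi * s)], axis=1)

# y_t = m_t + x_t, where x_t is a zero-mean VAR(2) process.
x = np.zeros((T, k))
for t in range(p, T):
    x[t] = Phi[0] @ x[t - 1] + Phi[1] @ x[t - 2] + L @ rng.standard_normal(k)
y = m + x
```

A two-stage method would detrend `y` first and fit the VAR to the residual; DeepVARwT estimates both pieces jointly.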
2. LSTM Architecture for Joint Trend and VAR Parameter Estimation
DeepVARwT employs a Long Short-Term Memory (LSTM) network to process known time-based features, such as $t/T$, generating at each time step a hidden state $h_t$ via the standard LSTM recursion $h_t = \mathrm{LSTM}(h_{t-1}, t/T)$. The LSTM output is linearly mapped to both the trend component and preliminary VAR parameters:
- A trend estimate $\hat{m}_t = W_m h_t + b_m$, with $W_m \in \mathbb{R}^{k \times d}$ and $b_m \in \mathbb{R}^k$,
- A set of candidate VAR($p$) coefficient matrices $A_1, \dots, A_p$ and a Cholesky factor $L$ such that $\Sigma = L L^\top$.
All of these mappings are realized by a final fully connected layer whose output is partitioned into the parameters needed for the trend, the VAR coefficients, and the noise covariance (Li et al., 2022).
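A compact PyTorch sketch of this architecture follows. The layer sizes, the single time feature $t/T$, and the convention of reading the VAR/Cholesky parameters off the final time step are illustrative assumptions, not the paper's exact design:

```python
import torch
import torch.nn as nn

class TrendVARHead(nn.Module):
    """Sketch of the DeepVARwT idea: an LSTM driven by a time feature emits a
    trend value m_t at every step; raw VAR coefficients and a Cholesky factor
    are read off the same partitioned output head (here, at the final step)."""

    def __init__(self, k=3, p=2, hidden=20):
        super().__init__()
        self.k, self.p = k, p
        self.n_var = p * k * k                # raw VAR coefficient entries
        self.n_chol = k * (k + 1) // 2        # lower-triangular entries of L
        self.lstm = nn.LSTM(input_size=1, hidden_size=hidden, batch_first=True)
        self.head = nn.Linear(hidden, k + self.n_var + self.n_chol)

    def forward(self, t_feat):                # t_feat: (1, T, 1), e.g. t/T
        h, _ = self.lstm(t_feat)              # (1, T, hidden)
        out = self.head(h)                    # partitioned output
        m = out[..., : self.k]                # trend path, shape (1, T, k)
        last = out[0, -1, self.k :]           # VAR/Cholesky parameters
        A = last[: self.n_var].view(self.p, self.k, self.k)
        L = torch.zeros(self.k, self.k)
        idx = torch.tril_indices(self.k, self.k)
        L[idx[0], idx[1]] = last[self.n_var :]
        return m, A, L                        # noise covariance Sigma = L @ L.T
```

The raw matrices `A` are not yet causal; they would be passed through the Ansley–Kohn transform (Section 4) before entering the likelihood.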
3. Likelihood-Based Joint Estimation and Optimization
The Gaussian log-likelihood for observed data $y_1, \dots, y_T$ under the model is

$$\log L(\theta) = -\frac{1}{2} \sum_{t=p+1}^{T} \left[ k \log 2\pi + \log |\Sigma| + \varepsilon_t^\top \Sigma^{-1} \varepsilon_t \right], \qquad \varepsilon_t = (y_t - m_t) - \sum_{i=1}^{p} \Phi_i (y_{t-i} - m_{t-i}),$$

where $\theta$ collectively denotes all parameters (LSTM weights, trend/VAR mappings, and the Cholesky factor). The negative log-likelihood serves as the loss, and model parameters are optimized via back-propagation and AdaGrad. This regime ensures that trend estimation error and VAR parameter uncertainty are estimated and updated coherently at once, yielding statistically efficient inference (Li et al., 2022).
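The conditional Gaussian likelihood above can be sketched in NumPy as follows (a plain loop for clarity, treating the first $p$ observations as fixed, which is one common convention; the paper's PyTorch implementation differentiates through an equivalent computation):

```python
import numpy as np

def var_gaussian_nll(y, m, Phi, L):
    """Negative Gaussian log-likelihood of y given trend m, VAR matrices Phi,
    and Cholesky factor L of the innovation covariance. Conditional likelihood:
    the first p observations are treated as fixed. Illustrative sketch."""
    p, k = len(Phi), y.shape[1]
    x = y - m                                   # detrended series
    Sigma = L @ L.T
    Sinv = np.linalg.inv(Sigma)
    _, logdet = np.linalg.slogdet(Sigma)
    nll = 0.0
    for t in range(p, len(y)):
        e = x[t] - sum(Phi[i] @ x[t - 1 - i] for i in range(p))
        nll += 0.5 * (k * np.log(2 * np.pi) + logdet + e @ Sinv @ e)
    return nll
```

In the joint framework, gradients of this loss flow back through both the VAR parameters and the LSTM weights producing `m`.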
4. Causality Enforcement via Ansley–Kohn Transform
Causality (stability) of a VAR($p$) requires that all roots of $\det(I_k - \Phi_1 z - \cdots - \Phi_p z^p) = 0$ lie outside the unit circle. DeepVARwT avoids imposing this constraint directly: it maps the preliminary coefficients $A_1, \dots, A_p$ to partial autocorrelation matrices $P_1, \dots, P_p$ with singular values below one, and the Ansley–Kohn recursion then recovers a set of guaranteed-causal matrices $\Phi_1, \dots, \Phi_p$, which are used consistently in likelihood evaluation at each gradient descent step. As a result, the model is automatically constrained to yield stable VAR solutions throughout training (Li et al., 2022).
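For the VAR(1) case the first step of this construction can be sketched directly: an arbitrary square matrix is mapped to one whose singular values (and hence eigenvalue moduli) all lie below one. The full Ansley–Kohn recursion extends this to higher orders; the helper name here is hypothetical:

```python
import numpy as np

def to_causal_var1(A):
    """Map an arbitrary k x k matrix A to a VAR(1) coefficient matrix with all
    eigenvalues strictly inside the unit circle, via P = (I + A A^T)^{-1/2} A.
    Then P P^T = I - (I + A A^T)^{-1}, whose eigenvalues lie in [0, 1),
    so every singular value of P is below one. Sketch for p = 1 only."""
    k = A.shape[0]
    B = np.eye(k) + A @ A.T
    w, V = np.linalg.eigh(B)                 # symmetric eigendecomposition
    B_inv_sqrt = V @ np.diag(w ** -0.5) @ V.T  # inverse matrix square root
    return B_inv_sqrt @ A
```

Because the map is smooth and unconstrained in `A`, gradient descent can run freely in the raw parameter space while every evaluated model remains stable.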
5. Training Protocols and Implementation Details
The model is implemented in PyTorch, with the LSTM (for trend estimation) unrolled across time and the downstream likelihood evaluation integrated as a computational block. Two separate learning rates are used, one for the LSTM/trend weights and one for the VAR/Cholesky parameters. Initialization employs (i) a nonlinear least-squares fit of the trend for the LSTM/trend parameters and (ii) an OLS VAR($p$) fit on pre-detrended data for the initial raw coefficients and Cholesky factor.
Typical hyperparameters include a modest LSTM hidden dimension and a low VAR order. Training runs for a fixed maximum number of updates, stopping earlier once a convergence tolerance on the loss is met. All real-data experiments use sliding-window forecasting: a rolling window of fixed length fits the model, producing multi-step-ahead predictions before advancing the window for the next forecast cycle (Li et al., 2022).
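The sliding-window protocol can be sketched generically; `fit` and `predict` are placeholders for any model's train/forecast calls, and the window and horizon values in the usage example are illustrative:

```python
import numpy as np

def rolling_forecasts(y, window, horizon, fit, predict):
    """Sliding-window evaluation: fit on `window` consecutive observations,
    forecast `horizon` steps ahead, then advance the window by `horizon`."""
    preds, actuals = [], []
    for start in range(0, len(y) - window - horizon + 1, horizon):
        train = y[start : start + window]
        model = fit(train)
        preds.append(predict(model, horizon))
        actuals.append(y[start + window : start + window + horizon])
    return np.array(preds), np.array(actuals)

# Usage with a naive mean forecaster on a toy bivariate series.
y = np.arange(40, dtype=float).reshape(20, 2)
fit_mean = lambda train: train.mean(axis=0)
predict_mean = lambda model, h: np.tile(model, (h, 1))
preds, actuals = rolling_forecasts(y, window=10, horizon=2,
                                   fit=fit_mean, predict=predict_mean)
```

Swapping in the DeepVARwT training and forecasting routines for `fit`/`predict` reproduces the evaluation loop described above.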
6. Empirical Evaluation: Simulation and Real-Data Studies
Simulation studies use VAR(2)+trend series whose coefficients and noise covariance are derived from empirical US stock-return data, with trends obtained by kernel-smoothing real series so that they reflect realistic patterns. Over 100 Monte Carlo replications, trend error (mean absolute deviation, MAD), coefficient bias, variance, and MSE are tabulated.
Findings include:
- DeepVARwT yields trend estimates that closely follow ground truth, particularly near local extrema, outperforming high-order polynomial detrending (VARwT).
- VAR coefficients are estimated with consistently smaller bias and lower total MSE than two-stage approaches (Li et al., 2022).
In real-data studies, DeepVARwT is benchmarked against VARwT (OLS-fitted polynomial trend), DeepAR [Salinas et al., 2020], and DeepState [Rangapuram et al., 2018] in three settings:
- US macroeconomic data (GDP gap, inflation, Fed funds rate),
- global temperature anomalies (Northern/Southern hemispheres, tropics),
- further US macro (inflation, unemployment, T-bill rate).
Metrics for evaluation include Absolute Percentage Error (APE) and the Scaled Interval Score (SIS) for 95% prediction intervals. Across all cases, DeepVARwT achieved:
- Multi-horizon APE reductions of up to 50% versus VARwT,
- Sharper and more accurate predictive intervals than DeepAR and DeepState for key series,
- White, near-Gaussian residuals, stable parameter estimates, and improved forecast sharpness at medium-to-long horizons (Li et al., 2022).
Summary of empirical results:
| Dataset / Series | DeepVARwT Gains vs. Baseline | Metrics Improved |
|---|---|---|
| US macro (GDP, Inflation, Fed Funds) | Up to 50% APE reduction, sharper SIS | APE, SIS |
| Global temperature anomalies | Lowest APE, SIS across all horizons | APE, SIS |
| US macro (Inflation, Unemployment, T-bill) | Leading APE, SIS for Unemployment, T-bill | APE, SIS |
7. Advantages, Limitations, and Extensions
DeepVARwT achieves several benefits:
- Fully joint trend and VAR coefficient estimation, mitigating underestimation of error due to prior detrending,
- Flexible, non-polynomial trends via the LSTM backbone,
- Direct maximum likelihood estimation for statistical efficiency,
- Rigorous causal enforcement for VAR via the Ansley–Kohn transform.
Known limitations include increased computational cost relative to OLS-based VARwT, especially for long series or higher-dimensional systems (large $k$), and the assumption of conditionally Gaussian residuals.
Potential extensions discussed:
- High-dimensional regularization or structured VAR for scalability,
- Non-Gaussian innovations via variational methods, copula-based techniques, or normalizing flows,
- Time-varying VAR coefficients through time-indexed output by the LSTM.
DeepVARwT bridges traditional interpretable time-series models and flexible deep architectures, offering improvements in multi-step forecasting and uncertainty quantification over two-stage detrending and over deep learning forecasters that ignore inter-series dependence (Li et al., 2022).