
DeepVARwT: Adaptive Shrinkage in VAR Models

Updated 10 February 2026
  • The paper introduces a joint estimation framework that integrates trend and VAR coefficient estimation into a single maximum likelihood optimization to propagate uncertainty.
  • DeepVARwT employs an LSTM network to derive both trend components and preliminary VAR parameters, with causality enforced through the Ansley–Kohn transform.
  • Empirical results demonstrate that the method reduces bias, mean squared error, and forecast errors compared to traditional two-stage approaches.

A locally adaptive shrinkage technique in time series analysis aims to estimate model components—such as trends or dependence structure—while adaptively regularizing or controlling model complexity over time, typically to accommodate nonstationarity, local regime changes, or persistent uncertainties. The DeepVARwT framework ("Deep Learning for a VAR Model with Trend") realizes this philosophy by leveraging deep recurrent architectures to jointly estimate both deterministic trends and the inter-series dependence structure in a vector autoregressive (VAR) system, with explicit enforcement of causality and maximum likelihood as the estimation backbone (Li et al., 2022).

1. Model Structure: VAR with Deterministic Trend

DeepVARwT models a multivariate time series $\mathbf{y}_t \in \mathbb{R}^m$ as a VAR($p$) process around a time-dependent mean:

$$\mathbf{y}_t = \bm{\mu}_t + \sum_{i=1}^{p} A_i\,(\mathbf{y}_{t-i} - \bm{\mu}_{t-i}) + \bm{\varepsilon}_t,$$

where $\bm{\varepsilon}_t \stackrel{iid}{\sim} N(\mathbf{0}, \Sigma)$, the $A_i \in \mathbb{R}^{m \times m}$ are coefficient matrices, and $\bm{\mu}_t$ is a deterministic, possibly nonlinear trend.
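As a concrete illustration, the generative model above can be simulated in a few lines. The trend, coefficients, and covariance below are toy values chosen for the sketch, not the paper's estimates:

```python
import numpy as np

# Illustrative sketch (not the paper's code): simulate a bivariate VAR(1)
# around a deterministic trend, y_t = mu_t + A_1 (y_{t-1} - mu_{t-1}) + eps_t.
rng = np.random.default_rng(0)

T, m = 200, 2
A1 = np.array([[0.5, 0.1],
               [0.0, 0.4]])              # causal: eigenvalues inside unit circle
Sigma = np.array([[1.0, 0.3],
                  [0.3, 1.0]])           # innovation covariance
t = np.arange(1, T + 1)
mu = np.stack([0.01 * t, 5 + 2 * np.sin(2 * np.pi * t / T)], axis=1)  # (T, m) trend

eps = rng.multivariate_normal(np.zeros(m), Sigma, size=T)
z = np.zeros((T, m))                     # deviations from the trend
for s in range(1, T):
    z[s] = A1 @ z[s - 1] + eps[s]
y = mu + z                               # observed series, shape (200, 2)
```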

Traditional approaches isolate trend estimation (e.g., via polynomials or kernel smoothing) and then fit the VAR, but this two-stage procedure neglects the uncertainty in trend estimation, which propagates bias into coefficient inference and forecasting, especially in the latter part of the series. DeepVARwT instead poses a joint estimation problem for both the trend field μt\bm\mu_t and VAR parameters (Ai,Σ)(A_i, \Sigma), propagating uncertainty across components in a single maximum likelihood optimization (Li et al., 2022).

2. LSTM Architecture for Joint Trend and VAR Parameter Estimation

DeepVARwT employs a Long Short-Term Memory (LSTM) network to process known time-based features, such as $\mathbf{x}_t = (t, t^2, t^3, 1/t, 1/t^2, 1/t^3)^\top$, generating at each time step $t$ a hidden state $\mathbf{h}_t \in \mathbb{R}^d$ via the standard LSTM recursion:

$$\begin{aligned}
\mathbf{i}_t &= \sigma(W_{xi}\,\mathbf{x}_t + W_{hi}\,\mathbf{h}_{t-1} + b_i) \\
\mathbf{f}_t &= \sigma(W_{xf}\,\mathbf{x}_t + W_{hf}\,\mathbf{h}_{t-1} + b_f) \\
\tilde{\mathbf{c}}_t &= \tanh(W_{xc}\,\mathbf{x}_t + W_{hc}\,\mathbf{h}_{t-1} + b_c) \\
\mathbf{c}_t &= \mathbf{f}_t \odot \mathbf{c}_{t-1} + \mathbf{i}_t \odot \tilde{\mathbf{c}}_t \\
\mathbf{o}_t &= \sigma(W_{xo}\,\mathbf{x}_t + W_{ho}\,\mathbf{h}_{t-1} + b_o) \\
\mathbf{h}_t &= \mathbf{o}_t \odot \tanh(\mathbf{c}_t)
\end{aligned}$$

The LSTM output is linearly mapped to both the trend component and preliminary VAR parameters:

  • $\bm{\mu}_t = W_\mu \mathbf{h}_t + \mathbf{b}_\mu$, with $W_\mu \in \mathbb{R}^{m \times d}$ and $\mathbf{b}_\mu \in \mathbb{R}^m$;
  • a set of candidate VAR($p$) coefficient matrices $\{\widetilde{A}_i\}$ and a Cholesky factor $L$ such that $\Sigma = L L^\top$.

All these mappings are realized using a final fully connected layer of dimension $m^2 p + m(m+1)/2$, partitioned to output the necessary parameters for the trend, VAR coefficients, and noise covariance (Li et al., 2022).
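A minimal sketch of this output partitioning follows; all weights here are random stand-ins and shape bookkeeping is the point, since the exact layer layout in the paper may differ:

```python
import numpy as np

# Hypothetical sketch: a hidden state h_t of dimension d is mapped linearly to
# (i) the m-dimensional trend mu_t and (ii) a parameter vector holding m^2*p
# raw VAR coefficients plus m*(m+1)/2 entries of the Cholesky factor L.
rng = np.random.default_rng(1)
m, p, d = 3, 2, 20

W_mu = rng.normal(size=(m, d)); b_mu = np.zeros(m)
n_var = m * m * p                  # raw VAR coefficient count
n_chol = m * (m + 1) // 2          # lower-triangular entries of L
W_par = rng.normal(size=(n_var + n_chol, d)); b_par = np.zeros(n_var + n_chol)

h_t = rng.normal(size=d)           # stand-in for the LSTM hidden state
mu_t = W_mu @ h_t + b_mu           # trend at time t

theta = W_par @ h_t + b_par
A_raw = theta[:n_var].reshape(p, m, m)        # preliminary A~_1, ..., A~_p
L = np.zeros((m, m))
L[np.tril_indices(m)] = theta[n_var:]         # fill the lower triangle
L[np.diag_indices(m)] = np.exp(np.diag(L))    # positive diagonal for validity
Sigma = L @ L.T                               # positive-definite covariance
```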

3. Likelihood-Based Joint Estimation and Optimization

The Gaussian log-likelihood for observed data $\{\mathbf{y}_t\}_{t=1}^{T}$ under the model is

$$\ell(\theta) = -\frac{1}{2}\left[(T-p)\log|\Sigma| + \sum_{t=p+1}^{T} \mathbf{e}_t^\top \Sigma^{-1}\, \mathbf{e}_t\right], \qquad \mathbf{e}_t = \mathbf{y}_t - \bm{\mu}_t - \sum_{i=1}^{p} A_i\,(\mathbf{y}_{t-i} - \bm{\mu}_{t-i}),$$

where $\theta$ collectively denotes all parameters (LSTM weights, trend/VAR mappings, and the Cholesky factor). The negative log-likelihood $\mathcal{L}(\theta) = -\ell(\theta)$ serves as the loss, and parameters are optimized via back-propagation with AdaGrad. Because trend estimation error and VAR parameter uncertainty are updated jointly within a single objective, the resulting inference is statistically efficient (Li et al., 2022).
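The loss above can be evaluated directly. The NumPy sketch below computes $\mathcal{L}(\theta)$ term by term for given trend and parameters; the values are toy inputs, not fitted estimates:

```python
import numpy as np

# Minimal sketch of the negative log-likelihood (up to an additive constant),
# assuming mu, A_i, and Sigma are already given.
rng = np.random.default_rng(2)
T, m = 100, 2
y = rng.normal(size=(T, m))
mu = np.zeros((T, m))                     # toy trend
A = [0.5 * np.eye(m)]                     # A_1 for a VAR(1)
Sigma = np.eye(m)

def neg_loglik(y, mu, A, Sigma):
    p, T = len(A), y.shape[0]
    Sinv = np.linalg.inv(Sigma)
    _, logdet = np.linalg.slogdet(Sigma)
    quad = 0.0
    for t in range(p, T):
        e = (y[t] - mu[t]) - sum(A[i] @ (y[t - 1 - i] - mu[t - 1 - i])
                                 for i in range(p))
        quad += e @ Sinv @ e              # e_t' Sigma^{-1} e_t
    return 0.5 * ((T - p) * logdet + quad)

nll = neg_loglik(y, mu, A, Sigma)         # finite, positive for Sigma = I
```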

4. Causality Enforcement via Ansley–Kohn Transform

Causality (stability) of a VAR($p$) requires that all roots of $\det(I - A_1 z - \cdots - A_p z^p)$ lie outside the unit circle. Rather than impose this constraint directly, DeepVARwT maps the preliminary coefficients $\{\widetilde{A}_i\}$ to partial autocorrelation matrices $\{P_i\}$:

$$P_i = B_i^{-1}\,\widetilde{A}_i, \qquad B_i B_i^\top = I + \widetilde{A}_i \widetilde{A}_i^\top.$$

The Ansley–Kohn recursion then recovers coefficient matrices $A_i$ that are guaranteed causal, and these are used in the likelihood evaluation at every gradient step, so training is automatically constrained to stable VAR solutions (Li et al., 2022).
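For the simplest case $p = 1$ the construction reduces to $A_1 = P_1$, and the squashing effect is easy to verify numerically: $P_1$ always has singular values below 1, hence spectral radius below 1. A sketch of this single-lag case (the full multi-lag recursion is more involved and omitted here):

```python
import numpy as np

# Map an arbitrary (possibly explosive) raw matrix to a causal VAR(1)
# coefficient: P = B^{-1} A_raw with B B^T = I + A_raw A_raw^T.
rng = np.random.default_rng(3)
m = 3
A_raw = rng.normal(scale=2.0, size=(m, m))    # unconstrained candidate

B = np.linalg.cholesky(np.eye(m) + A_raw @ A_raw.T)
P = np.linalg.solve(B, A_raw)                 # B^{-1} A_raw

sv_max = np.linalg.svd(P, compute_uv=False).max()   # < 1 by construction
rho = np.abs(np.linalg.eigvals(P)).max()            # <= sv_max < 1 => causal
```

The guarantee follows from $P P^\top = I - B^{-1} B^{-\top}$, which is strictly inside the identity, so every singular value of $P$ is below 1.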

5. Training Protocols and Implementation Details

The model is implemented in PyTorch, with the LSTM (for trend estimation) unrolled across time and the downstream likelihood evaluation integrated as a computational block. Two separate learning rates are used: $\eta_1 \approx 10^{-3}$ for the LSTM/trend weights and $\eta_2 \approx 10^{-2}$ for the VAR/Cholesky parameters. Initialization employs (i) a nonlinear least-squares fit of $\mathbf{y}_t \approx \bm{\mu}_t$ for the trend/LSTM parameters and (ii) an OLS VAR($p$) fit on pre-detrended data for the initial raw $\widetilde{A}_i$ and $L$.
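In PyTorch the two learning rates would typically be realized as optimizer parameter groups; as a dependency-free sketch of the underlying rule, here is the AdaGrad update applied to two toy one-parameter groups with the rates quoted above:

```python
import numpy as np

# Sketch of AdaGrad with two learning rates (toy 1-D quadratics, minimized at
# 0); illustrative only, not the paper's training loop.
def adagrad_step(param, grad, accum, lr, eps=1e-8):
    accum += grad ** 2                          # per-coordinate accumulator
    param -= lr * grad / (np.sqrt(accum) + eps)
    return param, accum

trend_w, var_w = np.array([5.0]), np.array([5.0])
acc_t, acc_v = np.zeros(1), np.zeros(1)
for _ in range(200):
    trend_w, acc_t = adagrad_step(trend_w, 2 * trend_w, acc_t, lr=1e-3)
    var_w, acc_v = adagrad_step(var_w, 2 * var_w, acc_v, lr=1e-2)
# var_w moves farther toward the optimum under the larger rate
```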

Typical hyperparameters include a hidden dimension $d = 20$ and VAR order $p \in \{2, 3, 4\}$. Training runs for at most $K \approx 500$ updates with a convergence tolerance of $10^{-5}$. All real-data experiments use sliding-window forecasting: a rolling window of length $T_{\rm train}$ is used to fit the model and produce $h$-step-ahead predictions, after which the window advances for the next forecast cycle (Li et al., 2022).
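The sliding-window protocol can be sketched as follows; `naive_fit` and `naive_forecast` are hypothetical stand-ins for DeepVARwT training and prediction:

```python
import numpy as np

# Schematic sliding-window evaluation: refit on each window of length T_train,
# forecast h steps ahead, then advance the window by one step.
def naive_fit(window):                 # stand-in for model training
    return window.mean(axis=0)

def naive_forecast(model, h):          # stand-in: constant-mean forecast
    return np.tile(model, (h, 1))

y = np.cumsum(np.ones((30, 2)), axis=0)    # toy bivariate series
T_train, h = 20, 3
forecasts = []
for start in range(0, len(y) - T_train - h + 1):
    window = y[start:start + T_train]
    model = naive_fit(window)
    forecasts.append(naive_forecast(model, h))   # each is (h, m)
```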

6. Empirical Evaluation: Simulation and Real-Data Studies

Simulation studies use VAR(2)-plus-trend series ($m = 3$, $T = 800$), with coefficients and noise derived from empirical US stock-return data and $\bm{\mu}_t$ reflecting real-world trends via kernel smoothing. Over 100 Monte Carlo replications, trend error (MAD), coefficient bias, variance, and MSE are tabulated.

Findings include:

  • DeepVARwT yields trend estimates that closely follow ground truth, particularly near local extrema, outperforming high-order polynomial detrending (VARwT).
  • VAR coefficients are estimated with consistently smaller bias and lower total MSE than two-stage approaches (Li et al., 2022).

In real-data studies, DeepVARwT is benchmarked against VARwT (OLS-fitted polynomial trend), DeepAR [Salinas et al., 2020], and DeepState [Rangapuram et al., 2018] in three settings:

  • US macroeconomic data (GDP gap, inflation, Fed funds rate),
  • global temperature anomalies (Northern/Southern hemispheres, tropics),
  • further US macro (inflation, unemployment, T-bill rate).

Metrics for evaluation include Absolute Percentage Error (APE) and the Scaled Interval Score (SIS) for 95% prediction intervals. Across these settings, DeepVARwT achieved:

  • Multi-horizon APE reductions of up to 50% versus VARwT,
  • Sharper and more accurate predictive intervals than DeepAR and DeepState for key series,
  • White and near-Gaussian residuals, stable parameter estimates, and improved forecast sharpness at medium-to-long horizons (Li et al., 2022).
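For reference, the two headline metrics can be implemented as follows, using one common definition of each; the paper's exact scaling of the interval score may differ (SIS divides the interval score by a naive-forecast error to make it scale-free):

```python
import numpy as np

def ape(y, yhat):
    """Absolute percentage error, elementwise."""
    return np.abs(y - yhat) / np.abs(y)

def interval_score(y, lower, upper, alpha=0.05):
    """Interval score for a (1 - alpha) prediction interval; lower is better.
    Width plus penalties of 2/alpha per unit of coverage violation."""
    width = upper - lower
    below = (2 / alpha) * (lower - y) * (y < lower)
    above = (2 / alpha) * (y - upper) * (y > upper)
    return width + below + above

y = np.array([10.0, 12.0, 11.0])
yhat = np.array([9.0, 12.6, 11.0])
errs = ape(y, yhat)                       # approx [0.1, 0.05, 0.0]
lo, hi = yhat - 2.0, yhat + 2.0
score = interval_score(y, lo, hi).mean()  # approx 4.0: all points covered,
                                          # so the score is the interval width
```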

Summary of empirical results:

| Dataset / Series | DeepVARwT Gains vs. Baseline | Metrics Improved |
|---|---|---|
| US macro (GDP gap, inflation, Fed funds) | Up to 50% APE reduction, sharper SIS | APE, SIS |
| Global temperature anomalies | Lowest APE and SIS across all horizons | APE, SIS |
| US macro (inflation, unemployment, T-bill) | Leading APE and SIS for unemployment and T-bill | APE, SIS |

7. Advantages, Limitations, and Extensions

DeepVARwT achieves several benefits:

  • Fully joint trend and VAR coefficient estimation, mitigating underestimation of error due to prior detrending,
  • Flexible, non-polynomial trends via the LSTM backbone,
  • Direct maximum likelihood estimation for statistical efficiency,
  • Rigorous causal enforcement for VAR via the Ansley–Kohn transform.

Known limitations include increased computational cost relative to OLS-based VARwT, especially for long or higher-dimensional series ($m$ large), and the assumption of conditionally Gaussian residuals.

Potential extensions discussed:

  • High-dimensional regularization or structured VAR for scalability,
  • Non-Gaussian models via variational methods or copula-based techniques (such as normalizing flows),
  • Time-varying VAR coefficients through time-indexed $A_i(t)$ output by the LSTM.

DeepVARwT bridges traditional interpretable models and flexible deep architectures, offering improvements in multi-step forecasting and uncertainty quantification over both two-stage detrending and deep learning time-series models that ignore inter-series dependence (Li et al., 2022).
