
DeepVARwT: Joint Deep Trend & VAR Estimation

Updated 10 February 2026
  • DeepVARwT is a framework that jointly estimates deterministic trends and VAR coefficients from multivariate time series, integrating deep learning with classical VAR concepts.
  • It employs LSTM networks to capture evolving trends while enforcing VAR stability via the Ansley–Kohn transformation, ensuring valid multi-step forecasts.
  • Empirical studies show superior trend recovery, reduced estimation bias, and improved predictive performance compared to traditional two-stage methods.

DeepVARwT is a deep-learning-based framework for multivariate time series analysis that jointly estimates deterministic trends and vector autoregressive (VAR) dependence structures. It extends the classical stationary VAR($p$) model to accommodate time-varying deterministic means, utilizing a Long Short-Term Memory (LSTM) architecture to estimate trends and VAR parameters simultaneously under a Gaussian likelihood. DeepVARwT enforces stability (causality) of the VAR process through the Ansley–Kohn transformation, ensuring well-behaved multi-step forecasting and valid interpretability of the autoregressive dynamics (Li et al., 2022).

1. Mathematical Formulation of VAR with Trend

Let $y_t$ denote a length-$m$ vector time series whose mean evolves deterministically. Under DeepVARwT, $y_t$ is modeled as a deterministic trend plus a stationary VAR($p$) process; centering by the trend,

$$y_t - \mu_t = A_1 (y_{t-1} - \mu_{t-1}) + \cdots + A_p (y_{t-p} - \mu_{t-p}) + \varepsilon_t,$$

where $\mu_t$ is an unknown, time-varying $m$-vector, $A_1, \ldots, A_p$ are $m \times m$ autoregressive coefficient matrices, and $\varepsilon_t \sim N(0, \Sigma)$ are Gaussian innovations.

Whereas classical approaches require pre-detrending ($\mu_t$ specified a priori, typically as a polynomial or spline, then $\{A_i\}$ fit by OLS), DeepVARwT treats $\mu_t$, $\{A_i\}$, and $\Sigma$ as jointly unknown and estimates all parameters in a single maximum likelihood framework.
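As a minimal sketch of the data-generating process above, the following NumPy snippet simulates a trivariate VAR(2) around a smooth deterministic trend. The coefficient values, trend shape, and noise scale are illustrative assumptions, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
m, p, T = 3, 2, 400

# Hypothetical stable VAR(2) coefficients and noise covariance (illustrative).
A = [0.4 * np.eye(m), 0.2 * np.eye(m)]
Sigma = 0.1 * np.eye(m)

# A smooth deterministic trend mu_t for each series (here: phase-shifted sinusoids).
t = np.arange(T)
mu = np.stack([np.sin(2 * np.pi * t / 200 + k) for k in range(m)], axis=1)

# Generate y_t via the trend-centered recursion:
# y_t - mu_t = sum_i A_i (y_{t-i} - mu_{t-i}) + eps_t
y = np.zeros((T, m))
y[:p] = mu[:p]
for s in range(p, T):
    centered = sum(A[i] @ (y[s - 1 - i] - mu[s - 1 - i]) for i in range(p))
    y[s] = mu[s] + centered + rng.multivariate_normal(np.zeros(m), Sigma)
```

Because the chosen $A_i$ have small spectral radius, the centered process is stationary and the simulated paths fluctuate around the trend.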

2. LSTM-based Trend and VAR Parameter Estimation

The DeepVARwT approach parameterizes the trend $\mu_t$ and candidate (unconstrained) VAR parameters using an LSTM network. At each time $t$, the LSTM consumes an input $x_t$, typically a vector of monomials and reciprocals in $t$, mapped to a hidden state $h_t \in \mathbb{R}^r$. The LSTM update equations are

$$\begin{aligned} i_t &= \sigma(W_{xi}\,x_t + W_{hi}\,h_{t-1} + b_i) \\ f_t &= \sigma(W_{xf}\,x_t + W_{hf}\,h_{t-1} + b_f) \\ o_t &= \sigma(W_{xo}\,x_t + W_{ho}\,h_{t-1} + b_o) \\ \tilde c_t &= \tanh(W_{xc}\,x_t + W_{hc}\,h_{t-1} + b_c) \\ c_t &= f_t \circ c_{t-1} + i_t \circ \tilde c_t \\ h_t &= o_t \circ \tanh(c_t) \end{aligned}$$

with $\sigma(\cdot)$ the logistic sigmoid and $\circ$ the Hadamard product.

The trend is mapped linearly from the LSTM's hidden state: $\mu_t = W_\mu h_t + b_\mu$, with $W_\mu \in \mathbb{R}^{m \times r}$ and $b_\mu \in \mathbb{R}^m$.
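The update equations and the linear trend head can be sketched directly in NumPy; the weight values below are random stand-ins for trained parameters, and the hidden size and feature set are illustrative choices, not those of the paper.

```python
import numpy as np

rng = np.random.default_rng(1)
m, r, d = 3, 8, 4   # series dimension, hidden size, input-feature dimension

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Randomly initialized LSTM weights for the input, forget, output, and cell gates.
Wx = {g: 0.1 * rng.standard_normal((r, d)) for g in "ifoc"}
Wh = {g: 0.1 * rng.standard_normal((r, r)) for g in "ifoc"}
b = {g: np.zeros(r) for g in "ifoc"}
W_mu, b_mu = 0.1 * rng.standard_normal((m, r)), np.zeros(m)

def lstm_trend(X):
    """Run the LSTM update equations over time features X (T, d) and map
    each hidden state to a trend value mu_t, returning an array (T, m)."""
    h, c = np.zeros(r), np.zeros(r)
    mus = []
    for x in X:
        i = sigmoid(Wx["i"] @ x + Wh["i"] @ h + b["i"])
        f = sigmoid(Wx["f"] @ x + Wh["f"] @ h + b["f"])
        o = sigmoid(Wx["o"] @ x + Wh["o"] @ h + b["o"])
        c_tilde = np.tanh(Wx["c"] @ x + Wh["c"] @ h + b["c"])
        c = f * c + i * c_tilde          # Hadamard products
        h = o * np.tanh(c)
        mus.append(W_mu @ h + b_mu)      # mu_t = W_mu h_t + b_mu
    return np.array(mus)

T = 100
t = np.linspace(0.01, 1.0, T)
X = np.stack([t, t**2, t**3, 1.0 / t], axis=1)   # monomials and reciprocals in t
mu = lstm_trend(X)
```

In practice this forward pass would be expressed with an autodiff framework so the weights can be trained by maximum likelihood, but the hand-rolled loop makes the gate equations explicit.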

In parallel, DeepVARwT includes trainable parameters $\{\tilde A_i\}$ (raw, unconstrained $m \times m$ matrices for each lag) and a lower-triangular matrix $L$ such that $\Sigma = L L'$. To guarantee causality, the $\{\tilde A_i\}$ are transformed into stable $\{A_i\}$ at every iteration.

3. Likelihood-based Training and Stable VAR Enforcement

Training proceeds by direct minimization of the exact Gaussian negative log-likelihood (up to an additive constant):

$$\begin{aligned} \ell(\theta) =\;& \tfrac{1}{2}\left[ \log\det R_p + (y_{1:p} - \mu_{1:p})' R_p^{-1} (y_{1:p} - \mu_{1:p}) \right] \\ &+ \tfrac{T-p}{2} \log\det \Sigma + \tfrac{1}{2} \sum_{t=p+1}^{T} \varepsilon_t(\theta)' \Sigma^{-1} \varepsilon_t(\theta), \end{aligned}$$

where $R_p$ is the marginal covariance of the first block of $p$ observations and $\varepsilon_t(\theta) = y_t - \mu_t - \sum_{i=1}^{p} A_i (y_{t-i} - \mu_{t-i})$.

All model parameters are updated by backpropagation through time, implemented in PyTorch with the AdaGrad optimizer, which maintains per-parameter adaptive learning rates.
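The conditional part of this objective (the sum over $t > p$) can be sketched as follows; for brevity the exact initial-block term involving $R_p$ is omitted, and the data below are synthetic placeholders.

```python
import numpy as np

def conditional_nll(y, mu, A, Sigma):
    """Conditional Gaussian negative log-likelihood: the terms of ell(theta)
    for t > p, omitting the initial-block term involving R_p."""
    T, m = y.shape
    p = len(A)
    z = y - mu                                    # trend-centered series
    _, logdet = np.linalg.slogdet(Sigma)
    Sigma_inv = np.linalg.inv(Sigma)
    quad = 0.0
    for t in range(p, T):
        # eps_t = z_t - sum_i A_i z_{t-i}
        eps = z[t] - sum(A[i] @ z[t - 1 - i] for i in range(p))
        quad += eps @ Sigma_inv @ eps
    return 0.5 * (T - p) * (logdet + m * np.log(2 * np.pi)) + 0.5 * quad

# Illustrative call on synthetic data with hypothetical parameter values.
rng = np.random.default_rng(0)
y = rng.standard_normal((50, 3))
mu = np.zeros((50, 3))
A = [0.3 * np.eye(3), 0.1 * np.eye(3)]
nll = conditional_nll(y, mu, A, np.eye(3))
```

In the actual training loop this quantity would be computed with differentiable tensor operations so that gradients flow back into the LSTM, the raw VAR matrices, and the Cholesky factor.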

Stability of the estimated VAR process is guaranteed by the Ansley–Kohn transformation: the unconstrained $\tilde A_j$ are mapped to partial autocorrelation matrices $P_j$ by Cholesky normalization, and then recursively to stable, genuine VAR($p$) parameters $A_i$, ensuring all roots of $\det(I - \sum_{i=1}^{p} A_i z^i)$ lie outside the unit disk.
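The core Cholesky normalization step can be illustrated for the $p = 1$ special case, where it alone suffices: mapping an arbitrary matrix $B$ to $P = \mathrm{chol}(I + BB')^{-1} B$ forces all singular values of $P$ (and hence its spectral radius) strictly below 1. This is only the normalization step, not the full Ansley–Kohn recursion needed for $p > 1$.

```python
import numpy as np

def stabilize_var1(B):
    """Map an arbitrary square matrix B to P = chol(I + B B')^{-1} B.
    Since P P' = I - (L L')^{-1} with L L' = I + B B', all singular values
    of P are strictly below 1, so the VAR(1) with A1 = P is stable."""
    m = B.shape[0]
    L = np.linalg.cholesky(np.eye(m) + B @ B.T)
    return np.linalg.solve(L, B)

rng = np.random.default_rng(2)
B = 5.0 * rng.standard_normal((3, 3))        # wildly unstable raw matrix
A1 = stabilize_var1(B)
rho = max(abs(np.linalg.eigvals(A1)))        # spectral radius, always < 1
```

Because the map is smooth in $B$, it can sit inside the computational graph and be optimized freely without any explicit stability constraint.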

4. Implementation Details and Training Protocol

Training and inference are performed as follows:

  • PyTorch 1.x is used for network definition and optimization.
  • Initial values for the trend are obtained by fitting a global OLS polynomial, and detrended data are used to fit an initial VAR by OLS, seeding $\{\tilde A_i\}$ and $L$.
  • Configurations commonly used: a single LSTM layer with 10–20 hidden units; inputs $x_t$ are monomials and reciprocals in $t$, range-scaled.
  • Separate learning rates for the trend/LSTM block ($5\times 10^{-4}$) and the VAR block ($10^{-2}$); training runs for 300–600 epochs or until the log-likelihood improvement falls below $10^{-5}$.
  • For forecast evaluation, models are refit in rolling windows (e.g., 166 quarters for US macroeconomic series), and $h$-step forecasts are generated within each window.
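The OLS seeding step described above can be sketched as follows; the polynomial degree and lag order are illustrative choices, and the input is synthetic.

```python
import numpy as np

def seed_initial_values(y, degree=3, p=2):
    """Fit a polynomial trend per series by OLS, detrend, then fit a VAR(p)
    to the residuals by OLS, yielding initial values for the raw VAR
    matrices and the Cholesky factor of Sigma."""
    T, m = y.shape
    t = np.arange(T) / T
    X = np.vander(t, degree + 1)                 # polynomial design matrix
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    trend = X @ beta
    z = y - trend                                # detrended series
    # Stack lagged regressors: regress z_t on [z_{t-1}, ..., z_{t-p}].
    Z = np.hstack([z[p - i - 1 : T - i - 1] for i in range(p)])
    target = z[p:]
    coef, *_ = np.linalg.lstsq(Z, target, rcond=None)
    A_init = [coef[i * m : (i + 1) * m].T for i in range(p)]
    resid = target - Z @ coef
    Sigma_init = resid.T @ resid / (T - p)
    L_init = np.linalg.cholesky(Sigma_init)
    return trend, A_init, L_init

rng = np.random.default_rng(3)
y = 0.1 * np.cumsum(rng.standard_normal((200, 3)), axis=0)  # synthetic series
trend, A_init, L_init = seed_initial_values(y)
```

These OLS estimates would then seed the trainable parameters before joint likelihood optimization begins.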

5. Simulation Study: Trend and Parameter Recovery

A simulation study investigates estimation accuracy for a semi-parametric VAR(2) with $m=3$, where the trend components $\mu_t$ are constructed from kernel-smoothing real stock price data and genuine VAR coefficients are specified. With 100 Monte Carlo replicates ($T=800$), DeepVARwT's performance is assessed in comparison to classical two-stage polynomial detrending and OLS VAR fitting, using mean absolute deviation (MAD) for trend accuracy and empirical bias, SD, and MSE for VAR parameter estimation. DeepVARwT achieves markedly lower MAD and substantially reduced bias and MSE in coefficient estimation, indicating effective joint recovery of both trend and VAR parameters (Li et al., 2022).
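The evaluation metrics used in the study are standard and can be computed as below; the replicate data here are synthetic placeholders, not the study's results.

```python
import numpy as np

def trend_mad(mu_hat, mu_true):
    """Mean absolute deviation between estimated and true trend curves."""
    return np.mean(np.abs(mu_hat - mu_true))

def coef_summary(estimates, truth):
    """Empirical bias, SD, and MSE of coefficient estimates across Monte
    Carlo replicates; `estimates` has shape (n_reps, ...) matching `truth`."""
    bias = estimates.mean(axis=0) - truth
    sd = estimates.std(axis=0, ddof=1)
    mse = ((estimates - truth) ** 2).mean(axis=0)
    return bias, sd, mse

rng = np.random.default_rng(4)
truth = 0.4 * np.eye(3)
estimates = truth + 0.05 * rng.standard_normal((100, 3, 3))  # fake replicates
bias, sd, mse = coef_summary(estimates, truth)
```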

6. Empirical Evaluations on Real Data

DeepVARwT is benchmarked on several real-world datasets:

  • US Macroeconomics (GDP-gap, inflation, Fed funds): With a rolling window of 166 quarters and forecast horizons $h = 1, \dots, 8$, DeepVARwT demonstrates lower averaged percentage error (APE) and more calibrated, tighter predictive intervals (measured by Scaled Interval Score, SIS) than VARwT(4; 9th-order polynomial trend), DeepAR, and DeepState, particularly for Fed funds forecasts.
  • Global Temperature Anomalies (Northern/Southern Hemisphere, Tropics): Across 19 sliding windows and horizons $h = 1, 2, 4, 6$, DeepVARwT produces the most accurate multi-step forecasts and narrowest valid prediction intervals, outperforming VARwT (polynomial trend) and state-of-the-art deep learning baselines.
  • US Macroeconomics II (inflation, unemployment, T-bill rate): Across rolling-window experiments for horizons up to 8 steps, DeepVARwT surpasses VARwT, DeepAR, and DeepState on point and interval forecasts.

7. Advantages, Limitations, and Potential Extensions

DeepVARwT offers several notable properties:

  • Advantages: Joint, simultaneous estimation of trend and VAR parameters avoids the errors induced by pre-detrending, models the trend flexibly via the LSTM rather than a fixed parametric form, preserves the interpretability of Gaussian VAR modeling, enables exact MLE fitting, and enforces strict stability constraints.
  • Limitations: The Gaussian innovation assumption is restrictive for discrete or heavy-tailed data; hyperparameter tuning (choice of LSTM input features and hidden dimensionality) is non-trivial; computational complexity scales cubically with $m$ during the stability mapping.
  • Potential Extensions: Non-Gaussian innovations (e.g., Student-$t$, count models) via likelihood replacement, structured regularization on VAR parameters for high-dimensional series, tensor-structured VAR for multiway data, and the introduction of normalizing flows for more general innovation distributions are plausible areas of extension.

In summary, DeepVARwT synergistically combines flexible deep trend estimation and autoregressive dependence within a rigorously stable, interpretable Gaussian likelihood framework, achieving superior fit and forecast performance in both controlled and applied settings (Li et al., 2022).
