
DeepVARwT: Joint Deep Trend & VAR Estimation

Updated 10 February 2026
  • DeepVARwT is a framework that jointly estimates deterministic trends and VAR coefficients from multivariate time series, integrating deep learning with classical VAR concepts.
  • It employs LSTM networks to capture evolving trends while enforcing VAR stability via the Ansley–Kohn transformation, ensuring valid multi-step forecasts.
  • Empirical studies show superior trend recovery, reduced estimation bias, and improved predictive performance compared to traditional two-stage methods.

DeepVARwT is a deep-learning-based framework for multivariate time series analysis that jointly estimates deterministic trends and vector autoregressive (VAR) dependence structures. It extends the classical stationary VAR($p$) model to accommodate time-varying deterministic means, utilizing a Long Short-Term Memory (LSTM) architecture to estimate trends and VAR parameters simultaneously under a Gaussian likelihood. DeepVARwT enforces stability (causality) of the VAR process through the Ansley–Kohn transformation, ensuring well-behaved multi-step forecasting and valid interpretability of the autoregressive dynamics (Li et al., 2022).

1. Mathematical Formulation of VAR with Trend

Let $y_t$ denote a length-$m$ vector time series whose mean evolves deterministically. Under DeepVARwT, $y_t$ is modeled as a deterministic trend plus a stationary VAR($p$) process; centering by the trend,

$$y_t - \mu_t = A_1 (y_{t-1} - \mu_{t-1}) + \cdots + A_p (y_{t-p} - \mu_{t-p}) + \varepsilon_t,$$

where $\mu_t$ is an unknown, time-varying $m$-vector, $A_1, \ldots, A_p$ are $m \times m$ autoregressive coefficient matrices, and $\varepsilon_t \sim N(0, \Sigma)$ are Gaussian innovations.

Whereas classical approaches require pre-detrending ($\mu_t$ specified a priori, typically as a polynomial or spline, then $\{A_i\}$ fit by OLS), DeepVARwT treats $\mu_t$, $\{A_i\}$, and $\Sigma$ as jointly unknown and estimates all parameters in a single maximum likelihood framework.
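As a minimal sketch of the data-generating process above, the following NumPy snippet simulates a trivariate VAR(2) around a smooth deterministic trend. The coefficient values, trend shape, and noise scale are illustrative assumptions, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
m, p, T = 3, 2, 400

# Hypothetical stable VAR(2) coefficients and noise covariance (illustrative).
A = [0.4 * np.eye(m), 0.2 * np.eye(m)]
Sigma = 0.1 * np.eye(m)

# A smooth deterministic trend mu_t for each series (here: phase-shifted sinusoids).
t = np.arange(T)
mu = np.stack([np.sin(2 * np.pi * t / 200 + k) for k in range(m)], axis=1)

# Generate y_t via the trend-centered recursion:
# y_t - mu_t = sum_i A_i (y_{t-i} - mu_{t-i}) + eps_t
y = np.zeros((T, m))
y[:p] = mu[:p]
for s in range(p, T):
    centered = sum(A[i] @ (y[s - 1 - i] - mu[s - 1 - i]) for i in range(p))
    y[s] = mu[s] + centered + rng.multivariate_normal(np.zeros(m), Sigma)
```

Because the chosen $A_i$ have small spectral radius, the centered process is stationary and the simulated paths fluctuate around the trend.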

2. LSTM-based Trend and VAR Parameter Estimation

The DeepVARwT approach parameterizes the trend $\mu_t$ and candidate (unconstrained) VAR parameters using an LSTM network. At each time $t$, the LSTM consumes an input $x_t$, typically a vector of monomials and reciprocals in $t$, mapped to a hidden state $h_t \in \mathbb{R}^r$. The LSTM update equations are

$$\begin{aligned} i_t &= \sigma(W_{xi}\,x_t + W_{hi}\,h_{t-1} + b_i) \\ f_t &= \sigma(W_{xf}\,x_t + W_{hf}\,h_{t-1} + b_f) \\ o_t &= \sigma(W_{xo}\,x_t + W_{ho}\,h_{t-1} + b_o) \\ \tilde c_t &= \tanh(W_{xc}\,x_t + W_{hc}\,h_{t-1} + b_c) \\ c_t &= f_t \circ c_{t-1} + i_t \circ \tilde c_t \\ h_t &= o_t \circ \tanh(c_t) \end{aligned}$$

with $\sigma(\cdot)$ the logistic sigmoid and $\circ$ the Hadamard product.

The trend is mapped linearly from the LSTM's hidden state: $\mu_t = W_\mu h_t + b_\mu$, with $W_\mu \in \mathbb{R}^{m \times r}$ and $b_\mu \in \mathbb{R}^m$.
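The update equations and the linear trend head can be sketched directly in NumPy; the weight values below are random stand-ins for trained parameters, and the hidden size and feature set are illustrative choices, not those of the paper.

```python
import numpy as np

rng = np.random.default_rng(1)
m, r, d = 3, 8, 4   # series dimension, hidden size, input-feature dimension

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Randomly initialized LSTM weights for the input, forget, output, and cell gates.
Wx = {g: 0.1 * rng.standard_normal((r, d)) for g in "ifoc"}
Wh = {g: 0.1 * rng.standard_normal((r, r)) for g in "ifoc"}
b = {g: np.zeros(r) for g in "ifoc"}
W_mu, b_mu = 0.1 * rng.standard_normal((m, r)), np.zeros(m)

def lstm_trend(X):
    """Run the LSTM update equations over time features X (T, d) and map
    each hidden state to a trend value mu_t, returning an array (T, m)."""
    h, c = np.zeros(r), np.zeros(r)
    mus = []
    for x in X:
        i = sigmoid(Wx["i"] @ x + Wh["i"] @ h + b["i"])
        f = sigmoid(Wx["f"] @ x + Wh["f"] @ h + b["f"])
        o = sigmoid(Wx["o"] @ x + Wh["o"] @ h + b["o"])
        c_tilde = np.tanh(Wx["c"] @ x + Wh["c"] @ h + b["c"])
        c = f * c + i * c_tilde          # Hadamard products
        h = o * np.tanh(c)
        mus.append(W_mu @ h + b_mu)      # mu_t = W_mu h_t + b_mu
    return np.array(mus)

T = 100
t = np.linspace(0.01, 1.0, T)
X = np.stack([t, t**2, t**3, 1.0 / t], axis=1)   # monomials and reciprocals in t
mu = lstm_trend(X)
```

In practice this forward pass would be expressed with an autodiff framework so the weights can be trained by maximum likelihood, but the hand-rolled loop makes the gate equations explicit.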

In parallel, DeepVARwT includes trainable parameters $\{\tilde A_i\}$ (raw, unconstrained $m \times m$ matrices for each lag) and a lower-triangular matrix $L$ such that $\Sigma = L L'$. To guarantee causality, the $\{\tilde A_i\}$ are transformed into stable $\{A_i\}$ at every iteration.

3. Likelihood-based Training and Stable VAR Enforcement

Training proceeds by direct minimization of the exact Gaussian negative log-likelihood (up to an additive constant):

$$\begin{aligned} \ell(\theta) =\;& \tfrac{1}{2}\left[ \log\det R_p + (y_{1:p} - \mu_{1:p})' R_p^{-1} (y_{1:p} - \mu_{1:p}) \right] \\ &+ \tfrac{T-p}{2} \log\det \Sigma + \tfrac{1}{2} \sum_{t=p+1}^{T} \varepsilon_t(\theta)' \Sigma^{-1} \varepsilon_t(\theta), \end{aligned}$$

where $R_p$ is the marginal covariance of the first block of $p$ observations and $\varepsilon_t(\theta) = y_t - \mu_t - \sum_{i=1}^{p} A_i (y_{t-i} - \mu_{t-i})$.

All model parameters are updated by backpropagation through time, implemented in PyTorch with the AdaGrad optimizer, which maintains per-parameter adaptive learning rates.
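The conditional part of this objective (the sum over $t > p$) can be sketched as follows; for brevity the exact initial-block term involving $R_p$ is omitted, and the data below are synthetic placeholders.

```python
import numpy as np

def conditional_nll(y, mu, A, Sigma):
    """Conditional Gaussian negative log-likelihood: the terms of ell(theta)
    for t > p, omitting the initial-block term involving R_p."""
    T, m = y.shape
    p = len(A)
    z = y - mu                                    # trend-centered series
    _, logdet = np.linalg.slogdet(Sigma)
    Sigma_inv = np.linalg.inv(Sigma)
    quad = 0.0
    for t in range(p, T):
        # eps_t = z_t - sum_i A_i z_{t-i}
        eps = z[t] - sum(A[i] @ z[t - 1 - i] for i in range(p))
        quad += eps @ Sigma_inv @ eps
    return 0.5 * (T - p) * (logdet + m * np.log(2 * np.pi)) + 0.5 * quad

# Illustrative call on synthetic data with hypothetical parameter values.
rng = np.random.default_rng(0)
y = rng.standard_normal((50, 3))
mu = np.zeros((50, 3))
A = [0.3 * np.eye(3), 0.1 * np.eye(3)]
nll = conditional_nll(y, mu, A, np.eye(3))
```

In the actual training loop this quantity would be computed with differentiable tensor operations so that gradients flow back into the LSTM, the raw VAR matrices, and the Cholesky factor.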

Stability of the estimated VAR process is guaranteed by the Ansley–Kohn transformation: the unconstrained $\tilde A_j$ are mapped to partial autocorrelation matrices $P_j$ by Cholesky normalization, and then recursively to stable, genuine VAR($p$) parameters $A_i$, ensuring all roots of $\det(I - \sum_{i=1}^{p} A_i z^i)$ lie outside the unit disk.
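The core Cholesky normalization step can be illustrated for the $p = 1$ special case, where it alone suffices: mapping an arbitrary matrix $B$ to $P = \mathrm{chol}(I + BB')^{-1} B$ forces all singular values of $P$ (and hence its spectral radius) strictly below 1. This is only the normalization step, not the full Ansley–Kohn recursion needed for $p > 1$.

```python
import numpy as np

def stabilize_var1(B):
    """Map an arbitrary square matrix B to P = chol(I + B B')^{-1} B.
    Since P P' = I - (L L')^{-1} with L L' = I + B B', all singular values
    of P are strictly below 1, so the VAR(1) with A1 = P is stable."""
    m = B.shape[0]
    L = np.linalg.cholesky(np.eye(m) + B @ B.T)
    return np.linalg.solve(L, B)

rng = np.random.default_rng(2)
B = 5.0 * rng.standard_normal((3, 3))        # wildly unstable raw matrix
A1 = stabilize_var1(B)
rho = max(abs(np.linalg.eigvals(A1)))        # spectral radius, always < 1
```

Because the map is smooth in $B$, it can sit inside the computational graph and be optimized freely without any explicit stability constraint.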

4. Implementation Details and Training Protocol

Training and inference are performed as follows:

  • PyTorch 1.x is used for network definition and optimization.
  • Initial values for the trend are obtained by fitting a global OLS polynomial, and detrended data are used to fit an initial VAR by OLS, seeding $\{\tilde A_i\}$ and $L$.
  • Configurations commonly used: a single LSTM layer with 10–20 hidden units; inputs $x_t$ are monomials and reciprocals in $t$, range-scaled.
  • Separate learning rates for the trend/LSTM block ($5\times 10^{-4}$) and the VAR block ($10^{-2}$); training runs for 300–600 epochs or until the log-likelihood improvement falls below $10^{-5}$.
  • For forecast evaluation, models are refit in rolling windows (e.g., 166 quarters for US macroeconomic series), and $h$-step forecasts are generated within each window.
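The OLS seeding step described above can be sketched as follows; the polynomial degree and lag order are illustrative choices, and the input is synthetic.

```python
import numpy as np

def seed_initial_values(y, degree=3, p=2):
    """Fit a polynomial trend per series by OLS, detrend, then fit a VAR(p)
    to the residuals by OLS, yielding initial values for the raw VAR
    matrices and the Cholesky factor of Sigma."""
    T, m = y.shape
    t = np.arange(T) / T
    X = np.vander(t, degree + 1)                 # polynomial design matrix
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    trend = X @ beta
    z = y - trend                                # detrended series
    # Stack lagged regressors: regress z_t on [z_{t-1}, ..., z_{t-p}].
    Z = np.hstack([z[p - i - 1 : T - i - 1] for i in range(p)])
    target = z[p:]
    coef, *_ = np.linalg.lstsq(Z, target, rcond=None)
    A_init = [coef[i * m : (i + 1) * m].T for i in range(p)]
    resid = target - Z @ coef
    Sigma_init = resid.T @ resid / (T - p)
    L_init = np.linalg.cholesky(Sigma_init)
    return trend, A_init, L_init

rng = np.random.default_rng(3)
y = 0.1 * np.cumsum(rng.standard_normal((200, 3)), axis=0)  # synthetic series
trend, A_init, L_init = seed_initial_values(y)
```

These OLS estimates would then seed the trainable parameters before joint likelihood optimization begins.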

5. Simulation Study: Trend and Parameter Recovery

A simulation study investigates estimation accuracy for a semi-parametric VAR(2) with $m=3$, where the trend components $\mu_t$ are constructed from kernel-smoothing real stock price data and genuine VAR coefficients are specified. With 100 Monte Carlo replicates ($T=800$), DeepVARwT's performance is assessed in comparison to classical two-stage polynomial detrending and OLS VAR fitting, using mean absolute deviation (MAD) for trend accuracy and empirical bias, SD, and MSE for VAR parameter estimation. DeepVARwT achieves markedly lower MAD and substantially reduced bias and MSE in coefficient estimation, indicating effective joint recovery of both trend and VAR parameters (Li et al., 2022).
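The evaluation metrics used in the study are standard and can be computed as below; the replicate data here are synthetic placeholders, not the study's results.

```python
import numpy as np

def trend_mad(mu_hat, mu_true):
    """Mean absolute deviation between estimated and true trend curves."""
    return np.mean(np.abs(mu_hat - mu_true))

def coef_summary(estimates, truth):
    """Empirical bias, SD, and MSE of coefficient estimates across Monte
    Carlo replicates; `estimates` has shape (n_reps, ...) matching `truth`."""
    bias = estimates.mean(axis=0) - truth
    sd = estimates.std(axis=0, ddof=1)
    mse = ((estimates - truth) ** 2).mean(axis=0)
    return bias, sd, mse

rng = np.random.default_rng(4)
truth = 0.4 * np.eye(3)
estimates = truth + 0.05 * rng.standard_normal((100, 3, 3))  # fake replicates
bias, sd, mse = coef_summary(estimates, truth)
```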

6. Empirical Evaluations on Real Data

DeepVARwT is benchmarked on several real-world datasets:

  • US Macroeconomics (GDP-gap, inflation, Fed funds): With a rolling window of 166 quarters and forecast horizons $h = 1, \dots, 8$, DeepVARwT demonstrates lower averaged percentage error (APE) and more calibrated, tighter predictive intervals (measured by Scaled Interval Score, SIS) than VARwT(4; 9th-order polynomial trend), DeepAR, and DeepState, particularly for Fed funds forecasts.
  • Global Temperature Anomalies (Northern/Southern Hemisphere, Tropics): Across 19 sliding windows and horizons $h = 1, 2, 4, 6$, DeepVARwT produces the most accurate multi-step forecasts and narrowest valid prediction intervals, outperforming VARwT (polynomial trend) and state-of-the-art deep learning baselines.
  • US Macroeconomics II (inflation, unemployment, T-bill rate): Across rolling-window experiments for horizons up to 8 steps, DeepVARwT surpasses VARwT, DeepAR, and DeepState on point and interval forecasts.

7. Advantages, Limitations, and Potential Extensions

DeepVARwT offers several notable properties:

  • Advantages: Joint, simultaneous estimation of trend and VAR parameters avoids the errors induced by pre-detrending, models the trend flexibly via the LSTM rather than a fixed parametric form, preserves the interpretability of Gaussian VAR modeling, enables exact MLE fitting, and enforces strict stability constraints.
  • Limitations: The Gaussian innovation assumption is restrictive for discrete or heavy-tailed data; hyperparameter tuning (choice of LSTM input features and hidden dimensionality) is non-trivial; computational complexity scales cubically with $m$ during the stability mapping.
  • Potential Extensions: Non-Gaussian innovations (e.g., Student-$t$, count models) via likelihood replacement, structured regularization on VAR parameters for high-dimensional series, tensor-structured VAR for multiway data, and the introduction of normalizing flows for more general innovation distributions are plausible areas of extension.

In summary, DeepVARwT synergistically combines flexible deep trend estimation and autoregressive dependence within a rigorously stable, interpretable Gaussian likelihood framework, achieving superior fit and forecast performance in both controlled and applied settings (Li et al., 2022).
