Matrix & Tensor Autoregression
- Matrix and tensor autoregression are statistical models that capture temporal dependencies in multi-dimensional array data while preserving mode-specific structures.
- They employ innovations like low-rank tensor decompositions and Kronecker product structures to reduce parameters and enhance interpretability in dynamic settings.
- These models facilitate advanced analysis including regime-switching and nonlinear dynamics, with applications ranging from economics to neuroscience and spatio-temporal forecasting.
Matrix and tensor autoregression encompasses a family of statistical and machine learning models for capturing the temporal dependence structure in time series whose observations are inherently multi-dimensional arrays—matrices (order-2 tensors) or higher-order tensors—evolving over time. Unlike classical vector autoregressive (VAR) models that flatten array-valued data, matrix and tensor autoregression leverages mode-specific structure to achieve interpretability, parameter parsimony, and statistical efficiency, while allowing for increasingly sophisticated dynamic behaviors such as regime shifts, nonlinearities, and nonstationarity.
1. Model Classes: Linear, Nonlinear, and Regime-Switching Autoregressions
Matrix autoregression (MAR) models represent the foundational class, positing bilinear autoregressive dynamics for an observed sequence of matrices $X_t \in \mathbb{R}^{m \times n}$:
$$X_t = A X_{t-1} B^\top + E_t,$$
where $A \in \mathbb{R}^{m \times m}$ and $B \in \mathbb{R}^{n \times n}$ are coefficient matrices for row-wise and column-wise dependencies, and $E_t$ is typically assumed to be matrix-variate white noise (potentially with Kronecker-structured covariance) (Chen et al., 2018). MAR reduces parameters from $m^2 n^2$ (stacked VAR) to $m^2 + n^2$, and supports interpretable separation of row (e.g., country) and column (e.g., economic indicator) effects.
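As a concrete illustration, the bilinear recursion can be simulated directly; the following minimal Python sketch (dimensions and coefficient values are illustrative, not taken from Chen et al., 2018) generates a MAR(1) path and forms a one-step-ahead forecast.

```python
import numpy as np

# A minimal sketch of MAR(1) dynamics X_t = A X_{t-1} B^T + E_t.
# Sizes and coefficients are illustrative, chosen so the process is stable.
rng = np.random.default_rng(0)
m, n, T = 4, 3, 500

A = 0.5 * np.eye(m) + 0.05 * rng.standard_normal((m, m))  # row-effect matrix
B = 0.6 * np.eye(n) + 0.05 * rng.standard_normal((n, n))  # column-effect matrix

X = np.zeros((T, m, n))
for t in range(1, T):
    E = rng.standard_normal((m, n))      # matrix-variate white noise
    X[t] = A @ X[t - 1] @ B.T + E        # bilinear MAR(1) recursion

X_forecast = A @ X[-1] @ B.T             # one-step-ahead conditional mean
```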
Regime-switching and nonlinear dependencies can be incorporated either via smooth transition functions (yielding models such as Matrix Smooth Transition Autoregression, MSTAR) or finite mixtures (Mixture Matrix Autoregressive, MMAR). For example, MSTAR augments the linear MAR as
$$X_t = A_1 X_{t-1} B_1^\top + G(s_t; \gamma, c)\, A_2 X_{t-1} B_2^\top + E_t,$$
with $G(s_t; \gamma, c)$ a bounded logistic transition function evolving smoothly in $s_t$ (an observable regime marker), controlling the influence of the alternative coefficient matrices $A_2, B_2$ in different regimes (Bucci, 2022). MMAR instead models the conditional distribution as a mixture of MAR components, each with its own dynamics and covariance structure; the model probabilistically assigns regimes using latent variables and is estimated by a tailored EM algorithm (Wu et al., 2023).
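A minimal sketch of the transition mechanism, assuming the standard logistic form $G(s_t; \gamma, c) = (1 + e^{-\gamma(s_t - c)})^{-1}$; the parameter names `gamma` (smoothness) and `c` (location) are illustrative, not fixed notation from Bucci (2022):

```python
import numpy as np

def logistic_transition(s, gamma=2.0, c=0.0):
    """Bounded transition G(s) in (0, 1), smooth in the regime marker s."""
    return 1.0 / (1.0 + np.exp(-gamma * (s - c)))

def mstar_mean(X_prev, s, A1, B1, A2, B2, gamma=2.0, c=0.0):
    """Conditional mean: linear MAR term plus a smoothly weighted regime term."""
    G = logistic_transition(s, gamma, c)
    return A1 @ X_prev @ B1.T + G * (A2 @ X_prev @ B2.T)
```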
For higher-order data, generalizations include bilinear tensor autoregressions:
$$\mathcal{X}_t = \mathcal{X}_{t-1} \times_1 A_1 \times_2 A_2 \cdots \times_K A_K + \mathcal{E}_t,$$
where $\mathcal{X}_t \in \mathbb{R}^{d_1 \times \cdots \times d_K}$, $\times_k$ denotes mode-$k$ tensor-matrix multiplication, and $A_k \in \mathbb{R}^{d_k \times d_k}$ are mode-specific coefficient matrices, preserving the multiway structure and giving each mode's temporal effect a direct interpretation (Li et al., 2021).
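Mode-$k$ multiplication is the workhorse operation here; a self-contained numpy implementation (names are ours, not from Li et al., 2021) makes it concrete:

```python
import numpy as np

def mode_k_product(X, M, k):
    """Multiply tensor X by matrix M along mode k."""
    Xk = np.moveaxis(X, k, 0)                      # bring mode k to the front
    shape = Xk.shape
    Yk = M @ Xk.reshape(shape[0], -1)              # matricize, then multiply
    return np.moveaxis(Yk.reshape((M.shape[0],) + shape[1:]), 0, k)

# One step of the bilinear tensor AR for an order-3 tensor:
rng = np.random.default_rng(1)
X_prev = rng.standard_normal((4, 3, 2))
A = [0.4 * np.eye(d) for d in (4, 3, 2)]           # mode-specific coefficients
X_next = X_prev
for k, Ak in enumerate(A):
    X_next = mode_k_product(X_next, Ak, k)         # X_{t-1} x_1 A_1 x_2 A_2 x_3 A_3
X_next = X_next + rng.standard_normal((4, 3, 2))   # add tensor white noise
```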
2. Parameterization and Low-rank Tensor Structure
A major innovation in matrix and tensor autoregression is the use of structured low-rank parameterizations for the autoregressive operator. For matrices, vectorizing the MAR recursion gives $\operatorname{vec}(X_t) = (B \otimes A)\operatorname{vec}(X_{t-1}) + \operatorname{vec}(E_t)$: the implied VAR transition matrix is itself a Kronecker product, drastically reducing parameters.
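The identity $\operatorname{vec}(A X B^\top) = (B \otimes A)\operatorname{vec}(X)$ behind this reduction can be verified numerically; note the column-major vec convention required for the standard Kronecker identity:

```python
import numpy as np

# Numeric check of vec(A X B^T) = (B kron A) vec(X), the identity behind the
# Kronecker-structured VAR transition matrix.
rng = np.random.default_rng(2)
m, n = 4, 3
A = rng.standard_normal((m, m))
B = rng.standard_normal((n, n))
X = rng.standard_normal((m, n))

vec = lambda M: M.reshape(-1, order="F")   # column-major vectorization
assert np.allclose(vec(A @ X @ B.T), np.kron(B, A) @ vec(X))
```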
For tensor data, parameter explosion is addressed using various tensor decompositions:
- Tucker decomposition: The coefficient tensor is decomposed as $\mathcal{A} = \mathcal{G} \times_1 U_1 \times_2 U_2 \cdots \times_K U_K$, with core tensor $\mathcal{G}$ and mode-specific factor matrices $U_k$, enabling flexible low-rank structure and interpretable factor dynamics in each mode (Wang et al., 2021). The recent Sum of Square-matrix-Nuclear-Norms (SSN) estimator encourages low-rankness across all balanced matricizations, yielding optimal estimation in high-dimensional regimes.
- CANDECOMP/PARAFAC (CP) decomposition: The coefficient tensor is represented as a sum of rank-1 components (outer products), offering unique and interpretable mode-mode interactions, as in
$$\mathcal{A} = \sum_{r=1}^{R} \lambda_r\, u_r^{(1)} \circ u_r^{(2)} \circ \cdots \circ u_r^{(K)},$$
where the $u_r^{(k)}$ are length-$d_k$ factor vectors (Li et al., 7 Jun 2025); a reconstruction sketch follows this list. CP-based models naturally extend MAR and provide mode-specific sparsity and interpretability.
- Tensor train (TT) decomposition: The coefficient tensor is expressed as a sequential product of low-rank cores, keeping high-order tensor autoregression feasible even as the order grows large (Si et al., 2022).
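To make the CP parameterization concrete, the sketch below (sizes and names are hypothetical) rebuilds a coefficient tensor from its rank-$R$ factors:

```python
import numpy as np

def cp_reconstruct(lmbda, factors):
    """Rebuild a tensor from CP factors: sum_r lambda_r u_r^(1) o ... o u_r^(K)."""
    dims = [U.shape[0] for U in factors]
    T = np.zeros(dims)
    for r in range(len(lmbda)):
        comp = lmbda[r]
        for U in factors:
            comp = np.multiply.outer(comp, U[:, r])  # accumulate the outer product
        T += comp
    return T

# Rank-2 CP coefficient tensor for an order-3 problem (illustrative sizes):
rng = np.random.default_rng(3)
factors = [rng.standard_normal((d, 2)) for d in (4, 3, 5)]
A_cp = cp_reconstruct(np.array([1.0, 0.5]), factors)     # shape (4, 3, 5)
```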
Model estimation exploits alternating minimization or Riemannian manifold optimization, and rank selection is performed by ridge-type ratio estimators, BIC-type criteria, or information-theoretic methods.
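A hedged sketch of a ridge-type ratio estimator for rank selection: the estimated rank minimizes the ratio of consecutive shifted singular values, where the shift constant `c` below is an illustrative choice, not a calibrated value from the literature.

```python
import numpy as np

def ridge_ratio_rank(singular_values, c=None):
    """Pick the rank at which consecutive (ridge-shifted) singular values drop most."""
    s = np.sort(np.asarray(singular_values))[::-1]
    if c is None:
        c = 0.1 * np.sqrt(s[0])            # illustrative shift, guards small tails
    ratios = (s[1:] + c) / (s[:-1] + c)
    return int(np.argmin(ratios)) + 1      # estimated rank

# Example: a spectrum with a clear gap after the third singular value
print(ridge_ratio_rank([10.0, 8.0, 6.0, 0.3, 0.2, 0.1]))  # -> 3
```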
3. Statistical and Theoretical Properties
For stationary models, stationarity conditions are determined by the spectral radii of the coefficient matrices or products of mode-specific factors; for example, $\rho(A)\,\rho(B) < 1$ (equivalently $\rho(B \otimes A) < 1$) for MAR, with suitable analogues in the multilinear setting (Chen et al., 2018, Li et al., 2021). Asymptotic normality and consistency of parameter estimators under fixed dimension have been rigorously established for projection, least squares, and MLE procedures (Li et al., 2021).
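This condition is easy to check numerically, since the eigenvalues of $B \otimes A$ are the pairwise products of the eigenvalues of $B$ and $A$:

```python
import numpy as np

# Stationarity check for MAR(1): rho(B kron A) = rho(A) * rho(B), so the
# vectorized transition is stable iff the product of spectral radii is < 1.
spectral_radius = lambda M: np.abs(np.linalg.eigvals(M)).max()

rng = np.random.default_rng(4)
A = 0.6 * np.eye(4) + 0.05 * rng.standard_normal((4, 4))
B = 0.7 * np.eye(3) + 0.05 * rng.standard_normal((3, 3))
assert np.isclose(spectral_radius(np.kron(B, A)),
                  spectral_radius(A) * spectral_radius(B))
print("stationary:", spectral_radius(A) * spectral_radius(B) < 1)
```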
High-dimensional non-asymptotic theory focuses on finite-sample error bounds, showing estimation error rates that scale with model complexity (ranks, sparsity) rather than the ambient data dimension (Wang et al., 2021, Li et al., 7 Jun 2025). SSN-regularized estimators and sparse CP approaches achieve rate-optimal estimation and predictor selection even in large-scale time series.
For regime-switching models, identification, stationarity, and ergodicity are characterized by Lyapunov exponents and probabilistic mixing across MAR components (Wu et al., 2023, Bucci, 2022). Super-consistency and Brownian-motion-based limiting laws are derived for cointegrated models (Li et al., 17 Sep 2024).
4. Estimation Algorithms and Computation
Parameter estimation leverages the structure imposed by tensor decompositions:
- Alternating least squares/blockwise optimization: Update one tensor factor at a time, conditional on the rest, leveraging closed-form or convex updates (Li et al., 2021, Wang et al., 2021); a minimal sketch follows this list.
- EM algorithms: Used for mixture models (MMAR), iteratively updating regime responsibilities and MAR parameters (Wu et al., 2023).
- Riemannian gradient descent/optimization on manifolds: Particularly for TT and orthogonality-constrained Tucker estimators, geometric approaches yield scalable and provably convergent algorithms (Si et al., 2022).
- Regularized convex optimization: Nuclear norm, $\ell_1$ norm, and their structured variants enable sparsity and low-rank selection through proximal or ADMM algorithms, with strong finite-sample guarantees (Wang et al., 2019, Ghosh et al., 2 Jun 2025).
- Bayesian inference: Hierarchical shrinkage priors and Gibbs sampling for PARAFAC/Bayesian tensor regression (Billio et al., 2017).
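As referenced in the first bullet above, here is a minimal alternating least squares sketch for MAR(1): with one coefficient matrix fixed, the other solves a closed-form multivariate regression. This is a sketch of the generic idea, not the exact procedure of any cited paper; the scale indeterminacy between $A$ and $B$ (only $B \otimes A$ is identified) is ignored.

```python
import numpy as np

def mar_als(X, n_iter=50):
    """Alternating LS for X_t = A X_{t-1} B^T + E_t; X has shape (T, m, n)."""
    T, m, n = X.shape
    A, B = np.eye(m), np.eye(n)
    for _ in range(n_iter):
        # Update A given B: X_t ~ A Z_t with Z_t = X_{t-1} B^T
        Z = X[:-1] @ B.T
        A = np.einsum('tij,tkj->ik', X[1:], Z) @ np.linalg.pinv(
            np.einsum('tij,tkj->ik', Z, Z))
        # Update B given A: X_t^T ~ B W_t^T with W_t = A X_{t-1}
        W = A @ X[:-1]
        B = np.einsum('tji,tjk->ik', X[1:], W) @ np.linalg.pinv(
            np.einsum('tji,tjk->ik', W, W))
    return A, B
```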
Model selection (autoregressive order, ranks, number of regimes) is based on extended BIC/GIC variants or penalized likelihood; advanced model assessment in mixture and time-varying parameter settings uses DIC and knee-point detection (Luo et al., 12 May 2025, Wu et al., 2023).
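A generic BIC-style comparison across candidate configurations, as a sketch; the degrees-of-freedom accounting is model-specific (e.g., $m^2 + n^2$ for MAR(1)) and the names here are illustrative:

```python
import numpy as np

def bic(rss, T, df):
    """Generic BIC: T * log(RSS/T) + df * log(T), df = free parameter count."""
    return T * np.log(rss / T) + df * np.log(T)

def select_by_bic(candidates, T):
    """Pick the configuration minimizing BIC among [(config, rss, df), ...]."""
    return min(candidates, key=lambda c: bic(c[1], T, c[2]))[0]
```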
5. Applications and Empirical Results
Matrix and tensor autoregression have been successfully deployed in diverse fields:
- Economics/Finance: Modeling cross-country panel data, global trade networks, yield curves, and cointegrated portfolio data. MAR and its regime-switching extensions (MSTAR, MMAR) capture nonlinear economic cycles, crises, and regime shifts, clearly outperforming vectorized VAR and univariate AR/ARX models in out-of-sample forecasting and structural interpretation (Bucci, 2022, Wu et al., 2023, Li et al., 17 Sep 2024).
- Spatio-Temporal Modeling: MARAC and its tensor generalizations efficiently combine matrix-valued spatio-temporal grid data with vector covariates, incorporating spatial smoothness via RKHS (Sun et al., 2023).
- Neuroscience/Brain Imaging: High-dimensional tensor autoregression and time-varying parameter models (via TT, CP) model large fMRI arrays, infer directional and time-varying Granger causal connectivity with dramatic parameter reduction, and reveal interpretable neural dynamic patterns (Luo et al., 12 May 2025, Si et al., 2022).
- Networks and Social Data: GMNAR captures dynamic network effects and latent group heterogeneity in high-dimensional matrix sequences, enabling parsimonious parameterization and meaningful dynamic clustering (Ren et al., 2022).
- Environmental and Geophysical Data: Tensor autoregression with low-rank structure has enabled interpretable forecasting of global climate, sea surface temperature, and pollution matrices, extracting temporal and spatial modes (Harris et al., 2019, Li et al., 2021).
- Panel/Tensor-valued Cointegration: The CMAR framework generalizes Johansen VECM to matrices, supporting bilinear cointegrating vectors for pairs trading and macro panel analysis (Li et al., 17 Sep 2024).
6. Extensions and Recent Developments
Several directions expand the scope and efficacy of matrix/tensor autoregression:
- Nonlinear/Functional Extensions: Nonparametric transition functions, additive/interacting effects, and RKHS-regularized auxiliary tensor regressions generalize the class to nonlinear, exogenous-driven, and spatial-temporal matrix dynamics (Sun et al., 2023, Ghosh et al., 2 Jun 2025).
- Time-varying Parameter Models: TVP-Tensor VAR models embed the time-variation of parameters directly within the tensor decomposition, enabling dynamic inference for high-dimensional systems; model configuration and rank selection are guided by DIC-based "kneed" selection (Luo et al., 12 May 2025).
- Tensor Network Models: AMPS and related architectures from physics combine matrix product states (MPS) with autoregressive modeling for highly expressive unsupervised generative models with tractable sampling and normalization, competitive with deep neural estimators (Liu et al., 2021).
- Hybrid and Factor-Augmented Variants: Models combining MAR with dynamic matrix factors (MARCF) bridge common and specific predictor/response subspaces, achieving further dimension reduction and improved interpretability (Fan et al., 7 Mar 2025).
These models collectively provide a comprehensive, theoretically rigorous, and empirically validated framework for the analysis and forecasting of high-dimensional, structured time series. Their adoption enables researchers to encode prior knowledge of array structures, accommodate complex dynamic phenomena including regime-changing behavior, and achieve statistically efficient and computationally tractable inference even in large-scale data contexts.