
ARX Model: Linear System Identification

Updated 31 December 2025
  • The ARX model is a linear system identification framework that predicts outputs from their own past values and exogenous inputs.
  • Model orders and coefficients are estimated accurately with methods such as ordinary least squares and generalized spectral decomposition.
  • Its applications span traffic forecasting, speech modeling, and adaptive control, with extensions for multivariate and nonlinear data.

An Auto-Regressive Model with Exogenous Inputs (ARX model) is a linear, time-invariant system identification framework widely employed in signal processing, control, econometrics, and engineering modeling of time series with external (exogenous) influences. The ARX model expresses an output signal as a linear combination of its own past values (autoregression) and contemporaneous or lagged exogenous input(s), plus a stochastic disturbance. The archetypal univariate ARX(n_a, n_b, n_k) formulation is

y(t) + a_1 y(t-1) + \cdots + a_{n_a} y(t-n_a) = b_1 u(t-n_k) + b_2 u(t-n_k-1) + \cdots + b_{n_b} u(t-n_k-n_b+1) + e(t),

where y(t) is the output, u(t) the exogenous input, e(t) additive noise, n_a the AR order, n_b the input order, and n_k the input delay (Maurya et al., 2020). The ARX class includes extensions for multivariate, nonlinear, latent-variable, and errors-in-variables modeling, each with bespoke estimation strategies and theoretical guarantees.
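As a concrete illustration, the following sketch simulates a hypothetical ARX(2, 2, 1) system in Python; the coefficient values and noise level are assumptions chosen for demonstration, not taken from any cited work.

```python
import numpy as np

# Simulate a hypothetical ARX(2, 2, 1) system:
# y(t) = -a1*y(t-1) - a2*y(t-2) + b1*u(t-1) + b2*u(t-2) + e(t)
rng = np.random.default_rng(0)
a = [-1.5, 0.7]                   # AR coefficients a1, a2 (stable A polynomial)
b = [1.0, 0.5]                    # input coefficients b1, b2, delay nk = 1
N = 500
u = rng.standard_normal(N)        # exogenous input u(t)
e = 0.1 * rng.standard_normal(N)  # white disturbance e(t)
y = np.zeros(N)
for t in range(2, N):
    y[t] = -a[0]*y[t-1] - a[1]*y[t-2] + b[0]*u[t-1] + b[1]*u[t-2] + e[t]
```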

1. Mathematical Structure and Problem Formulation

The ARX model is defined by its difference equation or polynomial form in the backward shift operator q^{-1}:

A(q^{-1}) y(t) = B(q^{-1}) u(t) + e(t),

with

A(q^{-1}) = 1 + a_1 q^{-1} + \cdots + a_{n_a} q^{-n_a}, \quad B(q^{-1}) = b_1 q^{-n_k} + \cdots + b_{n_b} q^{-(n_k+n_b-1)}.

Here, e(t) is white Gaussian noise, but because the disturbance enters the output through 1/A(q^{-1}), the noise observed at the output is colored (Maurya et al., 2020). For multivariate time series (VARX), coefficient matrices and vector-valued exogenous inputs generalize the scalar formulation (Parra et al., 2024).

Data stacking involves forming lagged vectors

z_L[k] = [y[k], y[k-1], \ldots, y[k-L], u[k], u[k-1], \ldots, u[k-L]]^\top,

then aggregating these into a data matrix Z_L. Determining model order, delay, and coefficients is central to ARX identification.
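A minimal sketch of this stacking step, reusing the simulated y and u from the sketch above; the lag depth L = 4 is an arbitrary illustrative choice.

```python
import numpy as np

def stack_lagged(y, u, L):
    """Form z_L[k] = [y[k], ..., y[k-L], u[k], ..., u[k-L]]^T for k = L, ..., N-1
    and stack the vectors row-wise into the data matrix Z_L."""
    rows = [
        np.concatenate([y[k - L:k + 1][::-1], u[k - L:k + 1][::-1]])
        for k in range(L, len(y))
    ]
    return np.asarray(rows)   # shape (N - L, 2 * (L + 1))

Z_L = stack_lagged(y, u, L=4)
S_ZL = Z_L.T @ Z_L / Z_L.shape[0]   # sample covariance S_{Z_L}
```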

2. Model Identification and Estimation Algorithms

Classical Estimation

Traditional methods iterate over candidate model orders and delays, fit parameters by prediction-error minimization (PEM), ordinary least squares (OLS), or maximum likelihood, and apply information criteria (AIC/BIC) for order selection. These procedures require a user-specified structure and can become computationally expensive (Maurya et al., 2020, Parra et al., 2024).
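A schematic of this classical loop under the assumptions of the earlier sketches (arrays y, u already in memory); the AIC form n·ln σ² + 2k is one common variant, and the fixed-sample-length simplification is noted in the comments.

```python
import numpy as np

def fit_arx_ols(y, u, na, nb, nk):
    """OLS fit of ARX(na, nb, nk): regress y(t) on -y(t-1..t-na) and u(t-nk..).
    Returns the parameter estimate and the residual variance."""
    start = max(na, nk + nb - 1)
    N = len(y)
    Phi = np.column_stack(
        [-y[start - i:N - i] for i in range(1, na + 1)]       # -y(t-i) columns
        + [u[start - nk - j:N - nk - j] for j in range(nb)]   # u(t-nk-j) columns
    )
    theta, *_ = np.linalg.lstsq(Phi, y[start:], rcond=None)
    sigma2 = np.mean((y[start:] - Phi @ theta) ** 2)
    return theta, sigma2

def aic(sigma2, k, n):
    return n * np.log(sigma2) + 2 * k   # simplified AIC for Gaussian residuals

# Brute-force order search over candidate (na, nb), fixed delay nk = 1.
# Using len(y) for n in every fit is a simplification (sample counts differ).
best_order = min(
    ((na, nb) for na in range(1, 5) for nb in range(1, 5)),
    key=lambda o: aic(fit_arx_ols(y, u, o[0], o[1], 1)[1], o[0] + o[1], len(y)),
)
```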

Generalized Spectral Decomposition (GSD)

The GSD method frames ARX identification as a generalized eigenvalue problem:

S_{Z_L} V = \Sigma_{eL} V \Lambda,

where S_{Z_L} is the sample covariance, Σ_{eL} the noise covariance reflecting the colored output noise, and Λ the diagonal matrix of generalized eigenvalues. The count of eigenvalues near one (λ ≈ 1) determines the number of independent constraints, thereby revealing the model order via η = L - d + 1 (Maurya et al., 2020). The associated eigenvector yields the normalized parameter vector [1, a_1, \ldots, a_{n_a}, -b_{n_k}, \ldots, -b_{n_k+n_b-1}]^\top.

This approach automates order and delay estimation, coefficient extraction, and output noise spectrum identification in a unified, non-iterative manner. It retains statistical consistency as sample size increases, with robust performance at low SNR (standard errors of 0.02–0.04 for moderate SNR) (Maurya et al., 2020).
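A sketch of the generalized eigenvalue step, continuing from the stacking sketch above. The noise covariance Σ_{eL} would be estimated as part of the full procedure in (Maurya et al., 2020); the scaled identity below is a stand-in assumption.

```python
import numpy as np
from scipy.linalg import eigh

Sigma_eL = 0.01 * np.eye(S_ZL.shape[0])   # stand-in for the estimated noise covariance
lam, V = eigh(S_ZL, Sigma_eL)             # solves S_ZL v = lambda * Sigma_eL * v
n_constraints = int(np.sum(np.isclose(lam, 1.0, atol=0.1)))  # eigenvalues near one
v = V[:, 0]                   # eigenvector paired with the smallest eigenvalue
theta_hat = v / v[0]          # normalize so the leading entry equals one
```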

Distributed Estimation

Distributed ARX estimation for sensor networks leverages a local information criterion (LIC) and diffusion recursive least squares (RLS) (Gan et al., 2021). Each sensor computes prediction errors and regularization penalties; cooperation enables network-wide identifiability under the cooperative excitation condition, guaranteeing strong consistency even when individual nodes cannot fully excite the system. Theoretical analysis applies martingale limit theorems and stochastic Lyapunov arguments.
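A schematic single-node update illustrating the adapt-then-combine pattern behind diffusion RLS; the combination weight c_self is a hypothetical illustration, and the LIC-based structure selection of (Gan et al., 2021) is not reproduced here.

```python
import numpy as np

def diffusion_rls_step(theta, P, phi, y_t, nbr_thetas, c_self=0.5, lam=0.99):
    """One adapt-then-combine diffusion RLS step at a single sensor node."""
    # Adapt: local RLS update with forgetting factor lam
    Pphi = P @ phi
    gain = Pphi / (lam + phi @ Pphi)
    theta_adapt = theta + gain * (y_t - phi @ theta)
    P = (P - np.outer(gain, Pphi)) / lam
    # Combine: convex combination of own and neighbors' current estimates
    if not nbr_thetas:
        return theta_adapt, P
    w_nbr = (1.0 - c_self) / len(nbr_thetas)
    theta_new = c_self * theta_adapt + w_nbr * sum(nbr_thetas)
    return theta_new, P
```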

Empirical Bayes Approach

Empirical Bayes ARX estimation places a hyperparameter-governed Gaussian prior on the parameter vector θ, then maximizes the marginal likelihood of the observed data:

L(\psi) = \ln p(Y \mid \psi).

Closed-form gradients allow numerical optimization. Sequential Bayesian procedures (backward Kalman-like filtering) stably estimate the prior covariance from finite samples. Marginal Bayes (fit the prior, take its mean) often outperforms full EB (posterior mean) when the true prior variance is small or the sample size is limited (Leahu et al., 19 May 2025).
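A generic Gaussian-evidence sketch of this marginal likelihood for the linear-in-parameters ARX regression Y = Φθ + e; the prior parameterization θ ~ N(0, λI) is an assumption for illustration rather than the exact hyperparameterization of (Leahu et al., 19 May 2025).

```python
import numpy as np

def log_marginal_likelihood(Y, Phi, lam, sigma2):
    """ln p(Y | psi) with theta ~ N(0, lam * I) and e ~ N(0, sigma2 * I),
    so that marginally Y ~ N(0, lam * Phi Phi^T + sigma2 * I)."""
    N = len(Y)
    Sigma = lam * (Phi @ Phi.T) + sigma2 * np.eye(N)
    _, logdet = np.linalg.slogdet(Sigma)             # stable log-determinant
    quad = Y @ np.linalg.solve(Sigma, Y)             # Y^T Sigma^{-1} Y
    return -0.5 * (logdet + quad + N * np.log(2.0 * np.pi))
```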

Errors-in-Variables (EIV) ARX Identification

Measurement noise in both inputs and outputs is addressed by Modified Dynamic Iterative PCA (DIPCA) (Maurya et al., 2020). The algorithm pre-whitens stacked lagged data using the estimated noise covariance, then identifies the model via eigenvalue decomposition, co-determining order, delay, error variances, and parameters. Simulation studies demonstrate unbiased estimation for moderate SNR and colored noise.
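A pre-whitening sketch in the spirit of this step, reusing Z_L from the stacking sketch; the noise covariance is a stand-in (in the actual algorithm it is estimated iteratively), so this is not the full modified DIPCA procedure.

```python
import numpy as np
from scipy.linalg import cholesky

Sigma_noise = 0.01 * np.eye(Z_L.shape[1])       # stand-in lagged-noise covariance
Lc = cholesky(Sigma_noise, lower=True)
Z_white = Z_L @ np.linalg.inv(Lc).T             # whiten the stacked data
evals, V = np.linalg.eigh(Z_white.T @ Z_white / Z_white.shape[0])
n_constraints = int(np.sum(np.isclose(evals, 1.0, atol=0.1)))  # noise-floor count
```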

3. Extensions and Modeling Frameworks

Functional and Nonlinear ARX/NARX

Feature-centric approaches such as Functional NARX (F-NARX) replace discrete lag selection with time-windowed feature extraction via PCA, then fit polynomial, nonlinear, or sparse models with least-angle regression that optimizes forecast error (Schär et al., 2024). Manifold-NARX (mNARX+) further introduces automatic auxiliary-quantity selection, recursively identifying critical temporal features by their correlation with prediction residuals (Schär et al., 17 Jul 2025). These extensions yield parsimonious, stable surrogate models and reduce reliance on domain expertise for lag and auxiliary selection.

Latent Variable ARX (LARX)

Latent variable methodologies (C)LARX extend ARX by constructing outputs and exogenous inputs as latent composites of proxy variables (e.g., sectoral returns for economic forecasting) (Bargman, 4 Jun 2025). Blockwise Kronecker and direct-sum operators facilitate constrained least squares estimation and fixed-point iteration over latent factors and coefficients. Empirical studies show substantial forecast improvements with joint extraction of latent sectoral and expenditure weights.

Minimal State-Space Realization

A theorem of Kalman provides a minimal AR state-space realization for ARX models with invertible transfer functions (Nguyen, 2019). Each VARX model admits a representation

y_t = \sum_{i=1}^{p} H F^{i-1} G x_{t-i} + \epsilon_t

with nilpotent Jordan matrix F, and H, G estimated by least squares on a constrained search space. The corresponding likelihood is a generalized Rayleigh quotient, invariant under simultaneous transformations commuting with F, and serves as a multi-lag canonical correlation analysis.

4. Practical Applications

ARX models are foundational for prediction and control in domains such as:

  • Traffic flow prediction: Linear, polynomial, and neural ARX frameworks integrate exogenous meteorological and spatial features via forward-selection and OLS, ridge, or lasso regularization. Neural ARX (SRNN/LSTM) achieves the best trade-off between accuracy and robust multi-step prediction, with performance varying according to memory structure and domain (Ying et al., 2024).
  • Speech modeling: ARX models (and ARMAX extensions) parameterize the linear source-filter model of voiced speech, deriving glottal source and vocal tract characteristics. DNN mapping to exogenous LF parameters augments ARMAX models with anti-formant (zero) estimation, outperforming traditional analysis-by-synthesis loops (Lia et al., 2024).
  • Adaptive minimum-variance control: The PIECE algorithm combines exploratory probing inputs, clipped certainty-equivalent feedback, and OLS learning, achieving O(\log T) regret bounds for bounded noise and improved transient performance over classical methods (Singh et al., 2023).

5. Statistical Properties, Robustness, and Theoretical Guarantees

ARX model identification via generalized spectral decomposition is statistically consistent: estimates converge to the true parameters and output noise spectrum as the sample size increases (Maurya et al., 2020). Bootstrap studies at SNR as low as 3–5 dB confirm reliable order/delay and parameter recovery. Distributed estimation under cooperative excitation guarantees almost-sure consistency for both order and coefficients across network nodes, regardless of individual sensor deficiencies (Gan et al., 2021). Modified DIPCA in errors-in-variables settings provides unbiased, low-variance estimates even when the measurement noise is colored and neither the noise variance nor the model order is known a priori (Maurya et al., 2020).

Empirical Bayes analysis reveals conditions under which marginal likelihood-based parameter estimation outperforms full Bayesian posteriors for finite samples and small prior variance, whereas full EB adapts better for larger variance and sample size (Leahu et al., 19 May 2025).

6. Comparison with Classical Identification

Traditional ARX identification requires manual selection of and looping over candidate orders and delays, repeated PEM or state-space subspace algorithm runs, and auxiliary tests for model structure. Contemporary spectral, empirical Bayes, and feature-centric methods automate structure selection, achieve one-shot estimation of orders, delays, coefficients, and output noise spectra, and retain consistency properties even at low SNR or under colored noise (Maurya et al., 2020, Schär et al., 2024, Schär et al., 17 Jul 2025).

Extensions to ARMAX (incorporating moving-average noise), LARX (latent factor construction), and nonlinear/functional modeling frameworks increase the expressivity and parsimony of the ARX model family, offering stable long-term predictions, improved out-of-sample accuracy, and reduced reliance on expert-driven lag or feature selection (Schär et al., 2024, Bargman, 4 Jun 2025, Schär et al., 17 Jul 2025).

7. Implementation Considerations and Software Ecosystem

Efficient computational implementation of ARX and VARX estimation relies on blockwise data stacking, covariance calculation, and spectral or numerical optimization routines. Practical model selection and validation employ information-theoretic criteria (AIC/BIC), cross-validation, residual whiteness diagnostics, and stability checks, with available codebases in MATLAB, Python, and R (Parra et al., 2024). Robustness for encrypted control exploits observer-based ARX reformulation, bounding truncation error and fitting within computational constraints imposed by homomorphic encryption (Hong et al., 24 Dec 2025).
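As one example of the residual whiteness diagnostics mentioned above, a standard Ljung-Box check can be sketched as follows; the degrees-of-freedom adjustment for fitted parameters is omitted for brevity.

```python
import numpy as np
from scipy.stats import chi2

def ljung_box_whiteness(resid, h=10):
    """Ljung-Box statistic on model residuals; under whiteness Q ~ chi2(h).
    Returns (Q, p-value): a small p-value indicates non-white residuals."""
    N = len(resid)
    r = resid - resid.mean()
    denom = np.sum(r ** 2)
    rho = np.array([np.sum(r[k:] * r[:-k]) for k in range(1, h + 1)]) / denom
    Q = N * (N + 2.0) * np.sum(rho ** 2 / (N - np.arange(1, h + 1)))
    return Q, chi2.sf(Q, df=h)
```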

Open-source toolkits accommodate practical deployment, with frameworks supporting regularization, basis function expansion, Granger-causality testing, and model-order search (Parra et al., 2024). For complex physical systems, feature-centric and surrogate modeling platforms enable scalable, data-driven ARX/NARX construction (Schär et al., 2024, Schär et al., 17 Jul 2025).


References:

  • Bargman, 4 Jun 2025.
  • Gan et al., 2021.
  • Hong et al., 24 Dec 2025.
  • Leahu et al., 19 May 2025.
  • Lia et al., 2024.
  • Maurya et al., 2020.
  • Nguyen, 2019.
  • Parra et al., 2024.
  • Schär et al., 2024.
  • Schär et al., 17 Jul 2025.
  • Singh et al., 2023.
  • Ying et al., 2024.
