NARXESNs: Nonlinear Auto-Regressive ESNs

Updated 31 January 2026

NARXESNs are recurrent neural networks that combine a fixed reservoir with nonlinear autoregressive terms and exogenous inputs for modeling dynamic systems.
They employ a two-stage training process, using ridge regression for pre-training and loss-adaptive fine-tuning to robustly counteract noise and data scarcity.
Empirical benchmarks demonstrate high FIT values and significant reductions in NMSE, underscoring their effectiveness in system identification and control tasks.

Nonlinear Auto-Regressive with eXogenous inputs Echo State Networks (NARXESNs) are a class of recurrent neural network models that integrate the reservoir computing principle of Echo State Networks (ESNs) with explicit nonlinear autoregressive modeling and exogenous input handling. NARXESNs serve as robust and data-efficient models for nonlinear dynamic system identification, prediction, and control, with applications in optimal control, time series modeling, and real-world system identification under noisy and data-scarce conditions.

1. NARXESN Architecture and Formulation

A NARXESN augments the standard ESN paradigm by combining a high-dimensional, fixed, randomly connected reservoir with explicit nonlinear auto-regressive terms on past outputs and exogenous signals. The most general SISO NARXESN is structured as:

Reservoir state update:

$\chi(k + 1) = \zeta\left(W_\chi \chi(k) + W_\phi \phi(k) + W_z z(k + 1)\right)$

where $\chi(k)\in\mathbb{R}^\nu$ is the reservoir state, $W_\chi\in\mathbb{R}^{\nu\times\nu}$ is the reservoir connectivity, $W_\phi$ maps regressor lags $\phi(k)\in\mathbb{R}^{n_\phi}$ (past output/input values), and $W_z$ scales feedback from the preliminary prediction.

Output update and measurement equation:

$z(k + 1) = W_{\text{out},1}^\top \chi(k) + W_{\text{out},2}^\top \phi(k)$

$y(k) = z(k) + w(k),\qquad |w(k)|\leq\bar{w}$

Here, $z(k)$ denotes the noise-free prediction with $y(k)$ the measurement.

Echo State Property (ESP) is enforced, typically by spectral radius constraint $\rho(W_\chi)<1$ or via Lyapunov criteria.

Feedback from the output or internal reservoir states allows the model to embed a nonlinear functional dependence on histories of both inputs and outputs, achieving the NARX structure. This embedding is implicit when state feedback is used as in (Ehlers et al., 2023), or explicit via lagged regressors as in (Sgadari et al., 24 Jan 2026).

2. Key Training Procedures and Robust Identification

Training of a NARXESN typically follows a two-stage protocol:

Stage 1 (Pre-training/readout fitting):

Fix the random reservoir and optimize readout parameters $(W_{\text{out},1}, W_{\text{out},2})$ by ridge regression, leveraging the linear-in-parameters structure:

$z(k+1) = \theta^\top \psi(k),\qquad \psi(k) = \begin{bmatrix}\chi(k) \ \phi(k)\end{bmatrix}$

Closed-form solution is used for computational efficiency.

Stage 2 (Loss-adaptive fine-tuning and model class selection):

For advanced NARXESN variants, such as in the set-membership approach (Sgadari et al., 24 Jan 2026), robust training incorporates bounded measurement noise explicitly, via the Feasible Parameter Set (FPS):

$\Theta = \left\{\theta \in \Omega : |y(k+1) - \theta^\top \hat\psi(k)| \leq \bar\epsilon + \bar w,\, \forall k\right\}$

Model-class selection is achieved through a greedy-prune heuristic on the regressor set and hyperparameters, guided by a scenario-based set-distance performance index. Optimization over the FPS is tractable by convex relaxation and scenario sampling, mitigating combinatorial explosion and NP-hardness of simulation-based loss minimization.

Physics-informed fine-tuning (Editor’s term):

For models with known underlying ODE/DAE structure, additional regularization is imposed by constraining model outputs to match physical residuals, as in PI-ESNs (Mochiutti et al., 2024):

$\mathcal{F}(y[n],u[n]):= y[n+1] - \left(y[n] + \mathcal{N}(y[n],u[n])\Delta t\right) = 0$

Joint minimization of data and physics-informed residual loss is achieved via a self-adaptive weighting mechanism, where loss weights are learned from data likelihoods.

3. Theoretical Guarantees and Echo State Property

Uniform convergence (Echo State Property) is a central theoretical requirement for NARXESNs. Sufficient conditions are:

For any bounded input sequence, the state update mapping produces a unique trajectory, independently of initial state.
For ESNs with feedback, contractivity must be preserved for the effective reservoir matrix $W + W_{\text{fb}}\frac{\partial r}{\partial x}$ , enforcing $\|W + W_{\text{fb}}\frac{\partial r}{\partial x}\|_2 < a$ , with $a$ the Lipschitz constant of the activation function.

Mathematical guarantee: For almost every parameter instantiation $(W, W_{\text{in}}, u(\cdot), y(\cdot))$ , there exists a feedback weight $W_{\text{fb}}$ that strictly reduces the minimum regression cost compared to zero feedback (Ehlers et al., 2023).

4. Feedback Mechanisms and NARX Representation

NARXESNs leverage feedback—either from output ( $y[k]$ ) or reservoir state ( $x[k]$ )—to enrich system memory and input history modeling.

Direct output feedback is realized with a term $W^\text{fb} y[n]$ in the recurrent dynamics (Mochiutti et al., 2024).
State feedback is formalized as $x(t+1) = f(W x(t) + W_{\text{in}} u(t) + W_{\text{fb}} r(x(t)))$ , with $r(\cdot)$ selecting reservoir components (Ehlers et al., 2023).
Recursive expansion of state updates reveals that the NARXESN architecture inherently models nonlinear dependence on both past inputs and outputs, implementing a discrete NARX form:

$y(t) = F\left(y(t-1), y(t-2), \ldots, u(t-1), u(t-2), \ldots\right) + \epsilon(t)$

The explicit inclusion of lagged input/output terms ( $\phi(k)$ ) as well as feedback enables parsimonious identification of high-order nonlinear temporal dependencies.

5. Model Structure Selection and Set-Membership Identification

Model structure selection for NARXESNs is governed by a combination of:

Choice of lags for input ( $u(k-\ell)$ ) and output ( $z(k-\ell)$ ) in the regressor vector
Reservoir dimension ( $\nu$ ), number of nonlinear units ( $n_{nl}$ ), spectral radius ( $\rho$ ), and scaling of feedback ( $k_z$ )

Set-membership identification explicitly formulates feasible parameter sets under bounded measurement noise, and scenario-based optimization is used to select parameters achieving minimal worst-case simulation error in a data-consistent tube. Greedy addition and pruning of regressors, iterative hyperparameter optimization, and repetition over initializations yield robust, parsimonious models (Sgadari et al., 24 Jan 2026).

6. Benchmark Performance and Computational Properties

Empirical results demonstrate:

Exact recovery of synthetic systems with correct lag structure, reservoir size, and nonlinearity count; e.g., validation FIT of 95.4%.
Superior performance over exhaustive grid-search approaches: for the Wiener–Hammerstein benchmark with $N=4000$ , the optimized NARXESN attained FIT = 92.5%, RMSE = 18.1 mV, outperforming the best grid-search result at FIT = 88.6%.
Feedback augmentation in ESNs provides $30\%-60\%$ reduction in NMSE or classification errors, matching the performance increase of significantly larger reservoirs at negligible additional cost (Ehlers et al., 2023).
For control-oriented applications under limited data, physics-informed NARXESNs reduce overfitting and improve generalization error by up to 92% relative to standard ESNs (Mochiutti et al., 2024).

Computational complexity for the scenario-based set-membership method scales as $O(N\cdot\nu^2)$ per scenario, with $N_s \sim 500$ scenarios and $\nu \leq 15$ efficient for practical applications.

7. Generalization, Extensions, and Practical Impact

The NARXESN framework generalizes to:

Physical systems with known ODE or DAE constraints, via physics-informed regularization, leading to improved extrapolation and robustness under parametric uncertainty (Mochiutti et al., 2024).
Arbitrary reservoir architectures, e.g., continuous-time ESNs or graph-structured ESNs, where differentiability with respect to the readout is retained.
Any scenario where robust model identification is challenged by nonlinearities, autoregressive dependency, limited data, or measurement noise, with performance gains both in simulation accuracy and structural parsimony.

Overall, NARXESNs represent a class of ESNs with explicit nonlinear auto-regression and exogenous inputs, equipped with theoretically grounded training procedures, efficient structure selection, robust identification under uncertainty, and broad applicability in nonlinear control and signal processing (Mochiutti et al., 2024, Sgadari et al., 24 Jan 2026, Ehlers et al., 2023).

Summary Table of Key Features

Aspect	Approach in (Mochiutti et al., 2024)	Approach in (Sgadari et al., 24 Jan 2026)
Feedback Mechanism	Explicit output feedback ( $W^{fb} y[n]$ )	Output/state lags in regressor $\phi(k)$
Robustness	Physics-informed loss; self-adaptive weighting	Set-membership FPS; scenario sampling
Model Selection	Hyperparameter search; grid/Bayesian	Greedy forward-prune; scenario evaluation
Training Objective	Data + physics loss (jointly optimized)	Data-consistent simulation error on tubes
Typical Application	ODE/DAE-driven control and prediction	Parsimonious noisy system identification

Markdown Report Issue Upgrade to Chat

References (3)

Improving the Performance of Echo State Networks Through State Feedback (2023)

A new approach for combined model class selection and parameters learning for auto-regressive neural models (2026)

Physics-Informed Echo State Networks for Modeling Controllable Dynamical Systems (2024)

Topic to Video (Beta)

No one has generated a video about this topic yet.

Whiteboard

No one has generated a whiteboard explanation for this topic yet.

Follow Topic

Get notified by email when new papers are published related to Nonlinear Auto-Regressive with eXogenous inputs Echo State Networks (NARXESNs).