First-Order State Space Model
- The First-Order State Space Model is a dynamic framework with the Markov property, where the next state depends solely on the current state and input.
- It employs nonlinear basis function expansions and Gaussian process regularization to accurately model complex dynamics while mitigating overfitting.
- Inference methods like Sequential Monte Carlo and EM-based particle approaches enable robust estimation of latent states and model parameters.
A first-order state space model (FSSM) is a discrete-time or continuous-time mathematical framework in which the future state of a dynamical system depends only on its current state and input, reflecting a Markovian property. The formalism is foundational both in control theory and in modern machine learning for system identification, sequence modeling, and latent process inference. Recent research expands FSSM frameworks to accommodate nonlinear dynamics, high-dimensional function representation, effective regularization, and advanced inference techniques.
1. Mathematical Structure of the First-Order State Space Model
The canonical FSSM is defined by the pair of recursive equations
$$x_{t+1} = f(x_t, u_t) + w_t, \qquad y_t = g(x_t, u_t) + e_t,$$
where $x_t$ is the latent state at time $t$, $u_t$ the controllable input, $w_t$ the process noise, $y_t$ the observed output, $f$ the state transition function, $g$ the measurement function, and $e_t$ the measurement noise. The essential first-order property is that $x_{t+1}$ depends solely on $x_t$ and $u_t$, without higher-order temporal dependencies.
Classical linear models set $f(x_t, u_t) = A x_t + B u_t$ and $g(x_t, u_t) = C x_t + D u_t$. However, first-order models generalize this to arbitrarily complex $f$ and $g$, potentially highly nonlinear, fitted from data rather than known a priori. This broader model class forms the backbone for highly flexible system identification and forecasting regimes.
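As a concrete illustration, here is a minimal sketch of simulating such a model in the scalar linear-Gaussian case; the dynamics coefficients, input signal, and noise variances are illustrative assumptions rather than values from the source.

```python
import numpy as np

# Minimal sketch of a first-order (Markovian) state space model in the linear-Gaussian case:
#   x_{t+1} = A x_t + B u_t + w_t,   y_t = C x_t + e_t.
# All parameter values below are illustrative assumptions, not taken from the source.
rng = np.random.default_rng(0)
A, B, C = 0.9, 0.5, 1.0            # hypothetical linear dynamics
q, r = 0.1, 0.05                   # process / measurement noise variances (assumed)

T = 100
u = np.sin(0.1 * np.arange(T))     # an arbitrary input signal
x, y = np.zeros(T + 1), np.zeros(T)

for t in range(T):
    y[t] = C * x[t] + rng.normal(scale=np.sqrt(r))
    # the next state depends only on the current state and input (first-order property)
    x[t + 1] = A * x[t] + B * u[t] + rng.normal(scale=np.sqrt(q))
```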
2. Basis Function Expansions and Nonlinear System Identification
To address tractability for nonlinear systems, $f$ and $g$ can be expressed as weighted sums of pre-specified basis functions:
$$f(x) \approx \sum_{j=1}^{m} w^{(j)} \phi^{(j)}(x),$$
where $\phi^{(j)}$ are basis functions (e.g., Laplacian eigenfunctions, sinusoids, polynomials) and $w^{(j)}$ their weights. This approach retains linearity in the parameters while allowing $f$ (and analogously $g$) to approximate highly nonlinear mappings.
For example, with $\phi^{(j)}(x) = \frac{1}{\sqrt{L}}\sin\!\left(\frac{\pi j (x + L)}{2L}\right)$, the expansion allows a dense function class over the domain $[-L, L]$. Aggregating the weights into a matrix $A$ and stacking the basis evaluations into a vector $\varphi(x_t, u_t)$ yields
$$x_{t+1} = A\,\varphi(x_t, u_t) + w_t, \qquad y_t = C\,\varphi(x_t, u_t) + e_t,$$
capturing both complex state evolution and measurement mappings.
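A minimal sketch of this construction for a scalar state follows, assuming the sinusoidal basis above with illustrative choices of m and L, and a randomly drawn weight matrix standing in for weights that would normally be learned from data.

```python
import numpy as np

def basis(x, m=20, L=4.0):
    """Evaluate m sinusoidal (Laplacian-eigenfunction) basis functions at a scalar x in [-L, L]."""
    j = np.arange(1, m + 1)
    return np.sin(np.pi * j * (x + L) / (2 * L)) / np.sqrt(L)

# f(x) is represented as A @ phi(x); here A is a random 1 x m weight matrix
# standing in for weights that would be learned from data (illustrative only).
rng = np.random.default_rng(1)
m = 20
A = rng.normal(size=(1, m))

def f_hat(x):
    return A @ basis(x, m=m)        # linear in the weights, nonlinear in x

print(f_hat(0.3))                   # approximate next-state value for x_t = 0.3
```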
3. Gaussian Process Regularization and Model Generalization
The primary risk of using rich basis expansions is overfitting, especially when $m$ (the number of basis functions) is large or data are limited. To combat this, each weight is assigned a Gaussian prior whose variance is governed by the spectral density $S$ of the kernel $\kappa$ at the eigenvalue $\lambda^{(j)}$:
$$w^{(j)} \sim \mathcal{N}\!\left(0,\; S(\lambda^{(j)})\right).$$
For instance, when $\kappa$ is the squared-exponential kernel, $S$ decays with increasing frequency; high-complexity basis functions have their coefficients shrunk toward zero unless the data indicate otherwise.
This regularization framework is motivated by the Karhunen–Loève spectral expansion of Gaussian processes, effectively encouraging only the simplest structures to explain data, while preventing overfitting even for overcomplete basis sets. In the optimization perspective, the effect is equivalent to regularized (penalized) maximum likelihood estimation,
$$\hat{\theta} = \arg\max_{\theta}\; \log p(y_{1:T} \mid \theta) \;-\; \sum_{j=1}^{m} \frac{\bigl(w^{(j)}\bigr)^2}{2\, S(\lambda^{(j)})},$$
where the penalty on each coefficient is governed by the GP prior variance $S(\lambda^{(j)})$.
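The sketch below illustrates the idea for a one-dimensional squared-exponential kernel: the prior variances are read off its spectral density, and a ridge-style penalized least-squares solve stands in for one regularized maximum-likelihood weight update. The kernel hyperparameters, design matrix, targets, and noise level are all assumptions made for illustration.

```python
import numpy as np

def se_spectral_density(omega, sf2=1.0, ell=1.0):
    """Spectral density of a 1-D squared-exponential kernel (assumed hyperparameters)."""
    return sf2 * np.sqrt(2 * np.pi) * ell * np.exp(-0.5 * (ell * omega) ** 2)

m, L = 20, 4.0
j = np.arange(1, m + 1)
omega = np.pi * j / (2 * L)               # frequencies associated with the Laplacian eigenvalues
prior_var = se_spectral_density(omega)    # per-coefficient prior variances, decaying with j

# Ridge-style penalized least squares standing in for one regularized
# maximum-likelihood weight update (design matrix and targets are synthetic).
rng = np.random.default_rng(2)
Phi = rng.normal(size=(200, m))           # hypothetical basis evaluations
z = rng.normal(size=200)                  # hypothetical regression targets
noise_var = 0.1
reg = noise_var / prior_var               # heavier shrinkage on high-frequency coefficients
w_hat = np.linalg.solve(Phi.T @ Phi + np.diag(reg), Phi.T @ z)
```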
4. Parameter Inference via Sequential Monte Carlo
Inference for FSSMs with latent nonlinearities and unknown parameters is challenging due to the intractable integrals arising from unobserved states. To address this, sequential Monte Carlo methods — particularly Particle Gibbs with Ancestor Sampling (PGAS) — are adopted.
The learning protocol alternates between:
- Filtering state trajectories using particle approximations to $p(x_{1:T} \mid y_{1:T}, \theta)$.
- Updating parameters (basis weights $A$, noise covariances $Q$) analytically or via conjugate priors (e.g., a matrix normal inverse Wishart (MNIW) prior for $(A, Q)$) given the sampled state path.
This is embedded in a Markov chain Monte Carlo (MCMC) routine targeting the posterior $p(\theta, x_{1:T} \mid y_{1:T})$. For point estimation, a particle stochastic approximation EM (PSAEM) algorithm is used to overcome non-analytic E-steps through Robbins–Monro updates.
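The sketch below conveys the flavor of this alternation on a toy scalar model: a plain bootstrap particle filter draws one state trajectory, and the basis weights are then refit by lightly regularized least squares on that trajectory. It deliberately omits the conditional reference trajectory and ancestor sampling that distinguish PGAS, as well as the conjugate MNIW update; the model, basis functions, and all numerical values are assumptions made purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(3)

def phi(x):
    # two illustrative basis functions; the actual method would use many more
    return np.stack([x, np.sin(x)], axis=-1)

# Simulate data from an assumed "true" scalar system (for demonstration only).
T, q, r = 200, 0.05, 0.1                  # horizon, process and measurement noise variances
w_true = np.array([0.6, 0.3])             # hypothetical true basis weights
x_true, y = np.zeros(T + 1), np.zeros(T)
for t in range(T):
    y[t] = x_true[t] + rng.normal(scale=np.sqrt(r))
    x_true[t + 1] = phi(x_true[t]) @ w_true + rng.normal(scale=np.sqrt(q))

def particle_filter_sample(y, w, N=300):
    """Bootstrap particle filter; returns one trajectory drawn from the particle approximation."""
    T = len(y)
    particles = np.zeros((T, N))
    ancestors = np.zeros((T, N), dtype=int)
    particles[0] = rng.normal(scale=1.0, size=N)
    for t in range(T):
        if t > 0:
            # resample parents and propagate: the next state depends only on the current one
            ancestors[t] = rng.choice(N, size=N, p=weights)
            prev = particles[t - 1, ancestors[t]]
            particles[t] = phi(prev) @ w + rng.normal(scale=np.sqrt(q), size=N)
        logw = -0.5 * (y[t] - particles[t]) ** 2 / r   # Gaussian measurement likelihood
        weights = np.exp(logw - logw.max())
        weights /= weights.sum()
    # trace one particle's ancestry back through time to obtain a full trajectory
    k = rng.choice(N, p=weights)
    traj = np.zeros(T)
    for t in reversed(range(T)):
        traj[t] = particles[t, k]
        k = ancestors[t, k]
    return traj

# Alternate between sampling states given the weights and refitting weights given the states.
w_est = np.zeros(2)
for _ in range(20):
    x_s = particle_filter_sample(y, w_est)
    Phi, z = phi(x_s[:-1]), x_s[1:]
    w_est = np.linalg.solve(Phi.T @ Phi + 1e-3 * np.eye(2), Phi.T @ z)  # lightly regularized LS

print(w_est)   # the implied transition map should approach the true one over the iterations
```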
5. Theoretical Guarantees
The FSSM framework with SMC-based learning possesses strong statistical guarantees. The constructed Markov chain for the Bayesian sampler (with PGAS and Metropolis-within-Gibbs updates) admits the true joint posterior as its invariant distribution, guaranteeing asymptotic correctness.
For the regularized maximum likelihood with PSAEM, under standard conditions the iterates converge to a stationary point of the penalized objective, not necessarily the global maximum but typically a local maximum, as is standard for EM-type methods. The GP-based regularization keeps the objective smooth and well-conditioned, increasing the practical robustness of learning.
6. Practical Implications and Applications
The resulting FSSM formulation, combining nonlinear basis expansions, GP-motivated priors, and SMC-based learning, forms a system identification tool applicable to a broad class of dynamical systems. The main practical implications are:
- Flexibility in representing complex, possibly nonparametric system dynamics while still retaining computational tractability.
- Systematic regularization to avoid overfitting, with theory and implementation grounded in Gaussian process models.
- Scalability to moderately high-dimensional systems, with the critical computational bottleneck residing in the SMC procedure and the number of basis functions.
Typical applications include nonlinear control, signal processing, time-series forecasting where the system structure is unknown or highly nonlinear, and experimental system identification in engineering and the physical sciences.
7. Summary Table: Conceptual Mapping
| Aspect | Classical FSSM | Nonlinear FSSM in (Svensson et al., 2016) | Impact |
|---|---|---|---|
| Transition/observation map | Linear ($Ax_t + Bu_t$ / $Cx_t + Du_t$) | Arbitrary via basis expansion ($A\varphi(x_t, u_t)$) | Captures complex dynamics |
| Regularization | Hand-chosen priors or none | GP spectral priors via KL expansion | Controls overfitting, model flexibility |
| Inference | Analytical (e.g., Kalman filtering) | Sequential Monte Carlo + conjugate updates | Supports latent state and parameter learning |
| Scalability | Excellent for linear models | Limited by basis function count, SMC particles | Applies to moderately complex systems |
| Theoretical guarantees | Riccati equations, optimality | Asymptotic consistency, convergence | Ensures statistical reliability |
This approach unites classical first-order state-space principles with modern stochastic process theory and numerical inference, providing a flexible and robust framework for nonlinear dynamical system modeling (Svensson et al., 2016).