Linear Oscillatory State-Space Models
- Linear Oscillatory State-Space Models (LinOSS) are structured state-space models using forced harmonic oscillators to provide an analytically tractable framework for Gaussian process modeling.
- They unify Matérn and damped SHO processes via continuous and discrete formulations, enabling efficient long-range sequence learning with precise hyperparameter control.
- Variants like LinOSS-IM, IMEX, and D-LinOSS demonstrate competitive performance on tasks such as time-series classification and regression, leveraging fast associative scans for reduced computational complexity.
Linear Oscillatory State-Space Models (LinOSS) represent a class of structured state-space models formed by discrete layers of forced harmonic oscillators. They provide an analytically tractable framework unifying Matérn and damped simple harmonic oscillator (SHO) Gaussian processes under a linear, Gaussian, Markovian state-space representation (Jordán et al., 2021). Recent neural architectures extend LinOSS principles to the efficient modeling of extremely long-range dependencies and to high-capacity sequence learning (Rusch et al., 4 Oct 2024, Boyer et al., 17 May 2025).
1. Continuous-Time Formulation
LinOSS models encode latent Markov processes $\mathbf{x}(t)$ driven by white noise, observed via a linear projection $y(t) = \mathbf{c}^\top \mathbf{x}(t)$. In differential notation,

$$d\mathbf{x}(t) = A\,\mathbf{x}(t)\,dt + L\,d\mathbf{W}(t), \qquad y(t) = \mathbf{c}^\top \mathbf{x}(t),$$

where $A$ is the state matrix, $L$ couples a multidimensional Wiener process $\mathbf{W}(t)$, and $\mathbf{c}$ selects observed components. For the SHO, with natural frequency $\omega_0$ and damping factor $\zeta$, the dynamics are second-order with state dimension $d = 2$,

$$\ddot{x}(t) + 2\zeta\omega_0\,\dot{x}(t) + \omega_0^2\,x(t) = \epsilon(t), \qquad A = \begin{pmatrix} 0 & 1 \\ -\omega_0^2 & -2\zeta\omega_0 \end{pmatrix}, \quad \mathbf{x}(t) = \begin{pmatrix} x(t) \\ \dot{x}(t) \end{pmatrix}.$$
This construction generalizes to Matérn covariances with half-integer smoothness parameters $\nu = p + 1/2$ ($p \in \mathbb{N}_0$), where the state dimension is $d = p + 1$ and the system matrices follow explicit block constructions for each $\nu$ (Jordán et al., 2021). In all cases, LinOSS models provide rational spectral densities with a closed-form correspondence between state-space parameters and kernel hyperparameters. In modern neural sequence models, LinOSS blocks are built from networks of uncoupled forced oscillators (Rusch et al., 4 Oct 2024, Boyer et al., 17 May 2025):

$$\mathbf{x}''(t) = -\mathbf{A}\,\mathbf{x}(t) + \mathbf{B}\,\mathbf{u}(t) + \mathbf{b}, \qquad \mathbf{y}(t) = \mathbf{C}\,\mathbf{x}(t) + \mathbf{D}\,\mathbf{u}(t),$$

where $\mathbf{A} \succeq 0$ is diagonal, $\mathbf{x}(t) \in \mathbb{R}^m$ is the hidden state, and $\mathbf{B}$ is the input-to-state coupling.
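For concreteness, the sketch below writes this second-order oscillator ODE as a first-order vector field in the stacked state $(\mathbf{z}, \mathbf{x})$ with $\mathbf{z} = \mathbf{x}'$. The names (`linoss_vector_field`, `A_diag`) and the dimensions are illustrative assumptions, not taken from the cited implementations.

```python
# Minimal sketch: the second-order LinOSS ODE
#   x''(t) = -A x(t) + B u(t) + b
# rewritten as a first-order system in w = (z, x) with z = x'.
import jax.numpy as jnp

def linoss_vector_field(w, u, A_diag, B, b):
    """Time derivative of the stacked state w = (z, x) for diagonal A."""
    z, x = jnp.split(w, 2)
    dz = -A_diag * x + B @ u + b   # forced, uncoupled oscillators
    dx = z
    return jnp.concatenate([dz, dx])

# Example: m = 3 oscillators driven by a p = 2 dimensional input.
m, p = 3, 2
A_diag = jnp.array([0.5, 1.0, 2.0])   # nonnegative frequencies (stability)
B = jnp.ones((m, p)) * 0.1
b = jnp.zeros(m)
w = jnp.zeros(2 * m)
u = jnp.array([1.0, -0.5])
print(linoss_vector_field(w, u, A_diag, B, b))
```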
2. Discretization and Algorithmic Structure
Discrete-time LinOSS recurrences are obtained via exact matrix exponentiation for classic Gaussian process applications (Jordán et al., 2021):

$$\mathbf{x}_{n+1} = e^{A\,\Delta t_n}\,\mathbf{x}_n + \mathbf{q}_n, \qquad \mathbf{q}_n \sim \mathcal{N}(\mathbf{0},\, Q_n),$$

with $Q_n = \int_0^{\Delta t_n} e^{A s} L L^\top e^{A^\top s}\, ds$ defined by an integral Lyapunov equation. For neural networks, structure-preserving implicit-explicit (IMEX) and fully implicit (IM) Euler schemes are applied (Rusch et al., 4 Oct 2024, Boyer et al., 17 May 2025):
IM scheme: with $\mathbf{z} = \mathbf{x}'$,

$$\mathbf{z}_n = \mathbf{z}_{n-1} + \Delta t\left(-\mathbf{A}\,\mathbf{x}_n + \mathbf{B}\,\mathbf{u}_n + \mathbf{b}\right), \qquad \mathbf{x}_n = \mathbf{x}_{n-1} + \Delta t\,\mathbf{z}_n,$$

yielding a linear recurrence $\mathbf{w}_n = \mathbf{M}_{\mathrm{IM}}\,\mathbf{w}_{n-1} + \mathbf{F}_n$ in the stacked state $\mathbf{w}_n = (\mathbf{z}_n, \mathbf{x}_n)$, with $\mathbf{M}_{\mathrm{IM}}$ obtained in closed form via a Schur complement of $(I + \Delta t^2 \mathbf{A})$.

IMEX scheme: updates preserve symplecticity (volume) and time-reversibility with

$$\mathbf{z}_n = \mathbf{z}_{n-1} + \Delta t\left(-\mathbf{A}\,\mathbf{x}_{n-1} + \mathbf{B}\,\mathbf{u}_n + \mathbf{b}\right), \qquad \mathbf{x}_n = \mathbf{x}_{n-1} + \Delta t\,\mathbf{z}_n,$$

and block-wise recurrence $\mathbf{w}_n = \mathbf{M}_{\mathrm{IMEX}}\,\mathbf{w}_{n-1} + \mathbf{F}_n$, with $\mathbf{M}_{\mathrm{IMEX}} = \begin{pmatrix} I & -\Delta t\,\mathbf{A} \\ \Delta t\,I & I - \Delta t^2\,\mathbf{A} \end{pmatrix}$.
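Because $\mathbf{A}$ is diagonal, each update reduces to a few elementwise operations per oscillator. The following is a minimal, assumed sketch of one IMEX step and one IM step (the IM step solved in closed form via the factor $(1 + \Delta t^2 A_{kk})^{-1}$); function and variable names are illustrative, not the authors' code.

```python
import jax.numpy as jnp

def imex_step(z, x, u, A_diag, B, b, dt):
    """One structure-preserving IMEX Euler step of the LinOSS recurrence."""
    forcing = B @ u + b
    z_new = z + dt * (-A_diag * x + forcing)   # explicit in the position x
    x_new = x + dt * z_new                     # implicit: uses the updated velocity
    return z_new, x_new

def im_step(z, x, u, A_diag, B, b, dt):
    """One fully implicit (IM) Euler step, solved in closed form per mode."""
    forcing = B @ u + b
    S = 1.0 / (1.0 + dt**2 * A_diag)           # Schur-complement factor
    z_new = S * (z + dt * (-A_diag * x + forcing))
    x_new = x + dt * z_new
    return z_new, x_new
```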
Efficient sequence processing is realized via fast associative parallel scans (tree-scan algorithms), which reduce the wall-clock depth from $O(N)$ to $O(\log N)$ and allow GPU kernels to scale to very long sequences (Rusch et al., 4 Oct 2024).
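Each discrete update is an affine map $\mathbf{w} \mapsto \mathbf{M}\mathbf{w} + \mathbf{F}$, and affine maps compose associatively, so the whole sequence can be evaluated with a parallel tree scan. Below is a generic sketch using `jax.lax.associative_scan` (an illustration of the technique, not the authors' kernels; names are assumed).

```python
import jax
import jax.numpy as jnp

def combine(elem_i, elem_j):
    """Compose two affine maps (element i applied first, then j); inputs are batched."""
    M_i, F_i = elem_i
    M_j, F_j = elem_j
    return M_j @ M_i, (M_j @ F_i[..., None])[..., 0] + F_j

def scan_linear_recurrence(M, F):
    """M: (N, d, d) transitions, F: (N, d) forcings; returns w_1..w_N for w_0 = 0."""
    _, w = jax.lax.associative_scan(combine, (M, F))
    return w

# Toy usage: a fixed rotation-like transition repeated over N steps.
N, d = 8, 2
M = jnp.tile(jnp.array([[0.9, 0.1], [-0.1, 0.9]]), (N, 1, 1))
F = jnp.ones((N, d))
print(scan_linear_recurrence(M, F).shape)   # (N, d)
```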
3. Covariance, Spectral Structure, and Hyperparameterization
Analytically, LinOSS latent covariances solve the stationary Lyapunov equation

$$A\,P_\infty + P_\infty A^\top + L L^\top = 0;$$

the output covariance function is

$$k(\tau) = \mathbf{c}^\top P_\infty\, e^{A^\top |\tau|}\,\mathbf{c},$$
recovering the Matérn and SHO kernel forms (Jordán et al., 2021).
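A small numeric sketch of this correspondence, assuming the damped-SHO matrices introduced above: it solves the stationary Lyapunov equation by vectorization and evaluates the resulting kernel. Parameter values and names are arbitrary illustrations.

```python
import jax.numpy as jnp
from jax.scipy.linalg import expm

omega0, zeta = 2.0, 0.3                      # natural frequency, damping
A = jnp.array([[0.0, 1.0],
               [-omega0**2, -2.0 * zeta * omega0]])
L = jnp.array([[0.0], [1.0]])                # white noise enters the velocity
c = jnp.array([1.0, 0.0])                    # observe the position component

d = A.shape[0]
I = jnp.eye(d)
# Vectorized Lyapunov equation: (I (x) A + A (x) I) vec(P) = -vec(L L^T).
K = jnp.kron(I, A) + jnp.kron(A, I)
P = jnp.linalg.solve(K, -(L @ L.T).reshape(-1)).reshape(d, d)

def kernel(tau):
    """Output covariance k(tau) = c^T P exp(A^T |tau|) c."""
    return c @ P @ expm(A.T * jnp.abs(tau)) @ c

print(kernel(0.0), kernel(1.0))
```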
The spectral density for LinOSS is obtained by Fourier transform:

$$S(\omega) = \left|\,\mathbf{c}^\top (i\omega I - A)^{-1} L\,\right|^2,$$

with SHO poles at $\omega = \pm\omega_0\sqrt{1-\zeta^2} \pm i\,\zeta\omega_0$. For Matérn smoothness $\nu$, the density is

$$S(\omega) \propto \left(\lambda^2 + \omega^2\right)^{-(\nu + 1/2)}, \qquad \lambda = \frac{\sqrt{2\nu}}{\ell},$$

with repeated poles of order $\nu + \tfrac{1}{2}$ at $\omega = \pm i\lambda$.
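The rational spectral density can be evaluated directly from the state-space matrices; the sketch below (same illustrative SHO matrices as above) computes $S(\omega) = |\mathbf{c}^\top(i\omega I - A)^{-1} L|^2$ numerically.

```python
import jax.numpy as jnp

omega0, zeta = 2.0, 0.3
A = jnp.array([[0.0, 1.0], [-omega0**2, -2.0 * zeta * omega0]])
L = jnp.array([[0.0], [1.0]])
c = jnp.array([1.0, 0.0])

def spectral_density(omega):
    # Transfer function from white noise to the observed component.
    H = c @ jnp.linalg.solve(1j * omega * jnp.eye(2) - A, L + 0j)
    return jnp.abs(H[0]) ** 2

# The peak sits near the resonance frequency omega0 * sqrt(1 - 2 zeta^2).
print(spectral_density(0.0), spectral_density(omega0))
```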
Hyperparameters for LinOSS models include the amplitude $\sigma^2$, the characteristic time scale ($\ell$ or $\omega_0^{-1}$), and the damping or smoothness ($\zeta$ or $\nu$). For neural architectures, the frequency parameters $\mathbf{A}$ are constrained for stability via nonnegativity and may be initialized via uniform sampling, regularized, and learned as free variables (Rusch et al., 4 Oct 2024).
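As an illustration of one such stability-preserving parameterization (a common choice; the exact scheme may differ from the cited papers), a free parameter per oscillator can be mapped through a ReLU so the effective frequencies stay nonnegative even as the underlying parameter is trained without constraint:

```python
import jax
import jax.numpy as jnp

key = jax.random.PRNGKey(0)
A_hat = jax.random.uniform(key, (64,))   # free, unconstrained parameters (initialized uniformly)
A_diag = jax.nn.relu(A_hat)              # nonnegative frequencies used in the oscillator ODE
```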
4. Universality, Stability, and Extensions
LinOSS blocks are universal function approximators for causal, continuous operators between time-varying inputs and outputs, provably capable of arbitrarily accurate approximation via suitably chosen oscillator frequencies and readout mappings (Rusch et al., 4 Oct 2024). Nonnegativity of oscillator frequencies ensures global asymptotic stability in IM schemes, while IMEX schemes retain symplecticity and invertibility—allowing memory-efficient backpropagation.
However, canonical LinOSS variants (IM, IMEX) couple damping and frequency rigidly: each oscillator carries only one dissipation timescale, limiting representational flexibility for certain dynamics. To address this, Damped LinOSS (D-LinOSS) models allow independent, learnable per-mode damping parameters (Boyer et al., 17 May 2025). In D-LinOSS, the damping and frequency matrices are diagonal and constrained only by the stability region (discrete eigenvalue moduli at most one), allowing the full complex unit disk of eigenvalues to be realized. This decoupling enables superior representation of multi-timescale and long-range dependencies.
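A toy illustration of this decoupling (not the exact D-LinOSS parameterization): with an independent damping $g_k$ and stiffness $a_k$ per mode, the decay rate and the oscillation frequency of the discrete eigenvalue are set separately, so any eigenvalue inside the unit disk is reachable.

```python
# With per-mode damping g and stiffness a, the continuous eigenvalues are
# lambda = -g/2 +/- i sqrt(a - g^2/4), so decay and oscillation decouple.
import jax.numpy as jnp

def discrete_eigenvalue(a, g, dt):
    lam = -0.5 * g + 1j * jnp.sqrt(a - 0.25 * g**2)   # assumes a > g^2 / 4
    return jnp.exp(lam * dt)

a, g, dt = jnp.array(4.0), jnp.array(0.6), 0.1
mu = discrete_eigenvalue(a, g, dt)
print(jnp.abs(mu))     # modulus exp(-g dt / 2) < 1: per-mode decay timescale
print(jnp.angle(mu))   # rotation per step: per-mode oscillation frequency
```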
5. Computational Complexity and Memory
Sequential LinOSS models scale as $O(Nm)$ (for oscillator dimension $m$ and sequence length $N$). With parallel associative scan algorithms, the processing depth is reduced to $O(\log N)$ at $O(Nm)$ total flops (Rusch et al., 4 Oct 2024). For Gaussian process inference, likelihood evaluation and Kalman filtering are $O(N d^3)$ in the state dimension $d$, reduced further for sparse or block-diagonal state matrices (Jordán et al., 2021).
Memory usage benefits from invertibility in IMEX: only checkpointed activations are required for backpropagation, reducing per-layer activation storage from linear in the sequence length to constant, plus log-sized checkpoints. IM variants require standard checkpointing or reverse-scan recomputation.
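A minimal sketch of why invertibility helps (illustrative names and shapes): an IMEX step can be undone exactly, so intermediate states can be recomputed on the backward pass instead of stored.

```python
import jax.numpy as jnp

def imex_step(z, x, forcing, A_diag, dt):
    z_new = z + dt * (-A_diag * x + forcing)
    x_new = x + dt * z_new
    return z_new, x_new

def imex_step_inverse(z_new, x_new, forcing, A_diag, dt):
    x = x_new - dt * z_new                      # undo the position update
    z = z_new - dt * (-A_diag * x + forcing)    # undo the velocity update
    return z, x

A_diag, dt = jnp.array([1.0, 4.0]), 0.05
z0, x0 = jnp.array([0.1, -0.2]), jnp.array([0.3, 0.4])
f = jnp.array([0.5, 0.0])
z1, x1 = imex_step(z0, x0, f, A_diag, dt)
print(imex_step_inverse(z1, x1, f, A_diag, dt))   # recovers (z0, x0)
```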
6. Empirical Performance and Model Variants
Empirical results across multiple benchmarks demonstrate LinOSS’s competitive and sometimes superior performance versus state-of-the-art sequence models:
- UEA MTS Classification: LinOSS-IM achieves 67.8% mean test accuracy (up to 95% on sequences of length 17,984), beating Log-NCDE and S5 (Rusch et al., 4 Oct 2024).
- PPG-DaLiA Regression: LinOSS-IM achieves MSE $0.064$, outperforming LRU ($0.156$), S5 ($0.128$), and Mamba ($0.107$) (Rusch et al., 4 Oct 2024).
- Long-Horizon Weather Forecast: LinOSS-IMEX MAE $0.508$, LinOSS-IM $0.528$, S4 $0.578$, outperforming transformer models (Rusch et al., 4 Oct 2024).
- D-LinOSS Comparisons: D-LinOSS achieves improved RMSE relative to LinOSS-IM, higher accuracy, and a reduced hyperparameter search space, owing to decoupled damping (Boyer et al., 17 May 2025).
Architectural variants include LinOSS-IM (dissipative, forgetting), LinOSS-IMEX (volume-preserving), and D-LinOSS (learnable, multi-timescale dissipation). Training employs GLU layers, residual connections, and time-indexed inputs. Integration of time-varying layers and domain-specific oscillatory priors remains an open route for further research.
7. Connections to Gaussian Process Theory
LinOSS models are grounded in state-space representations of Gaussian processes with rational spectral densities, providing closed-form kernels and efficient inference via standard Kalman algorithms. The framework allows seamless unification of Matérn and damped oscillator processes, of continuous- and discrete-time models, and supports fast hyperparameter learning by maximizing the likelihood or via MCMC (Jordán et al., 2021).
The construction elucidates the spectral–temporal relationship of kernel smoothness, impulse response, and damping, and provides closed-form expressions for covariance and spectral densities relevant to time series modeling, astronomical data analysis, and general sequence learning (Jordán et al., 2021). The extension to neural architectures preserves computational efficiency while unlocking new representational capacities essential for modeling long-sequence dynamics.