Liquid-S4: Adaptive Sequence Modeling

Updated 4 January 2026
  • Liquid-S4 is a state-space model that integrates a linear liquid time-constant ODE with a diagonal-plus-low-rank (DPLR) decomposition for efficient long-range sequential learning.
  • The model employs input-dependent state transitions and multi-order liquid kernels to enhance adaptive representation and encode sequence correlations.
  • Empirical evaluations show Liquid-S4 consistently outperforms S4 on benchmarks like Long-Range Arena and audio classification while reducing model parameters.

Liquid-S4 is a structural state-space model (SSM) designed for high-performance representation learning on long-range sequential data. It is constructed by integrating a linear liquid time-constant (LTC) ODE with a diagonal plus low-rank decomposition (DPLR) of state transition matrices, leveraging methodology from S4 (Structured State Spaces). Liquid-S4 is characterized by its input-dependent state transitions and kernel structure that encodes similarities and correlations within the sequence, enabling state-of-the-art results across image, text, audio, and medical signal domains (Hasani et al., 2022).

1. Mathematical Structure of Liquid-S4

Liquid-S4 is founded upon a continuous-time linearized LTC ODE, specified for an $N$-dimensional hidden state $x(t) \in \mathbb{R}^{N}$, scalar input $u(t)$, and scalar output $y(t)$:

$$\frac{d\,x(t)}{dt} = \bigl[A + B\,u(t)\bigr]\,x(t) + B\,u(t), \qquad y(t) = C\,x(t)$$

where $A \in \mathbb{R}^{N \times N}$ is the transition matrix, $B \in \mathbb{R}^{N \times 1}$ serves as both bias and input modulation, and $C \in \mathbb{R}^{1 \times N}$ computes the readout.

Discretization via the bilinear (trapezoidal) rule with step size $\Delta t$ yields

$$\bar{A} = \left(I - \tfrac{\Delta t}{2}A\right)^{-1}\left(I + \tfrac{\Delta t}{2}A\right), \qquad \bar{B} = \left(I - \tfrac{\Delta t}{2}A\right)^{-1}\Delta t\,B, \qquad \bar{C} = C$$

and the resulting discrete-time recurrence at each timestep $k$ is

$$x_k = \bigl[\bar{A} + \bar{B}\,u_k\bigr]\,x_{k-1} + \bar{B}\,u_k, \qquad y_k = \bar{C}\,x_k$$
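
The following is a minimal NumPy sketch of this discretization and recurrence. Here the $\bar{B}\,u_k$ modulation is taken to act elementwise on the state (i.e., as $\operatorname{diag}(\bar{B})\,u_k$), which is consistent with the elementwise powers $\bar{B}^{p}$ appearing in the kernel expansion below; all names are illustrative rather than the paper's implementation.

```python
import numpy as np

def discretize_bilinear(A, B, dt):
    """Bilinear (trapezoidal) discretization: returns A_bar (N, N) and B_bar (N,)."""
    N = A.shape[0]
    M = np.linalg.inv(np.eye(N) - (dt / 2.0) * A)
    A_bar = M @ (np.eye(N) + (dt / 2.0) * A)
    B_bar = M @ (dt * B)
    return A_bar, B_bar

def liquid_recurrence(A_bar, B_bar, C, u):
    """x_k = [A_bar + diag(B_bar) u_k] x_{k-1} + B_bar u_k,  y_k = C x_k."""
    x = np.zeros(A_bar.shape[0])
    ys = []
    for u_k in u:                                   # u: length-L scalar input sequence
        x = (A_bar + np.diag(B_bar) * u_k) @ x + B_bar * u_k
        ys.append(C @ x)
    return np.array(ys)

# Example: a random stable system driven by a short sequence
N, L, dt = 8, 64, 1.0 / 64
A = -np.eye(N) + 0.1 * np.random.randn(N, N)
B, C = np.random.randn(N), np.random.randn(N)
y = liquid_recurrence(*discretize_bilinear(A, B, dt), C, np.random.randn(L))
```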

2. Diagonal-Plus-Low-Rank Parameterization

The state transition matrix $A$ is initialized using the HiPPO-LegS matrix (scaled Legendre measure) and rewritten in the Normal-Plus-Low-Rank (NPLR) decomposition, which is unitarily equivalent to a Diagonal-Plus-Low-Rank (DPLR) form:

$$A = V\,\Lambda\,V^{*} - P\,Q^{\top} \quad\Longleftrightarrow\quad V^{*}A\,V = \mathrm{diag}(\Lambda) - P_{\text{eff}}\,Q_{\text{eff}}^{*},$$

with $V$ unitary, $\Lambda \in \mathbb{C}^{N}$ diagonal (initialized in the left half-plane for stability), $P, Q \in \mathbb{R}^{N \times r}$ low-rank ($r = 1$ in practice), and $P_{\text{eff}} = V^{*}P$, $Q_{\text{eff}} = V^{*}Q$.

This DPLR form ensures computational efficiency, numerical stability, and the flexibility to learn long temporal dependencies. Input dependence is injected strictly via the $B\,u$ modulation.
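
A minimal NumPy sketch of this initialization, following the standard S4 recipe: build the HiPPO-LegS matrix, add a rank-1 correction $PP^{\top}$ to obtain a normal matrix, and diagonalize it to recover $\Lambda$ and $P_{\text{eff}} = V^{*}P$. Function names are illustrative.

```python
import numpy as np

def hippo_legs(N):
    """HiPPO-LegS transition matrix (scaled Legendre measure)."""
    q = np.sqrt(2.0 * np.arange(N) + 1.0)
    A = np.tril(np.outer(q, q)) - np.diag(np.arange(N))
    return -A                                        # lower-triangular, (N, N)

def dplr_from_hippo(N):
    """DPLR form of HiPPO-LegS: A = V diag(Lambda) V* - P P^T, rank r = 1."""
    A = hippo_legs(N)
    P = np.sqrt(np.arange(N) + 0.5)                  # rank-1 correction vector
    S = A + np.outer(P, P)                           # normal: -1/2 I + skew-symmetric
    K = S + 0.5 * np.eye(N)                          # skew-symmetric part
    imag, V = np.linalg.eigh(-1j * K)                # K = V diag(1j * imag) V*
    Lam = -0.5 + 1j * imag                           # eigenvalues in the left half-plane
    P_eff = V.conj().T @ P                           # P_eff = V* P
    return Lam, P_eff, V
```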

3. Kernel Construction and Correlation Structure

Liquid-S4 introduces a kernel framework comprising both linear and higher-order correlation terms. For the input-independent part of the recurrence (the standard S4 case), the output is a convolution with the kernel $\bar{K}$:

$$y_k = \sum_{i=0}^{k} \bar{C}\,\bar{A}^{k-i}\,\bar{B}\,u_i = (\bar{K} * u)_k, \qquad \bar{K}_n = \bar{C}\,\bar{A}^{n}\,\bar{B}$$

Efficient computation via FFT leverages the Cauchy-kernel pipeline, offering near linear-time complexity in sequence length.
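
As a reference for this input-independent path, the kernel can simply be materialized and the convolution carried out with an FFT (the actual S4 pipeline instead evaluates the kernel through the Cauchy-kernel trick); a sketch with illustrative names:

```python
import numpy as np

def s4_kernel(A_bar, B_bar, C, L):
    """Materialize K_n = C A_bar^n B_bar for n = 0..L-1 (reference implementation)."""
    K, x = [], B_bar.copy()
    for _ in range(L):
        K.append(C @ x)                  # K_n = C A_bar^n B_bar
        x = A_bar @ x
    return np.array(K)

def causal_fft_conv(K, u):
    """Causal convolution y = K * u via FFT (zero-padded to avoid wrap-around)."""
    L, n = len(u), 2 * len(u)
    y = np.fft.irfft(np.fft.rfft(K, n) * np.fft.rfft(u, n), n)
    return y[:L]
```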

The LTC recurrence introduces additional “liquid” kernels representing input auto-correlations:

$$y_k \supset \sum_{0 \leq i < j \leq k} \bar{C}\,\bar{A}^{k-1-j}\,\bar{B}^{2}\,(u_i u_j) + \cdots + \sum_{0 \leq i < j < \cdots < p \leq k} \bar{C}\,\bar{A}^{k-p}\,\bar{B}^{p}\,(u_i u_j \cdots u_p)$$

For each chosen order $p$, the liquid kernel is

$$K_{\text{liquid}}^{(p)}[n] = \bar{K}[N-1-n] \odot \bar{B}^{\,p-1}[N-1-n], \qquad n = 0, \dots, N-1,$$

or, in PB mode, $K_{\text{liquid}}^{(p)}[n] = C B^{p}$ (constant over $n$). Liquid-S4 concatenates the $p=1$ S4 kernel with the higher-order liquid kernels ($p = 2, \dots, \mathcal{P}$), computed in $\tilde O(N + L + \mathcal{P}\tilde{L})$ time.
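
A minimal sketch of the PB-mode correction under the simplifying assumption that only correlations of consecutive input samples are kept: each order $p \ge 2$ contributes a constant kernel $C B^{p}$ (elementwise power of the $B$ vector) applied to products of $p$ adjacent inputs. The function names and the consecutive-sample restriction are illustrative, not the paper's exact implementation.

```python
import numpy as np

def pb_liquid_scalars(C, B_vec, P_max):
    """PB-mode liquid kernels: one scalar C . (B ** p) per order p = 2..P_max."""
    return [C @ (B_vec ** p) for p in range(2, P_max + 1)]

def liquid_correction(C, B_vec, u, P_max):
    """Order-p corrections from products of p consecutive inputs (illustrative)."""
    L = len(u)
    y_extra = np.zeros(L)
    for p, k_p in zip(range(2, P_max + 1), pb_liquid_scalars(C, B_vec, P_max)):
        prod = np.ones(L - p + 1)
        for i in range(p):                   # u_{k-p+1} * ... * u_k, ending at step k
            prod = prod * u[i:L - p + 1 + i]
        y_extra[p - 1:] += k_p * prod        # contributes from step k = p-1 onward
    return y_extra
```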

4. Model Architecture and Training Protocols

Liquid-S4 constructs deep sequence models by stacking multiple state space blocks, each comprising the Liquid-S4 kernel convolution, residual connections, feed-forward layers, and pointwise nonlinearities (GeLU, ReLU). The architecture is causal—each output $y_k$ depends exclusively on past and current inputs $u_{0:k}$.
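
A hedged PyTorch-style sketch of one such block, with the kernel computation abstracted behind a caller-supplied `kernel_fn` (a hypothetical callable returning a per-channel causal kernel); widths and layer choices are illustrative:

```python
import torch
import torch.nn as nn

class LiquidS4Block(nn.Module):
    """One residual block: causal kernel convolution -> GELU -> pointwise mixing."""

    def __init__(self, d_model, kernel_fn):
        super().__init__()
        self.kernel_fn = kernel_fn          # hypothetical: returns an (H, L) causal kernel
        self.norm = nn.LayerNorm(d_model)
        self.mix = nn.Linear(d_model, d_model)
        self.act = nn.GELU()

    def forward(self, u):                   # u: (batch, L, H)
        B, L, H = u.shape
        K = self.kernel_fn(L)               # (H, L), one filter per channel
        Kf = torch.fft.rfft(K, n=2 * L)     # zero-pad to 2L for a causal (linear) conv
        uf = torch.fft.rfft(u.transpose(1, 2), n=2 * L)
        y = torch.fft.irfft(uf * Kf, n=2 * L)[..., :L]   # (B, H, L)
        y = self.act(y.transpose(1, 2))     # pointwise nonlinearity
        return self.norm(u + self.mix(y))   # feed-forward mixing + residual connection
```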

Training employs cross-entropy loss for classification or $\ell_2$ loss for regression. Regularization strategies include weight decay on learnable parameters $(\Lambda, P, Q, B, C)$, dropout in feed-forward submodules, and gradient penalties on eigenvalues to enhance stability. Optimization is via Adam/AdamW with learning rates tuned per task, typically smaller for Liquid-S4 compared to S4.
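
A hedged sketch of such an optimizer setup, placing the state-space parameters in a separate AdamW group so their learning rate can be set lower than the rest of the network (the parameter-name convention and values are illustrative):

```python
import torch

def build_optimizer(model, base_lr=4e-3, ssm_lr=1e-3, weight_decay=0.01):
    """AdamW with a separate, smaller learning rate for the state-space parameters."""
    ssm_names = ("Lambda", "P", "Q", "B", "C", "log_dt")   # illustrative naming convention
    ssm, other = [], []
    for name, param in model.named_parameters():
        (ssm if name.split(".")[-1] in ssm_names else other).append(param)
    return torch.optim.AdamW([
        {"params": other, "lr": base_lr, "weight_decay": weight_decay},
        {"params": ssm, "lr": ssm_lr, "weight_decay": weight_decay},
    ])
```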

The kernel’s input dependence—modulating the recurrence at each timestep—enables Liquid-S4 to re-weight historical inputs adaptively, yielding improved generalization in non-stationary and highly correlated data regimes.

5. Empirical Evaluation and Performance Benchmarks

Liquid-S4 establishes new state-of-the-art results on multiple long-range sequence modeling benchmarks:

Long-Range Arena (1K–16K sequences)

| Method | ListOps | IMDB | AAN | CIFAR | PathFinder | Path-X | Average |
|---|---|---|---|---|---|---|---|
| S4-LegS (reprod.) | 59.60 | 86.82 | 90.90 | 88.65 | 94.20 | 96.35 | 86.09 |
| Liquid-S4 (PB, $p \le 6$) | 62.75 | 89.02 | 91.20 | 89.50 | 94.80 | 96.66 | 87.32 |

Raw Speech Commands (35 classes, 16 kHz)

| Model | Params | Accuracy |
|---|---|---|
| S4-LegS | 307 K | 96.08% |
| S4D-Lin | 306 K | 96.25% |
| Liquid-S4 | 224 K | 96.78% |

BIDMC Vital Signs (RMSE)

| Model | HR | RR | SpO₂ |
|---|---|---|---|
| S4-LegS | 0.332 | 0.247 | 0.090 |
| S4D-Inv | 0.373 | 0.254 | 0.110 |
| Liquid-S4 (best $p$) | 0.303 | 0.158 | 0.066 |

These results indicate consistent 1–3% gains over S4, with Liquid-S4 frequently achieving higher accuracy and reduced parameterization (e.g., 30% fewer parameters on Speech Commands).

6. Implementation Details and Hyperparameter Choices

Standard settings are:

  • Number of state space blocks (“depth”): 4–9, task-dependent
  • Hidden units per block ($H$): 128–512
  • State dimension ($N$): 7 (ListOps/IMDB) to 512 (CIFAR)
  • Low-rank factor ($r$): 1
  • Liquid kernel order ($\mathcal{P}$): typically 2–4, default 3
  • FFT-based output convolution, with the Cauchy kernel computed via PyKeOps for memory efficiency
  • Forward-pass complexity: $\tilde O(N + L + \mathcal{P}\tilde{L})$; memory similar to S4, with minor overhead for the correlation kernels

7. Interpretations and Relevance

Liquid-S4’s lightweight kernel modulation introduces input-correlation structure into SSMs, directly encoding similarities of input samples during both training and inference. This property facilitates data-dependent adaptive filtering and generalization under long-sequence, highly-correlated, or non-stationary conditions. The empirical improvement at negligible computational or parameter cost suggests further applicability in domains typified by long-range dependencies and dynamic signal correlation.

Liquid-S4 extends S4’s diagonal-plus-low-rank parametrization by input-dependent kernel construction, resulting in a robust and scalable sequence modeling framework with consistent empirical benefits across modalities (Hasani et al., 2022).

References

Hasani, R., Lechner, M., Wang, T.-H., Chahine, M., Amini, A., & Rus, D. (2022). Liquid Structural State-Space Models.
