Liquid-S4: Adaptive Sequence Modeling

Updated 4 January 2026
  • Liquid-S4 is a state-space model that integrates a linear liquid time-constant ODE with a diagonal-plus-low-rank (DPLR) decomposition for efficient long-range sequential learning.
  • The model employs input-dependent state transitions and multi-order liquid kernels to enhance adaptive representation and encode sequence correlations.
  • Empirical evaluations show Liquid-S4 consistently outperforms S4 on benchmarks like Long-Range Arena and audio classification while reducing model parameters.

Liquid-S4 is a structural state-space model (SSM) designed for high-performance representation learning on long-range sequential data. It is constructed by integrating a linear liquid time-constant (LTC) ODE with a diagonal plus low-rank decomposition (DPLR) of state transition matrices, leveraging methodology from S4 (Structured State Spaces). Liquid-S4 is characterized by its input-dependent state transitions and kernel structure that encodes similarities and correlations within the sequence, enabling state-of-the-art results across image, text, audio, and medical signal domains (Hasani et al., 2022).

1. Mathematical Structure of Liquid-S4

Liquid-S4 is founded upon a continuous-time linearized LTC ODE, specified for an $N$-dimensional hidden state $x(t) \in \mathbb{R}^{N}$, scalar input $u(t)$, and scalar output $y(t)$:

$$\frac{d\,x(t)}{dt} = \bigl[A + B\,u(t)\bigr]\,x(t) + B\,u(t), \qquad y(t) = C\,x(t)$$

where $A \in \mathbb{R}^{N \times N}$ is the transition matrix, $B \in \mathbb{R}^{N \times 1}$ serves as both bias and input modulation, and $C \in \mathbb{R}^{1 \times N}$ computes the readout.

Discretization via the bilinear (trapezoidal) rule with step size $\Delta t$ yields

$$\bar{A} = \left(I - \tfrac{\Delta t}{2}A\right)^{-1}\left(I + \tfrac{\Delta t}{2}A\right), \qquad \bar{B} = \left(I - \tfrac{\Delta t}{2}A\right)^{-1}\Delta t\,B, \qquad \bar{C} = C$$

and the resulting discrete-time recurrence at each timestep $k$ is

$$x_k = \bigl[\bar{A} + \bar{B}\,u_k\bigr]\,x_{k-1} + \bar{B}\,u_k, \qquad y_k = \bar{C}\,x_k$$
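
The following is a minimal NumPy sketch of this discretization and recurrence. Here the $\bar{B}\,u_k$ modulation is taken to act elementwise on the state (i.e., as $\operatorname{diag}(\bar{B})\,u_k$), which is consistent with the elementwise powers $\bar{B}^{p}$ appearing in the kernel expansion below; all names are illustrative rather than the paper's implementation.

```python
import numpy as np

def discretize_bilinear(A, B, dt):
    """Bilinear (trapezoidal) discretization: returns A_bar (N, N) and B_bar (N,)."""
    N = A.shape[0]
    M = np.linalg.inv(np.eye(N) - (dt / 2.0) * A)
    A_bar = M @ (np.eye(N) + (dt / 2.0) * A)
    B_bar = M @ (dt * B)
    return A_bar, B_bar

def liquid_recurrence(A_bar, B_bar, C, u):
    """x_k = [A_bar + diag(B_bar) u_k] x_{k-1} + B_bar u_k,  y_k = C x_k."""
    x = np.zeros(A_bar.shape[0])
    ys = []
    for u_k in u:                                   # u: length-L scalar input sequence
        x = (A_bar + np.diag(B_bar) * u_k) @ x + B_bar * u_k
        ys.append(C @ x)
    return np.array(ys)

# Example: a random stable system driven by a short sequence
N, L, dt = 8, 64, 1.0 / 64
A = -np.eye(N) + 0.1 * np.random.randn(N, N)
B, C = np.random.randn(N), np.random.randn(N)
y = liquid_recurrence(*discretize_bilinear(A, B, dt), C, np.random.randn(L))
```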

2. Diagonal-Plus-Low-Rank Parameterization

The state transition matrix $A$ is initialized using the HiPPO-LegS matrix (scaled Legendre measure) and rewritten in the Normal-Plus-Low-Rank (NPLR) decomposition, which is unitarily equivalent to a Diagonal-Plus-Low-Rank (DPLR) form:

$$A = V\,\Lambda\,V^{*} - P\,Q^{\top} \quad\Longleftrightarrow\quad V^{*}A\,V = \mathrm{diag}(\Lambda) - P_{\text{eff}}\,Q_{\text{eff}}^{*},$$

with $V$ unitary, $\Lambda \in \mathbb{C}^{N}$ diagonal (initialized in the left half-plane for stability), $P, Q \in \mathbb{R}^{N \times r}$ low-rank ($r = 1$ in practice), and $P_{\text{eff}} = V^{*}P$, $Q_{\text{eff}} = V^{*}Q$.

This DPLR form ensures computational efficiency, numerical stability, and the flexibility to learn long temporal dependencies. Input dependence is injected strictly via the $B\,u$ modulation.
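
A minimal NumPy sketch of this initialization, following the standard S4 recipe: build the HiPPO-LegS matrix, add a rank-1 correction $PP^{\top}$ to obtain a normal matrix, and diagonalize it to recover $\Lambda$ and $P_{\text{eff}} = V^{*}P$. Function names are illustrative.

```python
import numpy as np

def hippo_legs(N):
    """HiPPO-LegS transition matrix (scaled Legendre measure)."""
    q = np.sqrt(2.0 * np.arange(N) + 1.0)
    A = np.tril(np.outer(q, q)) - np.diag(np.arange(N))
    return -A                                        # lower-triangular, (N, N)

def dplr_from_hippo(N):
    """DPLR form of HiPPO-LegS: A = V diag(Lambda) V* - P P^T, rank r = 1."""
    A = hippo_legs(N)
    P = np.sqrt(np.arange(N) + 0.5)                  # rank-1 correction vector
    S = A + np.outer(P, P)                           # normal: -1/2 I + skew-symmetric
    K = S + 0.5 * np.eye(N)                          # skew-symmetric part
    imag, V = np.linalg.eigh(-1j * K)                # K = V diag(1j * imag) V*
    Lam = -0.5 + 1j * imag                           # eigenvalues in the left half-plane
    P_eff = V.conj().T @ P                           # P_eff = V* P
    return Lam, P_eff, V
```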

3. Kernel Construction and Correlation Structure

Liquid-S4 introduces a kernel framework comprising both linear and higher-order correlation terms. For the input-independent part of the recurrence (the standard S4 case), the output is a convolution with the kernel $\bar{K}$:

$$y_k = \sum_{i=0}^{k} \bar{C}\,\bar{A}^{k-i}\,\bar{B}\,u_i = (\bar{K} * u)_k, \qquad \bar{K}_n = \bar{C}\,\bar{A}^{n}\,\bar{B}$$

Efficient computation via FFT leverages the Cauchy-kernel pipeline, offering near linear-time complexity in sequence length.
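
As a reference for this input-independent path, the kernel can simply be materialized and the convolution carried out with an FFT (the actual S4 pipeline instead evaluates the kernel through the Cauchy-kernel trick); a sketch with illustrative names:

```python
import numpy as np

def s4_kernel(A_bar, B_bar, C, L):
    """Materialize K_n = C A_bar^n B_bar for n = 0..L-1 (reference implementation)."""
    K, x = [], B_bar.copy()
    for _ in range(L):
        K.append(C @ x)                  # K_n = C A_bar^n B_bar
        x = A_bar @ x
    return np.array(K)

def causal_fft_conv(K, u):
    """Causal convolution y = K * u via FFT (zero-padded to avoid wrap-around)."""
    L, n = len(u), 2 * len(u)
    y = np.fft.irfft(np.fft.rfft(K, n) * np.fft.rfft(u, n), n)
    return y[:L]
```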

The LTC recurrence introduces additional “liquid” kernels representing input auto-correlations:

$$y_k \supset \sum_{0 \leq i < j \leq k} \bar{C}\,\bar{A}^{k-1-j}\,\bar{B}^{2}\,(u_i u_j) + \cdots + \sum_{0 \leq i < j < \cdots < p \leq k} \bar{C}\,\bar{A}^{k-p}\,\bar{B}^{p}\,(u_i u_j \cdots u_p)$$

For each chosen order $p$, the liquid kernel is

$$K_{\text{liquid}}^{(p)}[n] = \bar{K}[N-1-n] \odot \bar{B}^{\,p-1}[N-1-n], \qquad n = 0, \dots, N-1,$$

or, in PB mode, $K_{\text{liquid}}^{(p)}[n] = C B^{p}$ (constant over $n$). Liquid-S4 concatenates the $p=1$ S4 kernel with the higher-order liquid kernels ($p = 2, \dots, \mathcal{P}$), computed in $\tilde O(N + L + \mathcal{P}\tilde{L})$ time.
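
A minimal sketch of the PB-mode correction under the simplifying assumption that only correlations of consecutive input samples are kept: each order $p \ge 2$ contributes a constant kernel $C B^{p}$ (elementwise power of the $B$ vector) applied to products of $p$ adjacent inputs. The function names and the consecutive-sample restriction are illustrative, not the paper's exact implementation.

```python
import numpy as np

def pb_liquid_scalars(C, B_vec, P_max):
    """PB-mode liquid kernels: one scalar C . (B ** p) per order p = 2..P_max."""
    return [C @ (B_vec ** p) for p in range(2, P_max + 1)]

def liquid_correction(C, B_vec, u, P_max):
    """Order-p corrections from products of p consecutive inputs (illustrative)."""
    L = len(u)
    y_extra = np.zeros(L)
    for p, k_p in zip(range(2, P_max + 1), pb_liquid_scalars(C, B_vec, P_max)):
        prod = np.ones(L - p + 1)
        for i in range(p):                   # u_{k-p+1} * ... * u_k, ending at step k
            prod = prod * u[i:L - p + 1 + i]
        y_extra[p - 1:] += k_p * prod        # contributes from step k = p-1 onward
    return y_extra
```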

4. Model Architecture and Training Protocols

Liquid-S4 constructs deep sequence models by stacking multiple state space blocks, each comprising the Liquid-S4 kernel convolution, residual connections, feed-forward layers, and pointwise nonlinearities (GeLU, ReLU). The architecture is causal—each output $y_k$ depends exclusively on past and current inputs $u_{0:k}$.
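
A hedged PyTorch-style sketch of one such block, with the kernel computation abstracted behind a caller-supplied `kernel_fn` (a hypothetical callable returning a per-channel causal kernel); widths and layer choices are illustrative:

```python
import torch
import torch.nn as nn

class LiquidS4Block(nn.Module):
    """One residual block: causal kernel convolution -> GELU -> pointwise mixing."""

    def __init__(self, d_model, kernel_fn):
        super().__init__()
        self.kernel_fn = kernel_fn          # hypothetical: returns an (H, L) causal kernel
        self.norm = nn.LayerNorm(d_model)
        self.mix = nn.Linear(d_model, d_model)
        self.act = nn.GELU()

    def forward(self, u):                   # u: (batch, L, H)
        B, L, H = u.shape
        K = self.kernel_fn(L)               # (H, L), one filter per channel
        Kf = torch.fft.rfft(K, n=2 * L)     # zero-pad to 2L for a causal (linear) conv
        uf = torch.fft.rfft(u.transpose(1, 2), n=2 * L)
        y = torch.fft.irfft(uf * Kf, n=2 * L)[..., :L]   # (B, H, L)
        y = self.act(y.transpose(1, 2))     # pointwise nonlinearity
        return self.norm(u + self.mix(y))   # feed-forward mixing + residual connection
```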

Training employs cross-entropy loss for classification or $\ell_2$ loss for regression. Regularization strategies include weight decay on learnable parameters $(\Lambda, P, Q, B, C)$, dropout in feed-forward submodules, and gradient penalties on eigenvalues to enhance stability. Optimization is via Adam/AdamW with learning rates tuned per task, typically smaller for Liquid-S4 compared to S4.
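
A hedged sketch of such an optimizer setup, placing the state-space parameters in a separate AdamW group so their learning rate can be set lower than the rest of the network (the parameter-name convention and values are illustrative):

```python
import torch

def build_optimizer(model, base_lr=4e-3, ssm_lr=1e-3, weight_decay=0.01):
    """AdamW with a separate, smaller learning rate for the state-space parameters."""
    ssm_names = ("Lambda", "P", "Q", "B", "C", "log_dt")   # illustrative naming convention
    ssm, other = [], []
    for name, param in model.named_parameters():
        (ssm if name.split(".")[-1] in ssm_names else other).append(param)
    return torch.optim.AdamW([
        {"params": other, "lr": base_lr, "weight_decay": weight_decay},
        {"params": ssm, "lr": ssm_lr, "weight_decay": weight_decay},
    ])
```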

The kernel’s input dependence—modulating the recurrence at each timestep—enables Liquid-S4 to re-weight historical inputs adaptively, yielding improved generalization in non-stationary and highly correlated data regimes.

5. Empirical Evaluation and Performance Benchmarks

Liquid-S4 establishes new state-of-the-art results on multiple long-range sequence modeling benchmarks:

Long-Range Arena (1K–16K sequences)

| Method | ListOps | IMDB | AAN | CIFAR | PathFinder | Path-X | Average |
|---|---|---|---|---|---|---|---|
| S4-LegS (reprod.) | 59.60 | 86.82 | 90.90 | 88.65 | 94.20 | 96.35 | 86.09 |
| Liquid-S4 (PB, $p \le 6$) | 62.75 | 89.02 | 91.20 | 89.50 | 94.80 | 96.66 | 87.32 |

Raw Speech Commands (35 classes, 16 kHz)

| Model | Params | Accuracy |
|---|---|---|
| S4-LegS | 307 K | 96.08% |
| S4D-Lin | 306 K | 96.25% |
| Liquid-S4 | 224 K | 96.78% |

BIDMC Vital Signs (RMSE)

| Model | HR | RR | SpO₂ |
|---|---|---|---|
| S4-LegS | 0.332 | 0.247 | 0.090 |
| S4D-Inv | 0.373 | 0.254 | 0.110 |
| Liquid-S4 (best $p$) | 0.303 | 0.158 | 0.066 |

These results indicate consistent 1–3% gains over S4, with Liquid-S4 frequently achieving higher accuracy and reduced parameterization (e.g., 30% fewer parameters on Speech Commands).

6. Implementation Details and Hyperparameter Choices

Standard settings are:

  • Number of state space blocks (“depth”): 4–9, task-dependent
  • Hidden units per block ($H$): 128–512
  • State dimension ($N$): 7 (ListOps/IMDB) to 512 (CIFAR)
  • Low-rank factor ($r$): 1
  • Liquid kernel order ($\mathcal{P}$): typically 2–4, default 3
  • FFT-based output convolution, with the Cauchy kernel computed via PyKeOps for memory efficiency
  • Forward-pass complexity: $\tilde O(N + L + \mathcal{P}\tilde{L})$; memory similar to S4, with minor overhead for the correlation kernels

7. Interpretations and Relevance

Liquid-S4’s lightweight kernel modulation introduces input-correlation structure into SSMs, directly encoding similarities of input samples during both training and inference. This property facilitates data-dependent adaptive filtering and generalization under long-sequence, highly-correlated, or non-stationary conditions. The empirical improvement at negligible computational or parameter cost suggests further applicability in domains typified by long-range dependencies and dynamic signal correlation.

Liquid-S4 extends S4’s diagonal-plus-low-rank parametrization by input-dependent kernel construction, resulting in a robust and scalable sequence modeling framework with consistent empirical benefits across modalities (Hasani et al., 2022).

References

Hasani, R., Lechner, M., Wang, T.-H., Chahine, M., Amini, A., & Rus, D. (2022). Liquid Structural State-Space Models.
