Liquid-S4: Adaptive Sequence Modeling
- Liquid-S4 is a state-space model that integrates a linear liquid time-constant ODE with a diagonal-plus-low-rank (DPLR) decomposition for efficient long-range sequential learning.
- The model employs input-dependent state transitions and multi-order liquid kernels to enhance adaptive representation and encode sequence correlations.
- Empirical evaluations show Liquid-S4 consistently outperforms S4 on benchmarks like Long-Range Arena and audio classification while reducing model parameters.
Liquid-S4 is a structural state-space model (SSM) designed for high-performance representation learning on long-range sequential data. It is constructed by integrating a linear liquid time-constant (LTC) ODE with a diagonal plus low-rank decomposition (DPLR) of state transition matrices, leveraging methodology from S4 (Structured State Spaces). Liquid-S4 is characterized by its input-dependent state transitions and kernel structure that encodes similarities and correlations within the sequence, enabling state-of-the-art results across image, text, audio, and medical signal domains (Hasani et al., 2022).
1. Mathematical Structure of Liquid-S4
Liquid-S4 is founded upon a continuous-time linearized LTC ODE, specified for an $N$-dimensional hidden state $x(t)$, scalar input $u(t)$, and scalar output $y(t)$:

$$\frac{dx(t)}{dt} = \big[A + B\,u(t)\big]\,x(t) + B\,u(t), \qquad y(t) = C\,x(t),$$

where $A \in \mathbb{R}^{N \times N}$ is the transition matrix, $B \in \mathbb{R}^{N \times 1}$ serves as both the bias and the input modulation, and $C \in \mathbb{R}^{1 \times N}$ computes the readout.
Discretization via the bilinear (trapezoidal) rule with step size $\Delta$ yields

$$\bar{A} = \Big(I - \tfrac{\Delta}{2}A\Big)^{-1}\Big(I + \tfrac{\Delta}{2}A\Big), \qquad \bar{B} = \Big(I - \tfrac{\Delta}{2}A\Big)^{-1}\Delta B.$$

The resulting discrete-time recurrence for each timestep $k$ is

$$x_k = \big(\bar{A} + \bar{B}\,u_k\big)\,x_{k-1} + \bar{B}\,u_k, \qquad y_k = C\,x_k.$$
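As a concrete illustration, here is a minimal NumPy sketch of the bilinear discretization and the liquid recurrence (dense matrices, no DPLR/Cauchy acceleration; the element-wise treatment of the $\bar{B}\,u_k\,x_{k-1}$ term and all function names are assumptions of this sketch, not the reference implementation):

```python
import numpy as np

def bilinear_discretize(A, B, dt):
    """Bilinear (trapezoidal) discretization of (A, B) with step size dt."""
    N = A.shape[0]
    inv = np.linalg.inv(np.eye(N) - (dt / 2.0) * A)
    A_bar = inv @ (np.eye(N) + (dt / 2.0) * A)
    B_bar = inv @ (dt * B)
    return A_bar, B_bar

def liquid_recurrence(A_bar, B_bar, C, u):
    """Discrete liquid recurrence x_k = (A_bar + B_bar u_k) x_{k-1} + B_bar u_k, y_k = C x_k.
    The input-dependent term is applied element-wise (Hadamard) here, an assumption of this sketch."""
    x = np.zeros(A_bar.shape[0])
    ys = []
    for u_k in u:
        x = A_bar @ x + (B_bar * u_k) * x + B_bar * u_k   # liquid (input-dependent) transition + drive
        ys.append(float(C @ x))
    return np.array(ys)

# Tiny usage example with random toy parameters (illustrative only).
rng = np.random.default_rng(0)
N, L = 4, 32
A = -np.eye(N) + 0.1 * rng.standard_normal((N, N))       # stable-ish toy transition matrix
B, C = rng.standard_normal(N), rng.standard_normal(N)
A_bar, B_bar = bilinear_discretize(A, B, dt=0.1)
y = liquid_recurrence(A_bar, B_bar, C, rng.standard_normal(L))
```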
2. Diagonal-Plus-Low-Rank Parameterization
The state transition matrix $A$ is initialized using the HiPPO-LegS matrix (scaled Legendre measure) and rewritten in the Normal Plus Low-Rank (NPLR) decomposition

$$A = V\big(\Lambda - P Q^{*}\big)V^{*},$$

with $V$ unitary, $\Lambda$ diagonal (eigenvalues initially in the left half-plane for stability), $P, Q \in \mathbb{C}^{N \times r}$ low-rank ($r = 1$ in practice), and conjugated input/output parameters $\tilde{B} = V^{*}B$, $\tilde{C} = C V$.
This DPLR form ensures computational efficiency, numerical stability, and the flexibility to learn long temporal dependencies. Input dependence is injected strictly via the $\bar{B}\,u_k$ modulation of the recurrence.
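The initialization this section describes can be sketched directly; the snippet below builds the HiPPO-LegS matrix and its rank-1 correction $P$ with $P_n = \sqrt{n + 1/2}$ (the standard S4-style construction), and checks that the corrected matrix is normal with eigenvalues in the left half-plane. It is a sketch, not the reference code:

```python
import numpy as np

def hippo_legs(N):
    """HiPPO-LegS matrix: A[n,k] = -sqrt((2n+1)(2k+1)) for n > k, -(n+1) for n = k, 0 for n < k."""
    n = np.arange(N)
    A = -np.sqrt(np.outer(2 * n + 1, 2 * n + 1))
    return np.tril(A, k=-1) + np.diag(-(n + 1.0))

def nplr_legs(N):
    """Rank-1 NPLR form of HiPPO-LegS: A = A_normal - P P^T with P[n] = sqrt(n + 1/2).
    A_normal equals -I/2 plus a skew-symmetric matrix, hence normal and unitarily diagonalizable."""
    A = hippo_legs(N)
    P = np.sqrt(np.arange(N) + 0.5)
    A_normal = A + np.outer(P, P)
    assert np.allclose(A_normal + A_normal.T, -np.eye(N))  # -I/2 + skew-symmetric structure
    Lam, V = np.linalg.eig(A_normal)   # Lam: diagonal part Lambda; V: (numerically) unitary
    return Lam, V, P

Lam, V, P = nplr_legs(8)
print(Lam.real)  # all real parts equal -1/2, i.e. the eigenvalues lie in the left half-plane
```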
3. Kernel Construction and Correlation Structure
Liquid-S4 introduces a kernel framework comprising both linear and higher-order correlation terms. For the input-independent part of the recurrence, the convolutional kernel is

$$\bar{K}[n] = C\,\bar{A}^{\,n}\bar{B}, \quad n = 0, \ldots, L-1, \qquad y = \bar{K} * u.$$

Efficient computation via FFT leverages the Cauchy kernel pipeline, offering nearly linear-time complexity in the sequence length $L$.
The LTC recurrence introduces additional “liquid” kernels that act on input auto-correlations. For each chosen order $p$,

$$K_{\text{liquid}}^{(p)}[n] = C\,\bar{A}^{\,n}\bar{B}^{\,p},$$

or, in PB mode,

$$K_{\text{liquid}}^{(p)}[n] = C\,\bar{B}^{\,p} \quad \text{(constant over } n\text{)}.$$

Liquid-S4 concatenates the S4 kernel with these higher-order liquid kernels ($p = 2, \ldots, P$), computed at negligible additional cost on top of the S4 kernel pipeline.
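A naive but explicit sketch of the kernel assembly follows; it materializes $\bar{A}$ powers directly instead of using the Cauchy-kernel pipeline, takes $\bar{B}^{\,p}$ element-wise, and drives the PB-mode terms with products of $p$ consecutive inputs. That correlation signal, and all names, are simplifying assumptions of this sketch rather than the paper's exact construction:

```python
import numpy as np

def s4_kernel(A_bar, B_bar, C, L):
    """Naive S4 kernel K[n] = C A_bar^n B_bar for n = 0..L-1 (O(L N^2); the real pipeline
    obtains the same kernel via the Cauchy kernel in near-linear time)."""
    K, x = np.empty(L), B_bar.copy()
    for n in range(L):
        K[n] = C @ x
        x = A_bar @ x
    return K

def liquid_kernel_pb(B_bar, C, p):
    """PB-mode liquid kernel of order p: the scalar C B_bar^p (element-wise power), constant over n."""
    return float(C @ (B_bar ** p))

def liquid_s4_forward_pb(u, A_bar, B_bar, C, order=3):
    """Causal output: FFT convolution with the S4 kernel, plus PB-mode corrections applied to
    products of p consecutive inputs (a simplifying choice made for this sketch)."""
    L = len(u)
    K = s4_kernel(A_bar, B_bar, C, L)
    # Causal linear (not circular) convolution via zero-padded FFT.
    y = np.fft.irfft(np.fft.rfft(K, 2 * L) * np.fft.rfft(u, 2 * L), 2 * L)[:L]
    for p in range(2, order + 1):
        corr = np.ones(L)
        for q in range(p):                                  # u_k * u_{k-1} * ... * u_{k-p+1}
            corr *= np.concatenate([np.zeros(q), u[:L - q]])
        y += liquid_kernel_pb(B_bar, C, p) * corr
    return y
```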
4. Model Architecture and Training Protocols
Liquid-S4 constructs deep sequence models by stacking multiple state-space blocks, each comprising the Liquid-S4 kernel convolution, residual connections, feed-forward layers, and pointwise nonlinearities (GELU, ReLU). The architecture is causal: each output $y_k$ depends exclusively on current and past inputs $u_1, \ldots, u_k$.
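A skeletal PyTorch sketch of this stacking pattern is shown below; the Liquid-S4 kernel itself is abstracted behind a placeholder module passed in as `kernel`, and the class name, normalization placement, and FFN width are assumptions of this sketch:

```python
import torch
import torch.nn as nn

class LiquidS4Block(nn.Module):
    """One residual block in the assumed stacking pattern: normalization, a (placeholder) causal
    Liquid-S4 convolution, then a pointwise GELU feed-forward sub-block, each with a residual."""
    def __init__(self, d_model: int, kernel: nn.Module, dropout: float = 0.1):
        super().__init__()
        self.norm1 = nn.LayerNorm(d_model)
        self.kernel = kernel              # assumed interface: (batch, length, d_model) -> same shape, causal
        self.norm2 = nn.LayerNorm(d_model)
        self.ffn = nn.Sequential(
            nn.Linear(d_model, 4 * d_model), nn.GELU(),
            nn.Dropout(dropout), nn.Linear(4 * d_model, d_model),
        )
        self.drop = nn.Dropout(dropout)

    def forward(self, x: torch.Tensor) -> torch.Tensor:   # x: (batch, length, d_model)
        x = x + self.drop(self.kernel(self.norm1(x)))      # SSM convolution sub-block
        x = x + self.drop(self.ffn(self.norm2(x)))         # pointwise feed-forward sub-block
        return x
```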
Training employs cross-entropy loss for classification or a mean-squared-error loss for regression. Regularization strategies include weight decay on the learnable parameters, dropout in the feed-forward submodules, and penalties on the eigenvalues of the state matrix to enhance stability. Optimization is via Adam/AdamW with learning rates tuned per task, typically smaller for Liquid-S4 than for S4.
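The per-task learning-rate split can be expressed as AdamW parameter groups, for example as in the sketch below; the name-based grouping, the `ssm_keys` list, and the choice to disable weight decay on SSM parameters are assumptions of this sketch rather than the reference configuration:

```python
import torch

def make_optimizer(model, base_lr=1e-3, ssm_lr=1e-4, weight_decay=0.01,
                   ssm_keys=("Lambda", ".P", ".B", ".C", "log_dt")):
    """AdamW with a separate, smaller learning rate for state-space parameters, selected here
    by (assumed) parameter-name substrings."""
    ssm, other = [], []
    for name, p in model.named_parameters():
        (ssm if any(k in name for k in ssm_keys) else other).append(p)
    return torch.optim.AdamW([
        {"params": other, "lr": base_lr, "weight_decay": weight_decay},
        # Weight decay is often reduced or disabled for SSM parameters in S4-style training.
        {"params": ssm, "lr": ssm_lr, "weight_decay": 0.0},
    ])
```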
The kernel’s input dependence—modulating the recurrence at each timestep—enables Liquid-S4 to re-weight historical inputs adaptively, yielding improved generalization in non-stationary and highly correlated data regimes.
5. Empirical Evaluation and Performance Benchmarks
Liquid-S4 establishes new state-of-the-art results on multiple long-range sequence modeling benchmarks:
Long-Range Arena (1K–16K sequences)
| Method | ListOps | IMDB | AAN | CIFAR | PathFinder | Path-X | Average |
|---|---|---|---|---|---|---|---|
| S4-LegS (reprod.) | 59.60 | 86.82 | 90.90 | 88.65 | 94.20 | 96.35 | 86.09 |
| Liquid-S4 (PB) | 62.75 | 89.02 | 91.20 | 89.50 | 94.80 | 96.66 | 87.32 |
Raw Speech Commands (35 classes, 16 kHz)
| Model | Params | Accuracy |
|---|---|---|
| S4-LegS | 307 K | 96.08% |
| S4D-Lin | 306 K | 96.25% |
| Liquid-S4 | 224 K | 96.78% |
BIDMC Vital Signs (RMSE)
| Model | HR | RR | SpO₂ |
|---|---|---|---|
| S4-LegS | 0.332 | 0.247 | 0.090 |
| S4D-Inv | 0.373 | 0.254 | 0.110 |
| Liquid-S4 (best $p$) | 0.303 | 0.158 | 0.066 |
These results indicate consistent gains over S4, up to roughly 3% on individual tasks, with Liquid-S4 frequently achieving higher accuracy at a reduced parameter count (e.g., roughly 30% fewer parameters on Speech Commands).
6. Implementation Details and Hyperparameter Choices
Standard settings are as follows (an illustrative configuration sketch appears after the list):
- Number of state space blocks (“depth”): 4–9, task-dependent
- Hidden units per block (model width): 128–512
- State dimension ($N$): 7 (ListOps/IMDB) to 512 (CIFAR)
- Low-rank factor ($r$): $1$
- Liquid kernel order ($p$): typically 2–4, default 3
- FFT-based convolution, with the Cauchy kernel evaluated via PyKeOps for memory efficiency
- Forward pass complexity: near-linear in sequence length, as in S4; memory comparable to S4 with minor overhead for the correlation kernels
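These settings might be collected into a single configuration, for instance as the hypothetical dictionary below; every key name and default value here is illustrative only:

```python
# Hypothetical Liquid-S4 configuration mirroring the ranges listed above (illustrative only).
liquid_s4_config = {
    "n_blocks": 6,         # depth: 4-9 depending on task
    "d_model": 256,        # hidden units per block: 128-512
    "d_state": 64,         # state dimension N (7 for ListOps/IMDB up to 512 for CIFAR)
    "rank": 1,             # low-rank factor r
    "liquid_order": 3,     # liquid kernel order p: typically 2-4
    "kernel_mode": "pb",   # PB-mode liquid kernel
    "dropout": 0.1,
    "lr": 1e-3,            # base learning rate, tuned per task
    "ssm_lr": 1e-4,        # smaller learning rate for SSM parameters
}
```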
7. Interpretations and Relevance
Liquid-S4’s lightweight kernel modulation introduces input-correlation structure into SSMs, directly encoding similarities of input samples during both training and inference. This property facilitates data-dependent adaptive filtering and generalization under long-sequence, highly-correlated, or non-stationary conditions. The empirical improvement at negligible computational or parameter cost suggests further applicability in domains typified by long-range dependencies and dynamic signal correlation.
Liquid-S4 extends S4’s diagonal-plus-low-rank parametrization by input-dependent kernel construction, resulting in a robust and scalable sequence modeling framework with consistent empirical benefits across modalities (Hasani et al., 2022).