Liquid Structural State-Space Models
Abstract: A proper parametrization of the state transition matrices of linear state-space models (SSMs), followed by standard nonlinearities, enables them to efficiently learn representations from sequential data, establishing the state of the art on a large series of long-range sequence modeling benchmarks. In this paper, we show that we can improve further when the structural SSM, such as S4, is given by a linear liquid time-constant (LTC) state-space model. LTC neural networks are causal continuous-time neural networks with an input-dependent state transition module, which allows them to learn to adapt to incoming inputs at inference. We show that by using the diagonal plus low-rank decomposition of the state transition matrix introduced in S4, together with a few simplifications, the LTC-based structural state-space model, dubbed Liquid-S4, achieves new state-of-the-art generalization across sequence modeling tasks with long-term dependencies, such as image, text, audio, and medical time-series, with an average performance of 87.32% on the Long-Range Arena benchmark. On the full raw Speech Command recognition dataset, Liquid-S4 achieves 96.78% accuracy with a 30% reduction in parameter count compared to S4. The additional gain in performance is the direct result of Liquid-S4's kernel structure, which takes into account the similarities of the input sequence samples during training and inference.
Explain it Like I'm 14
Overview
This paper introduces a new kind of machine learning model called Liquid-S4. It’s designed to understand and learn from very long sequences of data, like long text, audio recordings, medical signals, or pixel sequences from images. Liquid-S4 builds on a powerful family of models called state-space models (SSMs) and adds a “liquid” feature that lets the model adapt its behavior based on the incoming inputs, even while it’s making predictions. The goal is to get better accuracy on tasks with long-term dependencies while staying efficient and stable.
Key Questions
The authors set out to answer two simple questions with big impact:
- Can we make already-strong state-space models (like S4) even better at long sequences by letting their “memory update rules” depend on the current input?
- Can we do this in a way that is fast, scalable, and stable—so it works well on real-world tasks like speech, text, images, and health data?
How It Works (Methods, explained simply)
Think of a state-space model (SSM) like a machine with:
- A hidden memory (the “state”),
- Rules that say how this memory changes over time (the “state transition”),
- A way to turn the memory into an output.
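A minimal sketch of such a machine, with made-up matrices (this is a toy illustration, not the paper's actual parameterization):

```python
import numpy as np

# Toy discrete linear SSM: hidden state x, input u, output y.
# x[k+1] = A @ x[k] + B * u[k]   (fixed memory update rule)
# y[k]   = C @ x[k]              (read the memory out)
A = np.array([[0.9, 0.1], [0.0, 0.8]])  # how memory decays and mixes
B = np.array([1.0, 0.5])                # how the input enters memory
C = np.array([0.3, 0.7])                # how memory becomes an output

def run_ssm(u_seq):
    x = np.zeros(2)
    ys = []
    for u in u_seq:
        x = A @ x + B * u
        ys.append(C @ x)
    return ys

outputs = run_ssm([1.0, 0.0, 0.0, 0.0])  # feed an impulse through the machine
```

Note that A, B, and C never change while the sequence is processed; that fixedness is exactly what the "liquid" twist below relaxes.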
Traditional SSMs use fixed rules to update their memory: the rules don’t change based on the current input. S4 is a high-performing version that uses smart math tricks to remember what happened long ago without getting confused.
Liquid-S4 adds a “liquid” twist:
- The rule for updating memory changes depending on the input at that moment. In everyday terms, the model reacts differently when it sees different kinds of data—like a DJ adjusting the sound mix depending on the song.
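A toy version of that twist, using a simplified per-channel recurrence in the spirit of x[k+1] = (A + B·u[k])·x[k] + B·u[k] (all values here are made up; the real Liquid-S4 uses structured matrices and a convolutional form rather than this loop):

```python
import numpy as np

# Simplified "liquid" recurrence with a diagonal state:
# the effective transition (A + B*u) depends on the current input u,
# so the memory update rule adapts at every step.
A = np.array([0.9, 0.8])   # made-up per-channel decay rates
B = np.array([0.2, 0.1])   # made-up input weights

def liquid_step(x, u):
    return (A + B * u) * x + B * u

x = np.zeros(2)
for u in [1.0, -0.5, 0.3]:
    x = liquid_step(x, u)
```

Compare with the fixed-rule SSM: there, the transition is the same matrix at every step; here it shifts with each incoming sample.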
Here’s the main idea using simple analogies:
- State-space memory: Imagine a bookshelf of “memory slots.” Each slot keeps track of a different kind of pattern over time.
- S4’s trick: It organizes and updates these slots using a proven method (called HiPPO/Legendre) so the model “remembers” the right parts of the past.
- Liquid-S4’s add-on: Besides the usual “slide a filter over the sequence” (a convolution), it adds extra “filters” that look at similarities between points in the sequence, like noticing pairs or triples of moments that are related. This is like the model not only reading the sequence left-to-right, but also paying attention to repeated or matching patterns across time.
Under the hood:
- They turn continuous changes over time into steps (discretization), so it works like a fast sequence model.
- They compute the main S4 filter efficiently using math tools (a special “Cauchy kernel” and inverse FFT).
- They compute the new “liquid” filter by reusing the S4 filter and combining it with input-dependent weights. Practically, this captures correlations—pairs, triples, etc.—of input values to detect repeating patterns.
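The first of those steps, discretization, can be sketched with the standard bilinear (Tustin) transform that S4 also uses (the matrix values and step size dt below are made up for illustration):

```python
import numpy as np

# Bilinear discretization: turn a continuous SSM x'(t) = A x + B u
# into a stepwise update x[k+1] = Ab @ x[k] + Bb @ u[k].
A = np.array([[-1.0, 0.5], [0.0, -2.0]])  # made-up continuous transition
B = np.array([[1.0], [0.3]])              # made-up input map
dt = 0.1                                  # made-up step size
I = np.eye(2)
Ab = np.linalg.solve(I - dt / 2 * A, I + dt / 2 * A)  # discrete transition
Bb = np.linalg.solve(I - dt / 2 * A, dt * B)          # discrete input map
```

Because the continuous A is stable (negative eigenvalues), the discrete transition Ab has eigenvalues inside the unit circle, so the stepwise model will not blow up over long sequences.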
Two versions make this efficient:
- A “KB” version: combines the standard S4 filter with extra input-powered weights.
- A simpler “PB” version: uses a more direct formula that often works just as well and is easier to compute.
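The "liquid" filter idea can be illustrated with a toy example: the usual filtered output gets a correction built from products of nearby input samples, i.e. pair correlations (the kernels below are made up; Liquid-S4 derives both from its state matrices, and the KB and PB variants differ in how the second kernel is constructed):

```python
import numpy as np

# Toy sketch: standard convolution plus an input-aware correction term.
def liquid_output(u, K, K_liq):
    L = len(u)
    y = np.convolve(u, K)[:L]          # standard S4-style filtering
    pair = u[:-1] * u[1:]              # adjacent-pair input products
    y[1:] += np.convolve(pair, K_liq)[:L - 1]  # correlation-based correction
    return y

u = np.array([1.0, 2.0, 3.0])
y = liquid_output(u, np.array([1.0]), np.array([1.0]))
```

The second term is what lets the model notice repeated or matching patterns across time, not just the raw left-to-right sequence.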
Main Findings and Why They Matter
Across many tests, Liquid-S4 outperforms strong baselines, including S4 and its variants, Transformers, RNNs, and CNNs, especially on long sequences. Here are the highlights:
- Long Range Arena (LRA, six tasks with very long sequences): Liquid-S4 achieves an average accuracy of about 87.32%, setting a new state-of-the-art across the suite.
- Speech Command recognition (full 35 labels): Liquid-S4 reaches 96.78% accuracy and uses about 30% fewer parameters than S4, meaning it’s both better and more compact.
- Medical vital signs (BIDMC): Liquid-S4 achieves the best results on predicting heart rate, breathing rate, and blood oxygen, showing it’s useful in health contexts.
- Flattened image sequences (sCIFAR): Liquid-S4 slightly improves the best reported accuracy, showing it handles pixel-by-pixel sequence classification well.
Why this matters:
- Long sequences are hard: information can get “lost” over time, and models can become unstable. Liquid-S4’s design lets it remember the right things and adapt to what it’s seeing, even when sequences are very long.
- It’s efficient and scalable: The model keeps training and inference fast using clever math tricks and careful parameterization. Better performance with fewer parameters can mean easier deployment in real-world systems.
Implications and Impact
Liquid-S4 shows that letting a model’s internal update rules depend on the current input can boost performance on a wide range of long-sequence tasks. This has several promising impacts:
- Better speech and audio recognition that stays robust over long clips.
- More accurate analysis of medical time-series for health monitoring.
- Stronger performance on long texts and structured sequences (like code or log data).
- Potentially more reliable systems in robotics or control, where adapting to input on the fly is essential.
In short, Liquid-S4 combines the strength of structured memory (S4) with input-aware adaptability (liquid) to set new performance standards on tough long-sequence benchmarks—while staying efficient and practical.