Oscillatory State-Space Models (2410.03943v3)

Published 4 Oct 2024 in cs.LG and cs.NE

Abstract: We propose Linear Oscillatory State-Space models (LinOSS) for efficiently learning on long sequences. Inspired by cortical dynamics of biological neural networks, we base our proposed LinOSS model on a system of forced harmonic oscillators. A stable discretization, integrated over time using fast associative parallel scans, yields the proposed state-space model. We prove that LinOSS produces stable dynamics only requiring nonnegative diagonal state matrix. This is in stark contrast to many previous state-space models relying heavily on restrictive parameterizations. Moreover, we rigorously show that LinOSS is universal, i.e., it can approximate any continuous and causal operator mapping between time-varying functions, to desired accuracy. In addition, we show that an implicit-explicit discretization of LinOSS perfectly conserves the symmetry of time reversibility of the underlying dynamics. Together, these properties enable efficient modeling of long-range interactions, while ensuring stable and accurate long-horizon forecasting. Finally, our empirical results, spanning a wide range of time-series tasks from mid-range to very long-range classification and regression, as well as long-horizon forecasting, demonstrate that our proposed LinOSS model consistently outperforms state-of-the-art sequence models. Notably, LinOSS outperforms Mamba and LRU by nearly 2x on a sequence modeling task with sequences of length 50k.

Summary

The paper introduces LinOSS, a novel approach that leverages forced second-order ODEs to enable stable oscillatory sequence modeling.
It employs an implicit discretization that yields stable dynamics and an implicit-explicit discretization that preserves time reversibility, both computed efficiently with associative parallel scans.
Empirical evaluations show LinOSS outperforms state-of-the-art models in tasks like long-range classification, forecasting, and regression.

LinOSS: A Novel Approach to Sequence Modeling with Oscillatory State-Space Models

The paper introduces Linear Oscillatory State-Space models (LinOSS), a novel approach to sequence modeling that leverages stable discretizations of forced linear second-order ordinary differential equations (ODEs) to model oscillators. The model distinguishes itself through several key properties, including provable stability with nonnegative diagonal state matrices, symplectic discretization for time reversibility, and universality as an approximator of continuous and causal operators between time-series. Empirical evaluations demonstrate that LinOSS consistently matches or outperforms state-of-the-art sequence models on a wide range of tasks.

Model Formulation

The LinOSS model is based on the following system of forced linear second-order ODEs:

$\begin{aligned} \mathbf{x}^{\prime\prime}(t) &= -\Omega \mathbf{x}(t) + B \mathbf{u}(t) + \mathbf{b}, \\ \mathbf{y}(t) &= C \mathbf{x}(t) + D \mathbf{u}(t), \end{aligned}$

where $\mathbf{x}(t) \in \mathbb{R}^m$ is the hidden state, $\mathbf{y}(t) \in \mathbb{R}^q$ is the output state, $\mathbf{u}(t) \in \mathbb{R}^p$ is the time-dependent input signal, and $\Omega \in \mathbb{R}^{m \times m}$, $B \in \mathbb{R}^{m \times p}$, $C \in \mathbb{R}^{q \times m}$, $D \in \mathbb{R}^{q \times p}$, and $\mathbf{b} \in \mathbb{R}^m$ are weights and biases. A crucial aspect of this formulation is that $\Omega$ is a diagonal matrix with nonnegative entries, which ensures stable dynamics.

To facilitate efficient computation, the second-order ODE is converted into a first-order system by introducing an auxiliary state $\mathbf{v}(t) \in \mathbb{R}^m$, with $\mathbf{v} = \mathbf{x}^\prime$:

$\begin{aligned} \mathbf{x}^{\prime}(t) &= \mathbf{v}(t), \\ \mathbf{v}^{\prime}(t) &= -\Omega \mathbf{x}(t) + B \mathbf{u}(t). \end{aligned}$

Discretization Schemes and Stability

The paper introduces two discretization schemes: implicit time integration (LinOSS-IM) and implicit-explicit time integration (LinOSS-IMEX).

Implicit Time Integration (LinOSS-IM)

The implicit discretization is given by:

$\begin{aligned} \mathbf{x}_{n} &= \mathbf{x}_{n-1} + \Delta t \, \mathbf{v}_n, \\ \mathbf{v}_{n} &= \mathbf{v}_{n-1} + \Delta t (-\Omega \mathbf{x}_n + B \mathbf{u}_n), \end{aligned}$

where $\Delta t$ is the timestep. This discretization is shown to yield globally asymptotically stable dynamics, provided that $\Omega$ is a nonnegative diagonal matrix.
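Because $\Omega$ is diagonal, the implicit update decouples into independent scalar oscillators, and each one can be solved in closed form. The following is a minimal sketch under that assumption, for a single oscillator; the function name `linoss_im_step` and the precomputed `forcing` (standing in for one component of $B \mathbf{u}_n$) are illustrative and not taken from the paper's code.

```python
def linoss_im_step(x_prev, v_prev, omega, forcing, dt):
    """One implicit step for a single oscillator (illustrative helper).

    omega   : one nonnegative diagonal entry of Omega
    forcing : the matching component of B u_n, assumed precomputed
    """
    # Substituting x_n = x_prev + dt * v_n into the velocity update gives
    # (1 + dt^2 * omega) * v_n = v_prev + dt * (-omega * x_prev + forcing).
    s = 1.0 / (1.0 + dt * dt * omega)
    v_new = s * (v_prev + dt * (-omega * x_prev + forcing))
    x_new = x_prev + dt * v_new
    return x_new, v_new


# Toy values chosen only for illustration.
x0, v0, omega, forcing, dt = 1.0, 0.0, 4.0, 0.5, 0.1
x1, v1 = linoss_im_step(x0, v0, omega, forcing, dt)

# Residuals of the two implicit equations; both should be close to zero.
res_x = x1 - (x0 + dt * v1)
res_v = v1 - (v0 + dt * (-omega * x1 + forcing))
```

The factor `s` is at most 1 whenever `omega` is nonnegative, which is where the damping of the implicit scheme comes from in this scalar picture.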

Implicit-Explicit Time Integration (LinOSS-IMEX)

The implicit-explicit discretization is given by:

$\begin{aligned} \mathbf{x}_{n} &= \mathbf{x}_{n-1} + \Delta t \, \mathbf{v}_n, \\ \mathbf{v}_{n} &= \mathbf{v}_{n-1} + \Delta t (-\Omega \mathbf{x}_{n-1} + B \mathbf{u}_n). \end{aligned}$

This scheme is symplectic, meaning it preserves a Hamiltonian close to that of the continuous system. Consequently, it conserves the symmetry of time reversibility: each step can be inverted exactly, so hidden states can be recomputed during the backward pass instead of stored, which allows memory-efficient backpropagation through time.
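Time reversibility can be illustrated for a single oscillator: the IMEX update is run forward over a few inputs and then undone step by step. This is a sketch under the assumption that $\Omega$ is diagonal (so one scalar `omega` suffices); the helper names `imex_forward` and `imex_backward` and the toy forcing values are hypothetical.

```python
def imex_forward(x_prev, v_prev, omega, forcing, dt):
    """One implicit-explicit step: velocity uses the old position."""
    v_new = v_prev + dt * (-omega * x_prev + forcing)
    x_new = x_prev + dt * v_new
    return x_new, v_new


def imex_backward(x_new, v_new, omega, forcing, dt):
    """Exact algebraic inverse of imex_forward."""
    x_prev = x_new - dt * v_new
    v_prev = v_new - dt * (-omega * x_prev + forcing)
    return x_prev, v_prev


omega, dt = 4.0, 0.1
forcings = [0.5, -1.0, 0.25, 0.0, 2.0]  # stand-ins for components of B u_n

# Run forward from (x, v) = (1, 0).
x, v = 1.0, 0.0
for f in forcings:
    x, v = imex_forward(x, v, omega, f, dt)
x_end, v_end = x, v

# Undo the steps in reverse order; only the final state and inputs are needed.
for f in reversed(forcings):
    x, v = imex_backward(x, v, omega, f, dt)
x_rec, v_rec = x, v  # should match the initial state up to rounding
```

In floating point the reconstruction is exact only up to rounding error, so a practical implementation would need to account for accumulated drift on very long sequences.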

Fast Recurrence via Associative Parallel Scans

To accelerate training and inference, LinOSS employs associative parallel scans. Given enough parallel processors, this technique reduces the computational time of the recurrence from $O(N)$ sequential steps to $O(\log_2(N))$ parallel steps, where $N$ is the sequence length.

The associative operation is defined as:

$(A_1, B_1) \bullet (A_2, B_2) = (A_2 A_1, A_2 B_1 + B_2),$

where each pair describes one step of the linear recurrence, with the left operand being the earlier step: $A$ is the hidden-to-hidden weight matrix and $B$ is the input-dependent term of that step. This operation is applied to both the implicit and implicit-explicit discretizations, enabling efficient computation of the recurrent dynamics.
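A minimal sketch of how such a scan reproduces the sequential recurrence is given below. It is written with the earlier step as the first argument of `combine`, uses plain tuples for 2x2 blocks with the state ordered as $(v, x)$, and emulates the parallel rounds with an ordinary Python loop. All names (`combine`, `inclusive_scan`, `sequential`) and the toy IMEX-style matrix are assumptions for illustration, not the paper's implementation.

```python
def matmul2(P, Q):
    """Product of two 2x2 matrices stored as ((a, b), (c, d))."""
    return tuple(
        tuple(sum(P[i][k] * Q[k][j] for k in range(2)) for j in range(2))
        for i in range(2)
    )


def matvec2(P, w):
    """Product of a 2x2 matrix with a length-2 vector."""
    return tuple(P[i][0] * w[0] + P[i][1] * w[1] for i in range(2))


def combine(earlier, later):
    """Compose two affine steps h -> A h + b, applying `earlier` first."""
    A1, b1 = earlier
    A2, b2 = later
    Ab = matvec2(A2, b1)
    return matmul2(A2, A1), (Ab[0] + b2[0], Ab[1] + b2[1])


def inclusive_scan(elems):
    """Recursive-doubling scan: about log2(N) rounds, each round parallelizable."""
    out = list(elems)
    shift = 1
    while shift < len(out):
        out = [
            out[i] if i < shift else combine(out[i - shift], out[i])
            for i in range(len(out))
        ]
        shift *= 2
    return out


def sequential(elems):
    """Reference recurrence h_n = A_n h_{n-1} + b_n, starting from h_0 = 0."""
    h = (0.0, 0.0)
    states = []
    for A, b in elems:
        Ah = matvec2(A, h)
        h = (Ah[0] + b[0], Ah[1] + b[1])
        states.append(h)
    return states


# Toy single-oscillator example with an IMEX-style step matrix.
dt, omega = 0.1, 4.0
M = ((1.0, -dt * omega), (dt, 1.0 - dt * dt * omega))
inputs = [0.5, -1.0, 0.25, 0.0, 2.0, 1.5, -0.75]
elems = [(M, (dt * u, dt * dt * u)) for u in inputs]

# With a zero initial state, the second component of each prefix is the state.
scan_states = [b for _, b in inclusive_scan(elems)]
seq_states = sequential(elems)
max_err = max(
    abs(p - q) for s, t in zip(scan_states, seq_states) for p, q in zip(s, t)
)
```

The scan performs more total work than the sequential loop; the benefit appears only when each round's combinations run in parallel, as they would on a GPU.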

Figure 1: Schematic drawing of the proposed Linear Oscillatory State-Space model (LinOSS). The input sequences are processed through multiple LinOSS blocks. Each block is composed of a LinOSS layer followed by a nonlinear transformation, specifically a Gated Linear Unit (GLU) layer in our case. After passing through several LinOSS blocks, the latent sequences are decoded to produce the final output sequence.

Theoretical Properties

The paper provides rigorous theoretical analysis of LinOSS, demonstrating its stability, capacity for learning long-range interactions, and universality as an approximator.

Stability

It is proven that LinOSS-IM exhibits asymptotically stable dynamics for any nonnegative diagonal matrix $\Omega$. This contrasts with previous state-space models that require heavily constrained parameterizations. LinOSS-IMEX, on the other hand, has a hidden-to-hidden matrix whose eigenvalues have magnitude 1, so its dynamics are not damped over time, which allows for learning interactions over arbitrarily long sequences.
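These eigenvalue statements can be checked numerically for a single oscillator. The sketch below assumes the 2x2 step matrices obtained by eliminating the implicit terms from the two schemes, with the state ordered as $(v, x)$; the helper names are hypothetical. For the IMEX matrix written here the determinant is 1, and the eigenvalues lie on the unit circle as long as $\Delta t^2 \omega \le 4$. That bound is a property of this sketch's matrix and is not stated in the summary above.

```python
import cmath


def eig2(M):
    """Eigenvalues of a 2x2 matrix via the characteristic polynomial."""
    (a, b), (c, d) = M
    tr, det = a + d, a * d - b * c
    root = cmath.sqrt(tr * tr - 4.0 * det)
    return (tr + root) / 2.0, (tr - root) / 2.0


def im_matrix(omega, dt):
    """Step matrix of the implicit scheme for one oscillator."""
    s = 1.0 / (1.0 + dt * dt * omega)
    return ((s, -dt * omega * s), (dt * s, s))


def imex_matrix(omega, dt):
    """Step matrix of the implicit-explicit scheme for one oscillator."""
    return ((1.0, -dt * omega), (dt, 1.0 - dt * dt * omega))


dt = 0.1
omegas = [0.5, 4.0, 25.0, 100.0]  # positive toy values with dt^2 * omega <= 4

# Largest eigenvalue magnitude per omega for the implicit scheme (below 1).
im_mags = [max(abs(l) for l in eig2(im_matrix(w, dt))) for w in omegas]

# Both eigenvalue magnitudes per omega for the IMEX scheme (equal to 1).
imex_mags = [[abs(l) for l in eig2(imex_matrix(w, dt))] for w in omegas]
```

In this scalar setting the implicit magnitudes equal $1/\sqrt{1 + \Delta t^2 \omega}$, so larger frequencies are damped faster, while the IMEX magnitudes stay at 1 for the values tried.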

Universality

LinOSS is proven to be a universal approximator of continuous and causal operators between time-series. This result indicates that LinOSS can express complex mappings between general input and output sequences, not necessarily limited to oscillatory patterns. The proof relies on encoding the infinite-dimensional operator with a finite-dimensional operator that utilizes the structure of the LinOSS ODE system.

Empirical Evaluation

The LinOSS models were evaluated on a range of sequence modeling tasks, including long-range classification, regression, and long-horizon forecasting. The empirical results demonstrate that LinOSS consistently matches or outperforms state-of-the-art models, including Mamba, LRU, and S5.

Long-Range Interactions

On the UEA Multivariate Time Series Classification Archive, LinOSS-IM achieved state-of-the-art results, particularly on datasets with long sequences such as EigenWorms, improving the test accuracy from 85% to 95%.

Extreme Length Sequences

On the PPG-DaLiA dataset, LinOSS models significantly outperformed other models, with LinOSS-IM exhibiting nearly a 2x improvement over Mamba and a 2.5x improvement over LRU.

Long-Horizon Forecasting

On a weather prediction task, LinOSS models outperformed Transformer-based baselines and other state-space models in forecasting future climate variables.

Conclusion

The LinOSS model represents a significant advancement in sequence modeling, offering a combination of theoretical rigor and empirical performance. Its stable dynamics, symplectic discretization, and universality make it a promising architecture for a wide range of applications. The empirical results demonstrate that LinOSS can effectively model long-range interactions and achieve state-of-the-art performance on challenging real-world datasets.

Future research directions may include exploring different parameterizations of the diagonal weight matrix $\Omega$, incorporating adaptive time-stepping schemes, and applying LinOSS to other modalities such as audio and video.
