PD-SSM: Expressive Sparse State-Space Model

Updated 12 October 2025
  • PD-SSM is a structured sparse state-space model that factorizes transitions into a column one-hot matrix and a complex diagonal matrix to ensure both stability and expressivity.
  • It achieves exact finite-state automata emulation with minimal state representation, maintaining bounded-input–bounded-output stability and efficient parallel computation.
  • PD-SSM integrates into hybrid neural architectures for tasks like time-series tracking and NLP, delivering reduced memory usage and computational costs compared to dense SSMs.

PD-SSM is a structured sparse state-space model (SSM) framework designed for efficient and maximally expressive state tracking, in particular the exact emulation of finite-state automata, while remaining computationally scalable to long input sequences and large state sizes (Terzić et al., 26 Sep 2025). Unlike conventional diagonal or dense-transition SSMs, which trade off efficiency against expressivity, PD-SSM parametrizes the transition matrix as the product of a column one-hot matrix ($P$) and a complex-valued diagonal matrix ($D$), conferring both bounded-input, bounded-output (BIBO) stability and minimal state representation for regular languages. This enables algorithmic state tracking in time series, control, and hybrid neural architectures.

1. Structured Sparse Transition Matrix

PD-SSM structures the input-dependent transition matrix $A(u_t)$ at each time step $t$ as

$$A(u_t) = P(u_t)\, D(u_t).$$

  • $D(u_t)$: a complex diagonal matrix whose entries encode both magnitude (typically $|D(u_t)| < 1$ for stability) and phase (encoded as $2\pi \cdot \text{sigmoid}(\cdot)$), parameterized by feed-forward neural networks. This allows each state to rotate or scale individually as a function of the input $u_t$.
  • $P(u_t)$: an input-dependent binary column one-hot matrix. In each column $j$, exactly one entry is nonzero; $P$ is obtained by a hardmax selection over a set of parameterized matrices given $u_t$ (softmax for a differentiable backward pass, hardmax for an efficient forward pass). The PD parametrization guarantees strict sparsity.

This factorization permits parallel-scan computation with theoretical $O(NL)$ scaling, where $N$ is the state dimension and $L$ is the sequence length.
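As a concrete illustration, the following NumPy sketch applies one $PD$ transition per step of the recurrence $x_t = A(u_t)\,x_{t-1} + B u_t$, exploiting the sparsity so that each step costs $O(N)$ rather than $O(N^2)$. The sequential loop (in place of the parallel scan), the input matrix $B$, and the randomly drawn per-step selection indices and diagonal entries are illustrative assumptions, not the reference implementation.

```python
import numpy as np

def pd_step(x_prev, col_idx, d, B, u):
    """One PD-SSM step: x_t = P D x_{t-1} + B u_t.

    col_idx[j] is the row of the single nonzero entry in column j of P,
    and d is the complex diagonal of D (|d| < 1 for stability), so applying
    P D costs O(N) instead of the O(N^2) of a dense matrix-vector product.
    """
    y = d * x_prev                      # D x_{t-1}: elementwise scale and rotate
    x = np.zeros_like(x_prev)
    np.add.at(x, col_idx, y)            # P y: scatter each y[j] to row col_idx[j]
    return x + B @ u                    # input injection

# Toy run (all quantities are random stand-ins for the learned, input-dependent parameters):
N, d_in, L = 4, 3, 6
rng = np.random.default_rng(0)
B = rng.normal(size=(N, d_in)).astype(np.complex128)
x = np.zeros(N, dtype=np.complex128)
for t in range(L):
    u_t = rng.normal(size=d_in)
    col_idx = rng.integers(0, N, size=N)   # stand-in for the one-hot column selection P(u_t)
    mag = 0.9 * rng.uniform(size=N)        # |D(u_t)| < 1
    phase = 2 * np.pi * rng.uniform(size=N)
    d_t = mag * np.exp(1j * phase)         # complex diagonal of D(u_t)
    x = pd_step(x, col_idx, d_t, B, u_t)
print(x)
```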

2. Theoretical Properties and Expressivity

PD-SSM achieves several theoretically proven advantages:

  • BIBO Stability: By constraining $|D(u_t)| \leq 1 - \epsilon$ for some $\epsilon > 0$, state norms remain bounded, up to a factor of $\sqrt{N}$.
  • Universal FSA Emulation: For any finite-state automaton (FSA) with $N$ states, a single-layer PD-SSM with state size $N$ and an $N \times N$ linear readout exactly emulates the FSA (a toy example follows this list). The embedding is minimal: no SSM with a smaller state dimension (assuming unique state encodings) suffices for a generic FSA.
  • Algebraic Closure: The monoid structure of PD matrices ensures that products of $PD$ matrices remain $PD$ matrices (a short derivation follows this list). The recurrence remains strictly sparse, enabling efficient computation and chaining.
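As a toy instance of exact FSA emulation, the Python sketch below encodes the two-state parity automaton over $\{0, 1\}$ with column one-hot transition matrices and an identity diagonal; the encoding and readout are illustrative choices, not the construction used in the paper.

```python
import numpy as np

# Two-state parity FSA over {0, 1}: the state flips on input 1 and stays on input 0.
# Each symbol selects a column one-hot (here permutation) matrix P; D is left as the
# identity because the transition structure alone encodes this automaton.
P = {0: np.eye(2), 1: np.array([[0., 1.], [1., 0.]])}
D = np.eye(2)

def run(word):
    x = np.array([1., 0.])          # one-hot encoding of the start state q0
    for sym in word:
        x = P[sym] @ (D @ x)        # x_t = P(u_t) D(u_t) x_{t-1}
    return int(np.argmax(x))        # linear readout: index of the current state

assert run([1, 1, 0, 1]) == 1       # odd number of 1s -> state q1
assert run([1, 0, 1]) == 0          # even number of 1s -> state q0
```

The closure property follows directly from the definitions above. If $\sigma(j)$ denotes the row selected by column $j$ of a column one-hot matrix $P_2$, then $D_1 P_2 = P_2 \tilde{D}_1$ with $(\tilde{D}_1)_{jj} = (D_1)_{\sigma(j)\sigma(j)}$, so

$$(P_1 D_1)(P_2 D_2) = P_1 (D_1 P_2) D_2 = (P_1 P_2)(\tilde{D}_1 D_2),$$

where $P_1 P_2$ is again column one-hot (each of its columns is a column of $P_1$) and $\tilde{D}_1 D_2$ is diagonal.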

3. Model Architecture and Computational Efficiency

Within practical architectures, neural networks parameterize both $D$ and $P$, typically via:

  • $|D(u_t)| = \text{sigmoid}\left(W_o^M\,\text{GELU}(W_i^M u_t + b_i^M) + b_o^M\right)$,
  • $\phi(D(u_t)) = 2\pi \cdot \text{sigmoid}\left(W_o^P\,\text{GELU}(W_i^P u_t + b_i^P) + b_o^P\right)$,
  • $P(u_t)$: a columnwise hardmax over a softmax-weighted mixture of trainable matrices.

Operationally, the forward pass uses strict sparsity; the backward pass leverages gradients of the softmax approximation. This yields significant memory and compute reduction over dense SSMs: the parallel-scan cost is linear, rather than cubic, in $N$.
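A common way to realize this "hardmax forward, softmax backward" behavior is a straight-through estimator. The PyTorch sketch below assumes $K$ trainable candidate matrices mixed by an input-dependent gate and then discretized columnwise; the gating scheme, shapes, and names are plausible assumptions for illustration rather than the authors' exact parametrization.

```python
import torch
import torch.nn.functional as F

def select_P(u, W_gate, candidates, tau=1.0):
    """Input-dependent column one-hot matrix P(u_t).

    candidates: (K, N, N) trainable matrices; W_gate: (K, d_in) gate weights.
    The forward pass emits a strictly sparse (binary) P, while the backward
    pass sees the columnwise softmax via the straight-through estimator.
    """
    gate = F.softmax(W_gate @ u, dim=0)                    # (K,) mixture weights
    logits = torch.einsum('k,kij->ij', gate, candidates)   # (N, N) mixed scores
    P_soft = F.softmax(logits / tau, dim=0)                # columnwise softmax over rows
    idx = P_soft.argmax(dim=0)                             # hard row choice per column
    P_hard = F.one_hot(idx, num_classes=logits.shape[0]).T.float()
    return P_hard + (P_soft - P_soft.detach())             # hard forward, soft gradients

# Toy usage with made-up sizes:
K, N, d_in = 3, 4, 5
candidates = torch.randn(K, N, N, requires_grad=True)
W_gate = torch.randn(K, d_in, requires_grad=True)
u = torch.randn(d_in)
P = select_P(u, W_gate, candidates)
print(P)  # binary column one-hot in the forward pass
```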

4. Empirical Evaluation

Extensive experiments substantiate PD-SSM’s superiority:

  • FSA State Tracking: Accuracy up to 100% in emulating diverse automata, including those built on non-solvable groups (e.g., the alternating group $A_5$), well beyond the capacity of diagonal SSMs and variants such as Mamba with real non-negative transition matrices. Generalization to sequence lengths unseen during training is also demonstrated.
  • Time-Series Classification: Competitive performance with neural controlled differential equation models on multiclass time-series classification tasks from the UEA archive.
  • Long-Range Arena and NLP: Integration into LLMs (Transformer–SSM hybrids) allows explicit state-tracking for FSAs whose transitions are encoded by variable-length English sentences.

5. Integration with Hybrid Architectures

PD-SSM is modular and can be incorporated into hybrid architectures:

  • Transformer–SSM Integration: PD-SSM layers inserted into frozen LLM backbones (e.g., Qwen 2.5) enable explicit automaton state tracking in complex natural language tasks.
  • Editor’s term: "hybrid SSM," denoting architectures that combine PD-SSM for algorithmic state logic with deep models for representational capacity (see the sketch after this list). This integration allows symbolic and sub-symbolic reasoning within neural frameworks, addressing multi-hop inference and control-flow challenges.
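As an architectural illustration only, the sketch below inserts a trainable state-tracking layer after a frozen backbone block and fuses its output through a residual projection. The module interfaces, placement, and fusion scheme are assumptions about how such a hybrid could be wired; the paper's actual integration with a frozen Qwen 2.5 backbone may differ.

```python
import torch.nn as nn

class HybridBlock(nn.Module):
    """Frozen transformer block followed by a trainable state-tracking layer.

    `backbone_block` is any frozen module mapping (batch, length, d_model) tensors
    to tensors of the same shape; `ssm_layer` stands in for a PD-SSM layer with
    the same interface.
    """
    def __init__(self, backbone_block, ssm_layer, d_model):
        super().__init__()
        self.backbone_block = backbone_block
        for p in self.backbone_block.parameters():
            p.requires_grad = False              # keep the LLM weights frozen
        self.ssm_layer = ssm_layer               # trainable state tracker
        self.proj = nn.Linear(d_model, d_model)  # fuse the SSM output back in

    def forward(self, h):
        h = self.backbone_block(h)               # frozen representation
        return h + self.proj(self.ssm_layer(h))  # residual state-tracking path
```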

6. Practical Implications

PD-SSM’s sparse and expressive structure is advantageous for:

  • Algorithmic State Tracking: Control, reasoning, and symbolic processing tasks with strict state requirements.
  • Long-Range Time-Series Analysis: Sensor, financial, and biological signals requiring efficient tracking over long horizons and large state spaces.
  • Hybrid NLP Models: Tasks requiring reasoning about latent or explicit finite-state components in textual input.

The minimal state representation and computational scalability position PD-SSM as a foundational building block in both algorithmic and real-world sequential processing.

7. Limitations and Future Directions

The current PD-SSM formulation incurs some overhead in one-hot selection and sparse matrix generation. Areas identified for future improvement include:

  • Efficient one-hot selection: Optimizing the softmax/hardmax mechanism for both computational speed and gradient propagation.
  • Custom backward pass: Implementation of highly efficient backward algorithms tailored for PD sparsity.
  • Scale-up and pretraining: Exploration of large-scale pretraining regimes and application to broader domains.
  • Hybrid model enhancements: Flexible integration schemes with diverse backbone architectures.

The ability of PD-SSM to exactly model any FSA with the minimum necessary state size, combined with efficient scaling, distinguishes it from prior structured SSMs and positions it as a state-of-the-art approach for expressive, tractable state tracking in sequential data (Terzić et al., 26 Sep 2025).
