PD-SSM: Expressive Sparse State-Space Model
- PD-SSM is a structured sparse state-space model that factorizes transitions into a column one-hot matrix and a complex diagonal matrix to ensure both stability and expressivity.
- It exactly emulates finite-state automata with a minimal state representation while maintaining bounded-input–bounded-output stability and supporting efficient parallel computation.
- PD-SSM integrates into hybrid neural architectures for tasks like time-series tracking and NLP, delivering reduced memory usage and computational costs compared to dense SSMs.
PD-SSM denotes a structured sparse state-space model framework designed to enable efficient and maximally expressive state tracking, in particular the exact emulation of finite-state automata, while maintaining computational scalability for long input sequences and large state sizes (Terzić et al., 26 Sep 2025). Unlike conventional diagonal or dense-transition SSMs, which trade off efficiency against expressivity, PD-SSM parametrizes each transition matrix as the product of a column one-hot matrix ($P$) and a complex-valued diagonal matrix ($D$), conferring both bounded-input–bounded-output stability and a minimal state representation for regular languages. This enables algorithmic state tracking in time series, control, and hybrid neural architectures.
1. Structured Sparse Transition Matrix
PD-SSM structures the transition matrix at each time step $t$ as the product $A_t = P_t D_t$ of two input-dependent factors:
- $D_t$: Complex diagonal matrix whose elements encode both a magnitude (typically at most one, for stability) and a phase (encoded as a complex exponential), parameterized by feed-forward neural networks. This allows each state to rotate or scale individually as a function of the input $x_t$.
- $P_t$: Input-dependent binary column one-hot matrix. For each column $j$, exactly one entry is nonzero; $P_t$ is obtained by a hardmax selection over a set of parameterized matrices given the input $x_t$ (softmax for the differentiable backward pass, hardmax for the efficient forward pass). The PD parametrization guarantees strict sparsity.
This factorization permits parallel scan computation with $O(LN)$ theoretical scaling, where $N$ is the state dimension and $L$ is the sequence length, as illustrated in the sketch below.
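A minimal NumPy sketch of the factorization (illustrative only, not the reference implementation): $P$ is a binary column one-hot matrix, $D$ is a complex diagonal matrix with entries inside the unit disk, and applying $A = PD$ to a state vector costs only $O(N)$ operations because of the sparsity.

```python
import numpy as np

N = 4                                    # state dimension (illustrative)
rng = np.random.default_rng(0)

# Column one-hot matrix P: each column j has exactly one nonzero entry.
rows = rng.integers(0, N, size=N)        # selected row index for each column
P = np.zeros((N, N))
P[rows, np.arange(N)] = 1.0

# Complex diagonal D: magnitude <= 1 for stability, arbitrary phase.
mag = rng.uniform(0.5, 1.0, size=N)
phase = rng.uniform(0.0, 2 * np.pi, size=N)
d = mag * np.exp(1j * phase)             # diagonal entries of D

A = P @ np.diag(d)                       # dense product, only for checking

# Sparse application: (P D) x is computed in O(N) without forming A.
x = rng.standard_normal(N) + 1j * rng.standard_normal(N)
y_sparse = np.zeros(N, dtype=complex)
np.add.at(y_sparse, rows, d * x)         # scatter-add d_j * x_j into row rows[j]
assert np.allclose(y_sparse, A @ x)
```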
2. Theoretical Properties and Expressivity
PD-SSM achieves several theoretically proven advantages:
- BIBO Stability: By constraining the magnitude of every diagonal entry of $D_t$ to at most $\lambda$ for some $\lambda < 1$, state norms remain bounded by the input bound up to a factor of $1/(1-\lambda)$.
- Universal FSA Emulation: For any finite-state automaton (FSA) with $N$ states, a single-layer PD-SSM with state size $N$ and a linear readout exactly emulates the FSA (see the sketch after this list). The embedding is minimal: no SSM with a smaller state dimension (assuming unique state encodings) suffices for a generic FSA.
- Algebraic Closure: The PD monoid structure ensures that products of PD matrices remain PD matrices, so the recurrence stays strictly sparse, enabling efficient computation and chaining.
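To make the FSA-emulation claim concrete, the following is a minimal illustrative sketch (not the authors' construction): a three-state mod-3 counter automaton is emulated by assigning each input symbol a column one-hot transition matrix (here $D$ is the identity, so only the $P$ factor is exercised) and running the linear recurrence $h_t = A(x_t)\,h_{t-1}$; the one-hot state vector then tracks the automaton exactly, and a linear readout recovers the state. The toy automaton and symbol names are illustrative assumptions, not taken from the paper.

```python
import numpy as np

# Toy FSA: states {0,1,2}, input symbols {"+1", "+2"}; reading "+k" moves the
# state to (state + k) mod 3.  Each symbol gets a column one-hot matrix whose
# column j points to the successor of state j (the D factor is the identity here).
def one_hot_transition(successor):
    N = len(successor)
    P = np.zeros((N, N))
    P[successor, np.arange(N)] = 1.0
    return P

A = {
    "+1": one_hot_transition([1, 2, 0]),
    "+2": one_hot_transition([2, 0, 1]),
}

def run(sequence, start_state=0, N=3):
    h = np.zeros(N)
    h[start_state] = 1.0                 # one-hot encoding of the FSA state
    for symbol in sequence:
        h = A[symbol] @ h                # linear recurrence h_t = A(x_t) h_{t-1}
    return int(np.argmax(h))             # linear readout recovers the state

seq = ["+1", "+2", "+2", "+1", "+1"]
expected = sum(int(s[1]) for s in seq) % 3
assert run(seq) == expected
```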
3. Model Architecture and Computational Efficiency
Within practical architectures, neural networks parameterize both $D_t$ and $P_t$, typically via:
- Magnitudes of the diagonal entries of $D_t$: produced by a feed-forward network from the input $x_t$ and bounded for stability,
- Phases of the diagonal entries of $D_t$: produced by a feed-forward network from the input $x_t$,
- $P_t$: Columnwise hardmax over a softmax-weighted mixture of trainable matrices.
Operationally, the forward pass uses the strictly sparse hardmax selection, while the backward pass leverages gradients of the softmax approximation (a straight-through-style scheme, sketched below). This yields significant memory and compute reductions over dense SSMs: the parallel scan cost is linear, not cubic, in the state dimension $N$.
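The forward-hardmax / backward-softmax selection can be implemented with a straight-through pattern; the following is a minimal PyTorch sketch (the mixture parameterization and variable names are assumptions for illustration and may differ from the paper's implementation).

```python
import torch
import torch.nn.functional as F

def column_hardmax_st(M):
    """Columnwise one-hot selection: hardmax value in the forward pass,
    gradients of the columnwise softmax in the backward pass."""
    soft = F.softmax(M, dim=0)                               # soften each column
    idx = M.argmax(dim=0)                                    # winning row per column
    hard = F.one_hot(idx, num_classes=M.shape[0]).T.to(M.dtype)
    return hard + (soft - soft.detach())                     # forward: hard, backward: soft

# Hypothetical parameterization: a softmax-weighted mixture of K trainable
# N x N matrices, followed by the columnwise hardmax above.
N, K = 4, 3
B = torch.randn(K, N, N, requires_grad=True)                 # trainable candidate matrices
w_logits = torch.randn(K, requires_grad=True)                # would be input-dependent in practice
mix = torch.einsum("k,kij->ij", F.softmax(w_logits, dim=0), B)
P = column_hardmax_st(mix)                                   # binary column one-hot in the forward pass
P.sum().backward()                                           # gradients reach B and w_logits
assert B.grad is not None and w_logits.grad is not None
```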
4. Empirical Evaluation
Extensive experiments substantiate PD-SSM's expressivity and efficiency claims:
- FSA State Tracking: Perfect (100%) accuracy is achieved in emulating diverse automata, including non-solvable cases (e.g., automata built on an alternating group), well beyond the capacity of diagonal SSMs and of variants such as Mamba with real non-negative transition matrices. Generalization to sequence lengths unseen during training is also demonstrated.
- Time-Series Classification: Competitive performance with neural controlled differential equation paradigms on multiclass time-series from the UEA archive.
- Long-Range Arena and NLP: Integration into LLMs (Transformer–SSM hybrids) allows explicit state-tracking for FSAs whose transitions are encoded by variable-length English sentences.
5. Integration with Hybrid Architectures
PD-SSM is modular and can be incorporated into hybrid architectures:
- Transformer–SSM Integration: PD-SSM layers inserted into frozen LLM backbones (e.g., Qwen 2.5) enable explicit automaton state tracking in complex natural language tasks.
- Editor’s term: Hybrid SSM refers to architectures combining PD-SSM for algorithmic state logic with deep models for representational capacity. This integration allows symbolic and sub-symbolic reasoning within neural frameworks, addressing multi-hop inference and control-flow challenges; a structural sketch follows this list.
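A structural sketch of this hybrid pattern is given below (module names are hypothetical, a generic recurrent layer stands in for the PD-SSM, and Qwen 2.5-specific details are omitted): a frozen backbone supplies token representations, while only a state-tracking layer and a readout on top are trained.

```python
import torch
import torch.nn as nn

class HybridStateTracker(nn.Module):
    """Hypothetical hybrid: a frozen backbone produces token representations,
    and a trainable state-space layer on top performs explicit state tracking."""

    def __init__(self, backbone: nn.Module, d_model: int, state_dim: int, num_labels: int):
        super().__init__()
        self.backbone = backbone
        for p in self.backbone.parameters():        # keep the LLM backbone frozen
            p.requires_grad = False
        # Stand-in for a PD-SSM layer: any trainable sequence-state module fits here.
        self.ssm = nn.GRU(d_model, state_dim, batch_first=True)
        self.readout = nn.Linear(state_dim, num_labels)

    def forward(self, x):
        with torch.no_grad():                        # frozen representational stage
            h = self.backbone(x)                     # (batch, seq, d_model) assumed
        states, _ = self.ssm(h)                      # trainable state-tracking stage
        return self.readout(states[:, -1])           # predict from the final state
```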
6. Practical Implications
PD-SSM’s sparse and expressive structure is advantageous for:
- Algorithmic State Tracking: Control, reasoning, and symbolic processing tasks with strict state requirements.
- Long-Range Time-Series Analysis: Sensor, financial, and biological signals requiring efficient tracking over long horizons and large state spaces.
- Hybrid NLP Models: Tasks requiring reasoning about latent or explicit finite-state components in textual input.
The minimal state representation and computational scalability position PD-SSM as a foundational building block in both algorithmic and real-world sequential processing.
7. Limitations and Future Directions
The current PD-SSM incurs some overhead in one-hot selection and sparse matrix generation. Areas identified for future improvement include:
- Efficient one-hot selection: Optimizing the softmax/hardmax mechanism for both computational speed and gradient propagation.
- Custom backward pass: Implementation of highly efficient backward algorithms tailored for PD sparsity.
- Scale-up and pretraining: Exploration in large-scale pretraining regimens and application to broader domains.
- Hybrid model enhancements: Flexible integration schemes with diverse backbone architectures.
The ability of PD-SSM to exactly model any FSA with minimum necessary state size, combined with efficient scaling, distinguishes it from prior structured SSMs and positions it as the state-of-the-art for expressive, tractable state-tracking in sequential data (Terzić et al., 26 Sep 2025).