CompreSSM: In-Training Dimension Reduction for SSMs
- The algorithm introduces in-training model order reduction using balanced truncation from control theory to dynamically retain influential state dimensions.
- It periodically applies Hankel singular value analysis to truncate low-energy states, preserving essential sequence modeling expressivity.
- Experimental results show that CompreSSM accelerates training and improves accuracy on tasks such as CIFAR10 and MNIST by removing redundant state dimensions and the computation they incur.
CompreSSM is an in-training dimension reduction algorithm applied to State Space Models (SSMs), founded on balanced truncation principles from control theory. Its core functionality is to periodically apply Hankel singular value (HSV) analysis during model optimization, thereby identifying and retaining only the most influential state dimensions. Unlike traditional post-training pruning or fixed-dimension designs, CompreSSM dynamically compresses model order in situ, resulting in computational acceleration and preservation of essential sequence modeling expressivity.
1. Theoretical Foundations and Algorithmic Overview
CompreSSM is designed for discrete linear time-invariant (LTI) systems, where the hidden state update and output equations are

$$x_{k+1} = A x_k + B u_k, \qquad y_k = C x_k + D u_k.$$

Here, $x_k$ is the state, $u_k$ the system input, and $y_k$ the output. The algorithm leverages the controllability and observability Gramians. The controllability Gramian $P$ solves the discrete Lyapunov equation

$$A P A^\top - P + B B^\top = 0,$$

and the observability Gramian $Q$ satisfies

$$A^\top Q A - Q + C^\top C = 0.$$

Balanced truncation is achieved by transforming the system to a basis where both $P$ and $Q$ are diagonal and equal, yielding the diagonal matrix $\Sigma = \mathrm{diag}(\sigma_1, \dots, \sigma_n)$ of Hankel singular values (HSVs), ordered as $\sigma_1 \ge \sigma_2 \ge \dots \ge \sigma_n$.
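The Gramian computation can be sketched numerically. This is an illustrative stand-in (random stable system, not the paper's implementation), using SciPy's discrete Lyapunov solver:

```python
import numpy as np
from scipy.linalg import solve_discrete_lyapunov

rng = np.random.default_rng(0)
n, m, p = 8, 2, 2  # state, input, and output dimensions (illustrative)

# Random stable discrete LTI system: x_{k+1} = A x_k + B u_k, y_k = C x_k
A = rng.standard_normal((n, n))
A *= 0.9 / max(abs(np.linalg.eigvals(A)))  # scale spectral radius below 1
B = rng.standard_normal((n, m))
C = rng.standard_normal((p, n))

# Controllability Gramian P solves: A P A^T - P + B B^T = 0
P = solve_discrete_lyapunov(A, B @ B.T)
# Observability Gramian Q solves: A^T Q A - Q + C^T C = 0
Q = solve_discrete_lyapunov(A.T, C.T @ C)

# Hankel singular values: square roots of the eigenvalues of P Q
hsv = np.sqrt(np.sort(np.linalg.eigvals(P @ Q).real)[::-1])
print(hsv)  # sigma_1 >= sigma_2 >= ... >= sigma_n
```

The HSVs are basis-invariant, so they can be computed from the Gramians in the current (unbalanced) coordinates before any transformation is applied.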
In CompreSSM, the model undergoes periodic evaluation during training: the Gramians are extracted for each SSM block, HSVs are computed, and all state components with low HSV are truncated. This truncation (balanced reduction) minimizes approximation error, which is bounded by

$$\lVert G - G_r \rVert_{\mathcal{H}_\infty} \le 2 \sum_{i=r+1}^{n} \sigma_i,$$

where $G$ is the original system and $G_r$ the reduced system of order $r$.
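The truncation step itself can be sketched with the standard square-root balanced truncation algorithm. The system below is a random stable placeholder, and the retained order `r` is chosen arbitrarily for illustration:

```python
import numpy as np
from scipy.linalg import solve_discrete_lyapunov, cholesky, svd

rng = np.random.default_rng(0)
n, m, p, r = 8, 2, 2, 4  # full order n, reduced order r (illustrative)

A = rng.standard_normal((n, n))
A *= 0.9 / max(abs(np.linalg.eigvals(A)))  # make the system stable
B = rng.standard_normal((n, m))
C = rng.standard_normal((p, n))

P = solve_discrete_lyapunov(A, B @ B.T)    # controllability Gramian
Q = solve_discrete_lyapunov(A.T, C.T @ C)  # observability Gramian

# Square-root balanced truncation: SVD of the Cholesky-factor product
Lp = cholesky(P, lower=True)               # P = Lp Lp^T
Lq = cholesky(Q, lower=True)               # Q = Lq Lq^T
U, s, Vt = svd(Lq.T @ Lp)                  # s are the Hankel singular values
Sr = np.diag(s[:r] ** -0.5)
T = Lp @ Vt[:r].T @ Sr                     # right projection
W = Sr @ U[:, :r].T @ Lq.T                 # left projection, with W @ T = I_r
Ar, Br, Cr = W @ A @ T, W @ B, C @ T       # reduced-order system

bound = 2 * s[r:].sum()                    # H-infinity error bound
```

The projections `T` and `W` simultaneously balance the Gramians and drop the low-HSV directions, so the discarded tail of `s` directly gives the error bound above.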
2. HSV Continuity and Truncation Justification
CompreSSM includes a continuity argument derived from Weyl's theorem, stating that if a symmetric matrix is perturbed by $E$, each eigenvalue shifts by at most $\lVert E \rVert_2$. This result guarantees that HSV-based ordering is stable under typical SGD-based parameter updates, justifying incremental dimension truncation during optimization.
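Weyl's bound is easy to check numerically. A minimal sketch with a random symmetric matrix and a small symmetric perturbation:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 6
M = rng.standard_normal((n, n)); M = (M + M.T) / 2          # symmetric matrix
E = 1e-3 * rng.standard_normal((n, n)); E = (E + E.T) / 2   # small symmetric perturbation

# Weyl's theorem: the i-th sorted eigenvalue moves by at most ||E||_2
shift = np.abs(np.linalg.eigvalsh(M + E) - np.linalg.eigvalsh(M))
print(shift.max(), np.linalg.norm(E, 2))  # max shift is bounded by ||E||_2
```

In the training context, `E` plays the role of a single optimizer update to the Gramian-derived matrices: small steps can move each HSV only slightly, so the ranking cannot reorder abruptly.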
Truncation is performed only when the relative HSV ranking shows sustained low values for certain state dimensions over time. This ensures that only dimensions contributing negligible energy to the input-output mapping are removed, maintaining the fidelity of the primary signal pathways and temporal dependencies.
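A sustained-low-value criterion of this kind could be tracked as below. The class name, threshold, and patience parameters are hypothetical illustrations, not values from the paper:

```python
import numpy as np

class HSVTracker:
    """Flag a state dimension for truncation only after its relative HSV
    stays below `threshold` for `patience` consecutive evaluations."""

    def __init__(self, n, threshold=1e-3, patience=3):
        self.low_count = np.zeros(n, dtype=int)
        self.threshold = threshold
        self.patience = patience

    def update(self, hsv):
        rel = hsv / hsv.max()                  # relative HSV ranking
        low = rel < self.threshold
        # Reset the streak for dims that recovered; extend it otherwise
        self.low_count = np.where(low, self.low_count + 1, 0)
        return np.flatnonzero(self.low_count >= self.patience)

tracker = HSVTracker(n=4, threshold=0.01, patience=2)
print(tracker.update(np.array([1.0, 0.5, 0.2, 1e-4])))  # no dim flagged yet
print(tracker.update(np.array([1.0, 0.6, 0.2, 2e-4])))  # dim 3 now sustained-low
```

Resetting the counter whenever a dimension's relative HSV recovers is what prevents a transient dip during noisy optimization from triggering an irreversible truncation.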
3. Application to SSM Architectures
CompreSSM is specifically formulated for linear recurrent dynamics including Linear Recurrent Units (LRUs), where the core dynamical system can be isolated and reduced using balanced truncation. Extension to selective and linear time-varying SSMs requires more sophisticated estimation, such as input-space averaging or Lyapunov inequalities that depend on current data statistics.
For linear time-varying SSMs, where $A$, $B$, $C$, $D$ change with the input, the method averages system dynamics or solves input-dependent modifications of the Gramian equations before applying truncation.
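One simple instance of the averaging idea is to solve a Lyapunov equation per sampled dynamics and average the resulting Gramians. This sketch uses random stable matrices as a hypothetical stand-in for input-dependent `(A, B)` pairs:

```python
import numpy as np
from scipy.linalg import solve_discrete_lyapunov

rng = np.random.default_rng(0)
n, m, batch = 6, 2, 16  # state dim, input dim, number of sampled inputs

def sample_dynamics():
    # Hypothetical stand-in for the input-dependent (A, B) of a
    # time-varying SSM; each sample is kept stable by spectral scaling
    A = rng.standard_normal((n, n))
    A *= 0.8 / max(abs(np.linalg.eigvals(A)))
    B = rng.standard_normal((n, m))
    return A, B

# Average the per-sample controllability Gramians, then rank state
# directions by the averaged Gramian's spectrum (an illustrative proxy)
P_avg = np.mean(
    [solve_discrete_lyapunov(A, B @ B.T)
     for A, B in (sample_dynamics() for _ in range(batch))],
    axis=0,
)
energy = np.linalg.eigvalsh(P_avg)[::-1]  # descending "energy" per direction
```

The averaged Gramian remains symmetric positive definite, so the same HSV-style ranking machinery applies; how faithfully it reflects any particular input distribution depends on the sampling.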
4. Training and Performance Implications
Experimental results indicate that in-training compression with CompreSSM yields models that retain or even improve upon the task-critical structure versus models either kept at high dimension or trained solely at reduced dimension. On sequence tasks including CIFAR10 and MNIST, aggressively compressed models outperform small models trained from scratch in both accuracy and time-to-convergence.
A plausible implication is that overparameterization followed by in-training truncation allows the model to first learn rich representations and subsequently shed redundant capacity—effectively performing dynamic model order selection driven by data.
The training speedup correlates directly with the reduction in state dimensionality, as each step involves fewer matrix multiplications and state updates, decreasing memory and computational demands.
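A back-of-envelope count makes the scaling concrete. The dimensions below are illustrative, not from the paper's experiments:

```python
# Multiply-add count per recurrence step
#   x_{k+1} = A x_k + B u_k,  y_k = C x_k
# for dense matrices; the dominant A @ x term is quadratic in state dim n.
def step_flops(n: int, m: int, p: int) -> int:
    return 2 * (n * n + n * m + p * n)

n_full, n_reduced, m, p = 64, 16, 4, 4  # illustrative dimensions
speedup = step_flops(n_full, m, p) / step_flops(n_reduced, m, p)
print(speedup)  # roughly quadratic gain from a 4x state reduction
```

Because the quadratic `A @ x` term dominates once `n` is much larger than the input/output widths, a 4x reduction in state dimension yields somewhat more than a 4x reduction in per-step work.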
5. Comparative Evaluation and Benefits
CompreSSM consistently outperforms classical fixed-dimension or post-hoc pruning strategies. Starting with a large model and reducing the order during training via control-theoretic criteria allows the compressed version to maintain high predictive accuracy and sequence modeling capacity that would be unattainable in a model trained at the final, reduced size ab initio.
This suggests a new paradigm for SSM design: model order is dynamically learned, not chosen a priori, with computational savings realized throughout training, not just at evaluation.
6. Practical Considerations and Limitations
The chief practical advantage of CompreSSM is its reduction in both runtime and memory usage, which makes it attractive for deployment in resource-constrained environments. Its reliance on HSVs ensures retained dimensions are maximally expressive and energetically significant.
However, the method presupposes that the importance of specific states does not oscillate wildly during successive updates—an assumption supported by empirical observations but which may be violated in nonstationary settings or under adversarial sampling. Scheduling the truncation events to occur during relatively stable phases (such as learning-rate warm-up) mitigates instability but may need refinement in highly dynamic training regimes.
7. Extensions and Future Directions
Possible future work includes developing more comprehensive reduction schemes for strongly input-dependent or time-varying SSMs, such as adaptive balanced truncation or real-time Gramian updates. There is ongoing interest in generalizing CompreSSM to related linear architectures, including Gated Linear Attention, Mamba2, or Gated DeltaNet, as well as integrating improved regularization techniques to further enhance compressive expressivity.
Enhancing the robustness of HSV tracking in non-stationary regimes and devising automated scheduling algorithms for in-training reduction represent further research directions.
In summary, CompreSSM brings control-theoretic balanced truncation into the machine learning training loop, using HSV ranking for in-training model order adaptation in SSMs. This approach preserves critical signal structure, accelerates training, and empirically surpasses fixed-dimension training strategies, offering a principled route to efficient state space sequence modeling (Chahine et al., 3 Oct 2025).