
T-SHRED: Symbolic Regression for Regularization and Model Discovery with Transformer Shallow Recurrent Decoders (2506.15881v1)

Published 18 Jun 2025 in cs.LG

Abstract: SHallow REcurrent Decoders (SHRED) are effective for system identification and forecasting from sparse sensor measurements. Such models are light-weight and computationally efficient, allowing them to be trained on consumer laptops. SHRED-based models rely on Recurrent Neural Networks (RNNs) and a simple Multi-Layer Perceptron (MLP) for the temporal encoding and spatial decoding respectively. Despite the relatively simple structure of SHRED, they are able to predict chaotic dynamical systems on different physical, spatial, and temporal scales directly from a sparse set of sensor measurements. In this work, we improve SHRED by leveraging transformers (T-SHRED) for the temporal encoding which improves performance on next-step state prediction on large datasets. We also introduce a sparse identification of nonlinear dynamics (SINDy) attention mechanism into T-SHRED to perform symbolic regression directly on the latent space as part of the model regularization architecture. Symbolic regression improves model interpretability by learning and regularizing the dynamics of the latent space during training. We analyze the performance of T-SHRED on three different dynamical systems ranging from low-data to high-data regimes. We observe that SINDy attention T-SHRED accurately predicts future frames based on an interpretable symbolic model across all tested datasets.

Summary

An Exploration of T-SHRED: Enhancing Symbolic Regression and Model Discovery through Transformer Shallow Recurrent Decoders

The paper presents advances to the Shallow Recurrent Decoder (SHRED) architecture for system identification and forecasting from sparse sensor measurements. The enhanced model, Transformer-SHRED (T-SHRED), replaces the recurrent temporal encoder with a transformer to improve performance on large datasets. It also integrates symbolic regression directly into the model's latent space via a Sparse Identification of Nonlinear Dynamics (SINDy) attention mechanism, strengthening both regularization and interpretability.

SHRED and its Evolution to T-SHRED

SHRED models have traditionally relied on Recurrent Neural Networks (RNNs) for temporal encoding, paired with a lightweight Multi-Layer Perceptron (MLP) for spatial decoding. This combination is computationally efficient enough to train on consumer hardware while handling dynamics across a range of physical, spatial, and temporal scales. The central innovation of this paper is to replace the RNN encoder with a transformer. The motivation is the strong empirical performance of transformers in learning complex temporal relationships from large-scale data: by exploiting their scalability and their ability to capture long-range dependencies through attention, T-SHRED aims to improve prediction accuracy and data-driven discovery. A minimal sketch of both encoder variants is given below.
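The following is a minimal PyTorch sketch of the two variants described above: a recurrent SHRED-style model (GRU encoder plus MLP decoder) and a transformer-encoder variant in the spirit of T-SHRED. It is an illustrative reconstruction, not the authors' implementation; class names, layer sizes, and hyperparameters are assumptions, and positional encoding is omitted for brevity.

# Minimal sketch of a SHRED-style model and a transformer variant.
# Illustrative only; not the authors' code. Sizes and names are assumed.
import torch
import torch.nn as nn

class SHRED(nn.Module):
    """GRU temporal encoder over a sensor trajectory + MLP spatial decoder."""
    def __init__(self, n_sensors, latent_dim, state_dim, hidden=64):
        super().__init__()
        self.encoder = nn.GRU(n_sensors, latent_dim, batch_first=True)
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, state_dim),
        )

    def forward(self, sensor_seq):                 # (batch, time, n_sensors)
        _, h = self.encoder(sensor_seq)            # final hidden state
        return self.decoder(h[-1])                 # (batch, state_dim)

class TSHRED(nn.Module):
    """Transformer temporal encoder replacing the GRU; same MLP decoder."""
    def __init__(self, n_sensors, latent_dim, state_dim, hidden=64, heads=4):
        super().__init__()
        self.embed = nn.Linear(n_sensors, latent_dim)
        layer = nn.TransformerEncoderLayer(latent_dim, nhead=heads,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, state_dim),
        )

    def forward(self, sensor_seq):
        z = self.encoder(self.embed(sensor_seq))   # (batch, time, latent_dim)
        return self.decoder(z[:, -1])              # decode the last latent state

Both models map a short history of sparse sensor readings to the full state at the next step; the transformer variant differs only in how the latent temporal representation is built.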

Symbolic Regression and Interpretability via SINDy Attention

The integration of SINDy into the T-SHRED architecture introduces a layer of symbolic regression that acts directly on the model's latent dynamics. This couples interpretability and regularization: governing equations are identified and embedded within the model's temporal structure during training. The SINDy attention mechanism constrains each transformer attention head to follow a sparse, discoverable dynamical law in the latent space. This structure supports modeling of complex physical phenomena and can aid scientific discovery by producing symbolic expressions that describe the learned dynamics. A hedged sketch of this kind of latent-dynamics regularization follows.
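To make the idea concrete, the sketch below regularizes a latent trajectory with a SINDy-style penalty: learnable sparse coefficients Xi are fit so that the latent increment approximately satisfies z_{t+1} - z_t ≈ Theta(z_t) Xi, where Theta is a polynomial feature library. This is not the paper's SINDy attention head; the discrete-time formulation, the library choice (constant, linear, quadratic terms), and the L1 sparsity weight are assumptions made for illustration.

# Hedged sketch of a SINDy-style regularizer on latent trajectories.
# Library terms, sparsity weight, and discrete-time form are assumptions.
import torch
import torch.nn as nn

class SINDyRegularizer(nn.Module):
    """Learnable sparse coefficients Xi with z_{t+1} - z_t ≈ Theta(z_t) @ Xi."""
    def __init__(self, latent_dim):
        super().__init__()
        n_features = 1 + latent_dim + latent_dim * latent_dim   # 1, z, z⊗z
        self.xi = nn.Parameter(torch.zeros(n_features, latent_dim))

    def library(self, z):                         # z: (T, latent_dim)
        ones = torch.ones(z.shape[0], 1, device=z.device)
        quad = (z.unsqueeze(2) * z.unsqueeze(1)).flatten(1)  # pairwise products
        return torch.cat([ones, z, quad], dim=1)  # (T, n_features)

    def forward(self, z_seq):                     # z_seq: (T, latent_dim)
        z, z_next = z_seq[:-1], z_seq[1:]
        residual = (z_next - z) - self.library(z) @ self.xi
        # dynamics-fit residual plus L1 sparsity promotes an interpretable model
        return residual.pow(2).mean() + 1e-3 * self.xi.abs().mean()

In training, such a penalty would be added to the reconstruction loss, e.g. loss = mse(prediction, target) + lam * regularizer(latent_sequence), so that the latent space is pushed toward dynamics expressible by a small set of symbolic terms.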

Numerical Results and Comparative Analysis

The paper evaluates T-SHRED on several datasets, performing next-step state prediction from sparse sensor inputs. The experiments cover dynamical systems spanning low- to high-data regimes, providing a broad basis for architectural comparison. In terms of raw predictive accuracy, GRU-based SHRED models with MLP decoders performed best across datasets. However, T-SHRED models with SINDy attention markedly improved interpretability, as each attention head yields a learned ordinary differential equation (ODE) model describing the governing dynamics of the latent space.

SINDy attention yielded significant reductions in model complexity at a moderate cost in predictive performance. For example, on large-scale atmospheric data, reducing the latent-space dimension modestly affected the test loss while delivering interpretability gains through more compact, concise symbolic representations in the model outputs.

Future Work and Implications

The advances introduced by T-SHRED extend beyond immediate predictive tasks, pointing towards broader applications in scientific domains that require compact, interpretable, and scientifically grounded models. Embedding symbolic regression directly within deep learning architectures offers a way to balance performance with interpretability without sacrificing model expressiveness or flexibility.

Moreover, exploring further architectural design changes and verifying model robustness across more diverse datasets can broaden T-SHRED's applicability. As interpretability gains priority in machine learning, particularly for high-stakes scientific decisions, frameworks like T-SHRED may become important tools for bridging complex data-driven models and human-understandable rules and systems.
