SeismoGPT: Real-Time Seismic Forecasting
- SeismoGPT is a transformer-based model designed for real-time, short-term forecasting of three-component seismic waveforms.
- It employs causal self-attention and branched embedding to capture temporal dynamics and spatial correlations across sensor arrays.
- The model's predictions support Newtonian noise mitigation and improve control strategies in next-generation gravitational wave observatories.
SeismoGPT is a transformer-based deep learning architecture developed to forecast multivariate seismic waveform data in real time. Its principal aims are robust, accurate short-term prediction of three-component seismic data and exploitation of both temporal and spatial dependencies; its most prominent application is real-time Newtonian noise mitigation and observatory control for next-generation gravitational wave facilities such as the Einstein Telescope (Esmail et al., 25 Sep 2025). The following sections detail the concepts, architecture, training scheme, evaluation, and operational implications of SeismoGPT.
1. Model Architecture
SeismoGPT is fundamentally a transformer encoder-based neural network customized for autoregressive time series forecasting of seismic waveforms. Key architectural features include:
- Input Handling: Inputs consist of either single-station three-channel waveforms (vertical, north, east) or multistation array data, discretized as non-overlapping temporal tokens.
- Embedding Layers: Each token (an L × 3 matrix, for L samples per token across the three components) is embedded into a high-dimensional latent vector via 1D convolutional blocks.
- Positional Encoding: Sinusoidal functions encode temporal position, preserving order in the input sequence.
- Transformer Encoder Stack: Multiple layers of (masked) multi-head self-attention and feed-forward blocks process the embedded sequence, enabling the capture of long-range time dependencies and, for array-based models, spatial correlations as well.
- Causal Masking (Autoregressive Property): A lower-triangular additive mask M is applied so that, at each time step, predictions use only current and previous tokens. For input X ∈ ℝ^{T×d},

  Attention(Q, K, V) = softmax(QKᵀ / √d_k + M) V,

  where Q, K, V are standard linear projections of X and M enforces strict causality (M_{ij} = 0 if j ≤ i, −∞ if j > i).
- Prediction Head: A linear projection outputs the forecasted waveform token(s) in waveform space.
- Array-Specific Structure: For array data, features are embedded with a 2D convolution, then split: one branch processes temporally (tokens per station, causal attention), the other spatially (tokens across stations at the same timestep, full attention). Outputs are recombined prior to prediction.
This architecture enables SeismoGPT to flexibly model local and network-wide correlations in seismic waveforms, and its autoregressive design is central to iterative forecasting.
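To make the pipeline concrete, the following is a minimal PyTorch sketch of the single-station variant described above: convolutional token embedding, sinusoidal positional encoding, a causally masked transformer encoder, and a linear prediction head. Class names and hyperparameter values (d_model, head and layer counts) are illustrative assumptions, not the authors' implementation.

```python
import math
import torch
import torch.nn as nn

class SeismoForecaster(nn.Module):
    """Illustrative single-station forecaster (hyperparameters assumed)."""
    def __init__(self, token_len=16, n_channels=3, d_model=128,
                 n_heads=8, n_layers=4, max_tokens=256):
        super().__init__()
        # 1D convolutional embedding: one L x 3 token -> one d_model vector.
        self.embed = nn.Conv1d(n_channels, d_model,
                               kernel_size=token_len, stride=token_len)
        # Fixed sinusoidal positional encoding over token positions.
        pos = torch.arange(max_tokens).unsqueeze(1)
        div = torch.exp(torch.arange(0, d_model, 2) * (-math.log(10000.0) / d_model))
        pe = torch.zeros(max_tokens, d_model)
        pe[:, 0::2] = torch.sin(pos * div)
        pe[:, 1::2] = torch.cos(pos * div)
        self.register_buffer("pe", pe)
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)
        # Linear head maps each latent token back to waveform space.
        self.head = nn.Linear(d_model, token_len * n_channels)

    def forward(self, x):
        # x: (batch, channels, samples), with samples = n_tokens * token_len.
        z = self.embed(x).transpose(1, 2)        # (batch, n_tokens, d_model)
        z = z + self.pe[: z.size(1)]
        # Strictly causal mask: M_ij = 0 for j <= i, -inf for j > i.
        T = z.size(1)
        mask = torch.triu(torch.full((T, T), float("-inf"), device=z.device),
                          diagonal=1)
        h = self.encoder(z, mask=mask)
        return self.head(h)                      # next-token waveform per position
```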
2. Training Methodology
SeismoGPT is trained using an autoregressive approach on synthetic datasets generated from physical models of seismic propagation:
- Data Generation: Noise-free, three-component seismic traces are computed from physical Earth models (e.g., ak135f_2s) using the Instaseis package. Both single-station and 16-station network configurations were used.
- Tokenization: Seismograms are segmented into tokens of 16 samples. Context windows (e.g., 64 tokens = 1,024 samples) are used as model input.
- Learning Task: Given a context of N tokens, the model predicts the (N+1)-th token. During training, each newly predicted token is appended to the context window for subsequent steps, mirroring the deployment scenario.
- Loss Function: Mean Squared Error (MSE) between predicted and ground-truth waveform segments.
- Optimization: Adam optimizer with an initial learning rate decayed via StepLR (γ = 0.8 every 5 epochs), and early stopping based on validation loss.
- Regularization: Random padding masks applied to context tokens during training enforce robustness to missing/incomplete observations.
For the array model, input is of shape (batch, stations, tokens, token length, channels). Branched attention (temporal and spatial) allows the model to utilize both time history and inter-station redundancy.
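A hypothetical training step consistent with this recipe is sketched below. The toy tensors stand in for Instaseis-generated traces, and the learning rate and batch size are assumptions; only the StepLR schedule (γ = 0.8 every 5 epochs) and the MSE loss are taken from the text.

```python
import torch
import torch.nn as nn

model = SeismoForecaster()  # sketch from Section 1
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)        # lr assumed
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=5, gamma=0.8)
loss_fn = nn.MSELoss()

# Toy batch standing in for Instaseis-generated traces (purely illustrative):
# a 64-token context (1,024 samples) and the flattened (N+1)-th token target.
x = torch.randn(8, 3, 64 * 16)     # (batch, channels, samples)
y = torch.randn(8, 16 * 3)         # ground-truth next token, flattened

for epoch in range(20):
    optimizer.zero_grad()
    pred = model(x)                 # one prediction per context position
    loss = loss_fn(pred[:, -1], y)  # supervise the next-token forecast
    loss.backward()
    optimizer.step()
    scheduler.step()                # decay by 0.8 every 5 epochs
    # Early stopping on validation loss and random padding masks over the
    # context tokens would be added here, per the training recipe above.
```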
3. Performance and Error Propagation
Evaluation of SeismoGPT is performed using a fixed-length context window (e.g., 40 tokens = 337.9 s) and a forecast window (e.g., 24 tokens = 202.1 s). Performance characteristics include:
- Immediate Predictions: Forecasts within the initial portion of the forecast window (where predicted tokens are most closely tied to known context) are highly accurate.
- Error Accumulation: Due to the autoregressive setup, prediction errors from each token propagate forward: as more tokens are generated, the model's input becomes increasingly synthetic, leading to gradual performance degradation.
- Improved Future Stability with Arrays: The array-based model, utilizing spatial attention, demonstrates better accuracy and stability for longer horizon forecasts compared to the single-station variant. The redundancy across networked stations is explicitly leveraged to reduce forecast variance.
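The error-accumulation behavior follows directly from the rollout mechanics, sketched here for the single-station case (the function name and window handling are illustrative): each predicted token is appended to the context, so later steps consume progressively more synthetic input.

```python
import torch

@torch.no_grad()
def rollout(model, context, n_future, token_len=16, n_channels=3):
    """Autoregressive forecast; context is (batch, channels, samples)."""
    preds = []
    for _ in range(n_future):
        out = model(context)                        # (batch, n_tokens, L*3)
        nxt = out[:, -1].view(-1, n_channels, token_len)
        preds.append(nxt)
        # Slide the fixed-length window: drop the oldest token and append the
        # newly predicted (synthetic) one -- the source of error accumulation.
        context = torch.cat([context[:, :, token_len:], nxt], dim=2)
    return torch.cat(preds, dim=2)                  # (batch, channels, n_future*L)
```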
4. Temporal and Spatial Dependency Modeling
SeismoGPT directly learns:
- Temporal Dynamics: Through causal self-attention, the model captures long-term dependencies in the waveform sequence, supporting accurate prediction of phase arrivals and sustained oscillatory structure.
- Spatial Correlations: In the array-based design, spatial self-attention across simultaneously observed tokens at multiple stations enables the network to account for spatially coherent wavefield features—essential for localizing events and producing stable, spatially consistent forecasts.
This enables prediction of both site-specific local fluctuations and network-level propagation effects across the array.
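Mechanically, the two branches reduce to tensor reshaping, as in the sketch below (module names and the additive recombination are assumptions; the text specifies only that the branch outputs are recombined before prediction).

```python
import torch
import torch.nn as nn

B, S, T, D = 2, 16, 64, 128          # batch, stations, tokens, embedding dim
z = torch.randn(B, S, T, D)          # embedded array tokens

temporal = nn.MultiheadAttention(D, num_heads=8, batch_first=True)
spatial = nn.MultiheadAttention(D, num_heads=8, batch_first=True)

# Temporal branch: causal attention over tokens, independently per station.
zt = z.reshape(B * S, T, D)
causal = torch.triu(torch.full((T, T), float("-inf")), diagonal=1)
ht, _ = temporal(zt, zt, zt, attn_mask=causal)

# Spatial branch: full (unmasked) attention across stations per timestep.
zs = z.permute(0, 2, 1, 3).reshape(B * T, S, D)
hs, _ = spatial(zs, zs, zs)

# Recombine the streams (summation assumed) before the prediction head.
h = ht.reshape(B, S, T, D) + hs.reshape(B, T, S, D).permute(0, 2, 1, 3)
```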
5. Implications for Newtonian Noise Mitigation and Observatory Operations
The primary motivation for SeismoGPT's development is the mitigation of Newtonian noise (NN) in gravitational wave detectors:
- Advance Forecasting: Accurate near-term forecasting of seismic fields allows observatories to anticipate and subtract NN contributions to gravitational-wave measurements before they have significant impact.
- Nonlinear and Spatiotemporal Filtering: The transformer’s ability to model nonlinearities and spatial structure improves upon classical approaches such as Wiener filtering, which are limited to linear, stationary signal assumptions.
- Real-Time Control: Predictions from SeismoGPT can be integrated into active vibration isolation systems, enabling proactive adjustment to incoming ground motion or detuning of interferometers in anticipation of high-seismic intervals.
- Scheduling and Sensitivity Optimization: By forecasting site-wide seismic states, observatories can optimize data acquisition periods, minimize false positives from environmental events, and allocate resources for instrument protection when forecasts indicate elevated ground motion risk.
6. Summary of Core Equations and Technical Details
Principal mathematical operations:
- Causal Self-Attention:

  Attention(Q, K, V) = softmax(QKᵀ / √d_k + M) V,

  with causal mask M_{ij} = 0 for j ≤ i and M_{ij} = −∞ otherwise.
- Waveform Tokenization and Embedding:
  - A three-component trace x ∈ ℝ^{N×3} is tokenized into non-overlapping segments x_t ∈ ℝ^{L×3} (L = 16 samples), flattened, and embedded by a 1D CNN: z_t = Conv1D(x_t) ∈ ℝ^d.
  - In arrays: input X ∈ ℝ^{S×T×L×3} (S stations, T tokens) is embedded by a 2D CNN and branched into a temporal stream (causal attention over tokens within each station) and a spatial stream (full attention across stations at a fixed timestep).
- Output projection: the final transformer output h_t ∈ ℝ^d is mapped back to waveform space by a linear layer, ŷ_t = W h_t + b ∈ ℝ^{L×3}.
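As a quick consistency check on these quantities, the token counts and window durations quoted in Sections 2-3 imply a sampling interval of roughly 0.53 s per sample; this is an inference from the stated numbers, not a value given explicitly.

```python
token_len = 16                   # samples per token (Section 2)
print(64 * token_len)            # context window: 64 tokens -> 1024 samples
print(337.9 / (40 * token_len))  # 40-token context over 337.9 s -> ~0.528 s/sample
print(202.1 / (24 * token_len))  # 24-token forecast over 202.1 s -> ~0.526 s/sample
```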
7. Outlook and Significance
SeismoGPT constitutes a marked advance in data-driven, short-term seismic forecasting by leveraging transformer models' strengths for temporal and spatiotemporal sequence modeling. Its ability to operate autoregressively and integrate array data makes it especially suitable for environments requiring high-fidelity forecasts and fast, adaptive response. The methodology generalizes readily to other dense sensor array applications requiring simultaneous temporal and spatial inference. A plausible implication is that future integration of such models into observatory control systems could substantially improve gravitational wave sensitivity by mitigating Newtonian noise with predictive, data-driven filtering (Esmail et al., 25 Sep 2025).