STEAD: Stanford Earthquake Dataset
- STEAD is a globally curated dataset of over one million three-component seismic waveforms, fully annotated with precise event metadata and manual phase picks.
- It underpins the training and rigorous benchmarking of state-of-the-art machine learning models for earthquake detection, phase picking, localization, and magnitude estimation.
- Its comprehensive structure enables both supervised and transfer learning, enhancing real-time earthquake early warning systems and multi-station seismic analyses.
The STanford EArthquake Dataset (STEAD) is a globally curated, publicly available benchmark dataset of seismic waveforms, specifically designed to advance machine learning research in earthquake signal processing. STEAD has become a principal resource for evaluating and developing data-driven models in seismology, including phase picking, earthquake detection, localization, and magnitude estimation. The dataset’s unique attributes are its scale, metadata completeness, and its suitability for both supervised and transfer learning tasks in the context of single- and multi-station seismic monitoring.
1. Dataset Composition and Structure
STEAD comprises over one million three-component (vertical, north-south, east-west) seismic waveforms, each associated with a comprehensive set of event and recording metadata. The core properties of STEAD include:
- Global Distribution: Seismic records originate from a diverse, worldwide array of seismic stations, capturing events across a wide range of tectonic settings.
- Labeling: Each waveform is accompanied by precise, manually annotated P- and S-wave arrival picks, event origin time, hypocentral coordinates, and magnitude, along with detailed station metadata (station code, coordinates, instrument response).
- Waveform Specifications: Typical waveform entries are provided as fixed-length time series arrays, sampled at standardized rates compatible with common seismological workflows. The duration of the waveforms and the exact preprocessing pipelines (such as detrending, filtering, and normalization) are specified in the STEAD documentation and referenced literature.
- File Organization: The dataset is typically distributed in hierarchical data formats that support efficient access and subsetting, such as HDF5, with each waveform entry keyed by event and station identifiers and metadata embedded as structured attributes, facilitating rapid integration into machine learning pipelines (a minimal loading sketch follows this list).
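As a concrete illustration, the sketch below reads one trace from a STEAD-style chunk with h5py and pandas. The file names, the 'data' group, the `trace_name`/`trace_category` columns, and attribute keys such as `p_arrival_sample` follow commonly described STEAD chunk layouts, but should be verified against the official STEAD documentation for a given download.

```python
import h5py
import numpy as np
import pandas as pd

# Placeholder paths for one STEAD chunk (metadata CSV + waveform HDF5).
CSV_PATH = "chunk2.csv"
HDF5_PATH = "chunk2.hdf5"

# Metadata table: one row per trace, with event and station attributes.
meta = pd.read_csv(CSV_PATH, low_memory=False)
earthquakes = meta[meta["trace_category"] == "earthquake_local"]

with h5py.File(HDF5_PATH, "r") as f:
    trace_name = earthquakes["trace_name"].iloc[0]
    dset = f["data"][trace_name]              # one (n_samples, 3) array per trace
    waveform = np.array(dset, dtype=np.float32)

    # Picks are stored as sample indices in the dataset attributes
    # (values may be numeric or numeric strings, hence float() first).
    p_idx = int(float(dset.attrs["p_arrival_sample"]))
    s_idx = int(float(dset.attrs["s_arrival_sample"]))

# Light preprocessing often applied before training:
# demean and normalize each component to unit peak amplitude.
waveform -= waveform.mean(axis=0, keepdims=True)
waveform /= np.abs(waveform).max(axis=0, keepdims=True) + 1e-8

print(waveform.shape, p_idx, s_idx)
```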
2. Benchmarking and Standard Evaluation Protocols
STEAD has become the de facto benchmark for comparative evaluation of machine learning models in the seismological community. State-of-the-art earthquake detection and phase-picking models, such as EQTransformer, PhaseNet, and EQNet, are evaluated with carefully defined splits and protocols based on STEAD (Zhu et al., 2021; Zhang et al., 13 Aug 2024). Standard metrics derived from STEAD include:
- Phase Picking Accuracy: Precision, recall, F1-score, mean and standard deviation of pick-time residuals (difference between predicted and manual labels), and mean absolute error (MAE) for both P and S arrivals.
- Detection Generalization: Models are assessed for robustness across diverse regions—not limited to those represented in the training set—by evaluating on STEAD’s global test partitions.
Evaluation commonly employs a held-out subset of approximately 120,000 STEAD waveforms with high-quality manual picks, ensuring comparability across studies. A minimal sketch of how the pick metrics above are typically computed follows.
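The Python sketch below shows one common way to compute these pick metrics; the 0.5 s tolerance window and the NaN convention for missing picks are illustrative assumptions, not part of any official STEAD protocol.

```python
import numpy as np

def pick_metrics(pred_times, true_times, tolerance=0.5):
    """Residual statistics and precision/recall/F1 for one phase (P or S).

    pred_times / true_times: arrays of arrival times in seconds, with NaN
    marking traces where no pick was made or no manual label exists.
    tolerance: maximum |residual| (s) for a pick to count as a true positive.
    """
    pred_made = ~np.isnan(pred_times)
    label_exists = ~np.isnan(true_times)
    both = pred_made & label_exists

    residuals = pred_times[both] - true_times[both]
    tp = int((np.abs(residuals) <= tolerance).sum())
    fp = int(pred_made.sum()) - tp      # picks with no matching label within tolerance
    fn = int(label_exists.sum()) - tp   # labels missed or picked outside tolerance

    precision = tp / max(tp + fp, 1)
    recall = tp / max(tp + fn, 1)
    f1 = 2 * precision * recall / max(precision + recall, 1e-12)
    return {
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "residual_mean": float(residuals.mean()) if residuals.size else float("nan"),
        "residual_std": float(residuals.std()) if residuals.size else float("nan"),
        "mae": float(np.abs(residuals).mean()) if residuals.size else float("nan"),
    }
```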
3. Utilization in Deep Learning Architectures
STEAD is central to the development and validation of deep learning architectures for seismic signal processing. A representative use case is the end-to-end earthquake detection architecture described in Zhu et al. (2021):
- Feature Extraction: Raw three-component waveforms are input to a deep backbone network; for example, a modified ResNet-18 using 1D convolutions reduces the temporal dimension and extracts condensed representations.
- Phase Picking: Separate sub-networks process feature encodings with outputs interpreted as activation sequences, from which P and S arrival times are picked using peak detection above a threshold (e.g., 0.5).
- Multi-task Training: The total loss function sums binary cross-entropy losses for the P and S picks and for event detection, with weights controlling the loss contribution of each task (see the sketch at the end of this section).
- Shift-and-Stack: For multi-station settings, features from individual stations are "shifted" in feature space according to theoretical seismic travel times (based on assumed velocity models), aligning features for potential event localization.
STEAD’s data quality and breadth allow models trained on regional data (e.g., Northern California) to be evaluated on global datasets, thereby quantifying generalization.
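A minimal PyTorch sketch of the picking and multi-task loss logic described above is given below; the backbone and sub-network definitions are omitted, and the loss weights and 0.5 picking threshold follow the text but are otherwise illustrative rather than the reference implementation of Zhu et al. (2021).

```python
import torch
import torch.nn.functional as F

def multitask_loss(p_logits, s_logits, det_logits,
                   p_target, s_target, det_target,
                   w_p=1.0, w_s=1.0, w_det=1.0):
    """Sum of binary cross-entropy losses for P picks, S picks, and event
    detection; each input is a per-sample activation sequence (batch, time)."""
    loss_p = F.binary_cross_entropy_with_logits(p_logits, p_target)
    loss_s = F.binary_cross_entropy_with_logits(s_logits, s_target)
    loss_d = F.binary_cross_entropy_with_logits(det_logits, det_target)
    return w_p * loss_p + w_s * loss_s + w_det * loss_d

def pick_arrival(prob_seq, threshold=0.5, sampling_rate=100.0):
    """Convert a (time,) activation sequence into an arrival time (seconds)
    by taking the highest peak above the threshold; None if no peak exceeds it."""
    peak_val, peak_idx = prob_seq.detach().max(dim=-1)
    if peak_val.item() < threshold:
        return None
    return peak_idx.item() / sampling_rate
```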
4. Unified and Real-Time Earthquake Early Warning Networks
Recent innovations leverage STEAD to develop unified, real-time neural architectures capable of simultaneous phase picking, location, and magnitude estimation. For example, the Fast Information Streaming Handler (FisH) model (Zhang et al., 13 Aug 2024) employs:
- Streaming Input Handling: A RetNet-based encoder enables efficient recurrent inference (O(1) cost per timestep), allowing the model to process streaming seismic data in real time (a simplified streaming-inference sketch follows this list).
- Multi-task Decoding: Dedicated heads simultaneously output phase picks, location estimates, and magnitudes by leveraging shared latent representations.
- Rapid Convergence: On STEAD, FisH achieves P-pick F1-scores of up to 0.99, location errors as low as 6.0 km (2.6 km for distance), and magnitude errors of 0.14 in steady-state; within just 3 seconds after P-wave arrival, errors remain low—location error ~8.06 km, magnitude error ~0.18.
- Implications: These characteristics make STEAD-trained models suitable for deployment in embedded EEW systems with stringent real-time constraints, as high accuracy and low latency are critical within the operational time window after earthquake initiation.
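The sketch below illustrates the streaming, multi-task inference pattern in PyTorch. It is not the FisH implementation: a GRUCell stands in for the RetNet-style recurrent encoder, and the heads are placeholder linear layers; the point is the constant-cost per-timestep state update feeding shared features into phase, location, and magnitude heads.

```python
import torch
import torch.nn as nn

class StreamingEEWModel(nn.Module):
    def __init__(self, n_channels=3, hidden=128):
        super().__init__()
        self.encoder = nn.GRUCell(n_channels, hidden)    # stand-in recurrent encoder
        self.phase_head = nn.Linear(hidden, 2)           # P/S activation logits
        self.loc_head = nn.Linear(hidden, 3)             # e.g. lat, lon, depth estimates
        self.mag_head = nn.Linear(hidden, 1)             # magnitude estimate

    def forward(self, sample, state):
        """sample: (batch, n_channels) for ONE timestep; state: (batch, hidden)."""
        state = self.encoder(sample, state)              # constant cost per timestep
        return {
            "phase": torch.sigmoid(self.phase_head(state)),
            "location": self.loc_head(state),
            "magnitude": self.mag_head(state),
        }, state

# Streaming usage: update all estimates as each new sample arrives.
model = StreamingEEWModel()
state = torch.zeros(1, 128)
for _ in range(300):                                     # e.g. 3 s of data at 100 Hz
    sample = torch.randn(1, 3)                           # placeholder waveform sample
    outputs, state = model(sample, state)
```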
5. Multi-Station and Spatio-Temporal Deep Learning Applications
STEAD’s multi-station structure supports the application of advanced spatio-temporal machine learning models for earthquake detection and characterization. Graph-based neural methodologies, such as the Spatio-Temporal Graph Convolutional Network (GCN) with Spectral Structure Learning Convolution (Spectral SLC) (Piriyasatit et al., 14 Mar 2025), illustrate this paradigm:
- Input Graph Construction: Nodes correspond to seismic stations (using STEAD’s multi-station waveform data). Node features are three-component signals, appropriately preprocessed (detrending, bandpass filtering, normalization, downsampling).
- Spectral Graph Convolution: Both static (learned, global) and dynamic (input-driven) inter-station connectivity matrices propagate information across stations within the learned graph structure, using Chebyshev polynomial expansions (a minimal Chebyshev-convolution sketch appears at the end of this section).
- Temporal Dynamics: Stationwise representations are passed through GRU units to capture temporal dependencies.
- Station-Specific Probabilities: The network outputs time series of earthquake detection probabilities for each station, enabling direct exploitation of variable wave arrival times across the network.
- Comparison: Detection performance is quantified by true-positive and false-positive rates at multiple detection thresholds, with ROC curves demonstrating higher sensitivity and lower error rates relative to baseline models.
While the referenced GCN experiments primarily use Japanese regional data, the methodology is directly applicable to STEAD due to its global, multi-station coverage and the availability of synchronized multi-component waveforms.
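For illustration, the PyTorch sketch below implements a Chebyshev-polynomial graph convolution followed by a per-station GRU, in the spirit of the pipeline described above. The static/dynamic adjacency learning of Spectral SLC is not reproduced; a fixed, pre-computed scaled graph Laplacian is assumed.

```python
import torch
import torch.nn as nn

class ChebGraphConv(nn.Module):
    """y = sum_k T_k(L_scaled) x W_k, where T_k are Chebyshev polynomials of a
    scaled graph Laplacian and x has shape (batch, nodes, features)."""
    def __init__(self, in_dim, out_dim, K=3):
        super().__init__()
        self.K = K
        self.weights = nn.Parameter(torch.randn(K, in_dim, out_dim) * 0.01)

    def forward(self, x, laplacian):
        Tx_prev = x                                            # T_0(L) x
        out = torch.einsum("bnf,fo->bno", Tx_prev, self.weights[0])
        if self.K > 1:
            Tx = torch.einsum("nm,bmf->bnf", laplacian, x)     # T_1(L) x
            out = out + torch.einsum("bnf,fo->bno", Tx, self.weights[1])
            for k in range(2, self.K):
                Tx_next = 2 * torch.einsum("nm,bmf->bnf", laplacian, Tx) - Tx_prev
                out = out + torch.einsum("bnf,fo->bno", Tx_next, self.weights[k])
                Tx_prev, Tx = Tx, Tx_next
        return torch.relu(out)

class StationDetector(nn.Module):
    """Graph convolution per timestep, GRU across time, per-station sigmoid head."""
    def __init__(self, in_dim=3, gcn_dim=32, gru_dim=64, K=3):
        super().__init__()
        self.gcn = ChebGraphConv(in_dim, gcn_dim, K)
        self.gru = nn.GRU(gcn_dim, gru_dim, batch_first=True)
        self.head = nn.Linear(gru_dim, 1)

    def forward(self, x, laplacian):
        """x: (batch, time, nodes, features) -> detection probs (batch, time, nodes)."""
        b, t, n, _ = x.shape
        h = torch.stack([self.gcn(x[:, i], laplacian) for i in range(t)], dim=1)  # (b, t, n, g)
        h = h.permute(0, 2, 1, 3).reshape(b * n, t, -1)        # one temporal sequence per station
        h, _ = self.gru(h)
        probs = torch.sigmoid(self.head(h)).squeeze(-1)        # (b*n, t)
        return probs.reshape(b, n, t).permute(0, 2, 1)
```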
6. Comparative Position among Global Benchmark Datasets
STEAD is distinguished in the seismological ML landscape by its scale, event label fidelity, and global coverage. In comparison:
| Dataset | Scope/Distribution | Labeling Details |
| --- | --- | --- |
| STEAD | Global, >1M waveforms, many stations | Manual P/S picks, metadata |
| LEN‑DB | 1.2M waveforms, 1487 receivers worldwide | Earthquake/noise by radius |
| Ridgecrest Seq. | Regional (California) | Event catalog, picks |
- STEAD provides detailed arrival picks and covers a broad global domain, while datasets such as LEN‑DB focus more on earthquake vs. noise binary classification and employ different pre-processing, labeling, and receiver selection strategies.
- Both STEAD and LEN‑DB enable benchmarking for ML-based earthquake detection, but STEAD uniquely emphasizes precise arrival pick annotation and multi-regional, multi-instrument coverage.
7. Limitations and Open Challenges
Despite its role as a benchmark, STEAD presents several limitations:
- Noise and Anomalies: High seismic noise at some stations increases false negatives and degrades phase-picking accuracy. Although curation is extensive, manual annotation still leaves some ambiguous or edge-case signals in the dataset.
- Magnitude Completeness: As in global seismic datasets, incompleteness at low magnitudes due to detection and cataloging biases limits training and evaluation of models’ sensitivity to small events.
- Single-Station Focus: Although STEAD can support multi-station methodologies, its native structure is organized per trace (single station and event). Fully exploiting graph-based or network-wide spatio-temporal learning may require additional synchronization or dataset expansion.
- Generalization Limits: Models trained or evaluated on STEAD may not universally generalize to domains with dramatically different instrument responses, deployment geometries, or noise characteristics. A plausible implication is the need for domain adaptation or fine-tuning before operational EEW deployment (a minimal fine-tuning sketch follows this list).
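As one hedged illustration of the fine-tuning route, the sketch below freezes the feature backbone of a hypothetical STEAD-pretrained picker and retrains only its picking head on a small regional dataset; the `backbone`/`head` split and the data loader are assumptions, not part of any specific published model.

```python
import torch
import torch.nn as nn

def fine_tune(model, regional_loader, epochs=5, lr=1e-4):
    """model is assumed to expose .backbone and .head submodules;
    regional_loader yields (waveforms, pick_targets) batches."""
    for p in model.backbone.parameters():
        p.requires_grad = False                      # keep STEAD-learned features fixed
    optimizer = torch.optim.Adam(model.head.parameters(), lr=lr)
    loss_fn = nn.BCEWithLogitsLoss()

    model.train()
    for _ in range(epochs):
        for waveforms, pick_targets in regional_loader:
            logits = model.head(model.backbone(waveforms))
            loss = loss_fn(logits, pick_targets)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
    return model
```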
Conclusion
STEAD provides a comprehensive, high-fidelity benchmark for machine learning research in seismology and earthquake early warning. It has driven advances in deep learning methods for phase picking, event detection, localization, and magnitude estimation, particularly in real-time and streaming contexts. Ongoing research continues to leverage STEAD, alongside advancing architectures and multi-station learning paradigms, to push the frontier in automated seismic monitoring.