
Reliable Real-time Seismic Signal/Noise Discrimination

Updated 13 November 2025
  • Reliable real-time seismic discrimination is the robust identification of true earthquake signals from non-seismic noise using short time windows.
  • Advanced deep learning models and feature-based approaches significantly outperform traditional methods, achieving accuracies over 92% and dramatically reducing false positives.
  • Optimized data preprocessing, rapid inference pipelines, and adaptive thresholding enable practical applications in earthquake early warning and large-scale monitoring.

Reliable real-time seismic signal/noise discrimination refers to the robust, low-latency identification of earthquake-related ground motion versus noise or non-seismic transients using real-world seismic data, often under operational constraints. This discrimination problem underpins earthquake early warning, rapid event cataloging, microseismic monitoring, and noise-robust signal acquisition across local and global networks. As sensor deployments densify and data volumes rise, classical threshold-trigger methods (e.g., STA/LTA) have been largely supplanted by advanced deep learning, feature-based, and hybrid systems capable of extracting discriminative patterns directly from continuous three-component waveforms or derived representations.

1. Problem Definition and Real-Time Constraints

Seismic signal/noise discrimination consists of distinguishing true local earthquake signals from other impulsive or continuous signals (instrumental, anthropogenic, teleseismic, meteorological, or environmental) within short time windows, often just a few seconds post-detection (Meier et al., 2019). In Earthquake Early Warning (EEW) settings, correct classification must be achieved with minimal waveform context (1–3 s), strict false-positive constraints, minimal missed detections, and computational latencies well below the arrival time of damaging shaking. Large-scale monitoring networks and cataloging further require the capability to generalize across stations, noise regimes, source mechanisms, and diverse ground coupling conditions (Magrini et al., 2020, Kharita et al., 27 Oct 2025).

Classical approaches (e.g., thresholding features such as peak ground velocity, STA/LTA ratios, or predominant period) are susceptible to false alarms from noise bursts, teleseisms, or mis-tuned parameters, with legacy systems reporting >34,000 false triggers in validation sets (Meier et al., 2019).
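To make the classical baseline concrete, the following is a minimal STA/LTA sketch in NumPy; the window lengths and trigger threshold are illustrative, not operationally tuned values.

```python
import numpy as np

def sta_lta(trace, fs, sta_win=1.0, lta_win=10.0):
    """Classic short-term/long-term average energy ratio on a 1-D trace.

    trace: ground-motion samples; fs: sampling rate in Hz.
    Window lengths (seconds) are illustrative, not tuned values.
    """
    energy = trace.astype(float) ** 2
    n_sta, n_lta = int(sta_win * fs), int(lta_win * fs)
    # Causal moving averages via cumulative sums.
    csum = np.cumsum(np.concatenate(([0.0], energy)))
    sta = (csum[n_sta:] - csum[:-n_sta]) / n_sta
    lta = (csum[n_lta:] - csum[:-n_lta]) / n_lta
    # Align so both windows end at the same sample.
    n = min(len(sta) - (n_lta - n_sta), len(lta))
    return sta[n_lta - n_sta:][:n] / (lta[:n] + 1e-12)

# A noise trace with an embedded impulsive "event" at t = 20 s.
fs = 100
rng = np.random.default_rng(0)
x = rng.normal(0, 1, 30 * fs)
x[2000:2200] += 20 * rng.normal(0, 1, 200)
ratio = sta_lta(x, fs)
triggered = ratio.max() > 4.0   # hypothetical trigger threshold
```

The noise-burst susceptibility described above follows directly from this logic: any transient that raises short-term energy relative to the long-term background trips the threshold, whether or not it is an earthquake.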

2. Discriminative Methodologies: Feature Engineering vs. Deep Learning

Feature-Based Approaches

Early machine learning classifiers for real-time discrimination leveraged engineered features (statistical moments, frequency ratios, spectral flux, and physics-informed attributes) computed on short windows after detection. Representative feature sets include:

  • TSFEL (Time Series Feature Extraction Library): >390 time-, spectral-, and wavelet-domain features per trace
  • Physics-informed (e.g., dominant frequency, centroid frequency, ascent/descent ratios, kurtosis, skewness, band-limited energy)
  • Scattering network coefficients from wavelet subbands (Kharita et al., 27 Oct 2025)

Random forests (RF) and gradient-boosted trees achieve 87–89% balanced accuracy with these features but are systematically outperformed by end-to-end deep learning models that exploit non-trivial time–frequency patterns in the raw data.
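A feature-based pipeline of this kind can be sketched as follows: a small, illustrative subset of the physics-informed attributes listed above (dominant frequency, spectral centroid, kurtosis, skewness) feeds a scikit-learn random forest. The synthetic traces and all parameter choices are assumptions for demonstration, not the published feature sets.

```python
import numpy as np
from scipy import signal, stats
from sklearn.ensemble import RandomForestClassifier

def physics_features(trace, fs=100):
    """Illustrative subset of physics-informed attributes."""
    f, pxx = signal.periodogram(trace, fs)
    dom_freq = f[np.argmax(pxx)]                        # dominant frequency
    centroid = np.sum(f * pxx) / (np.sum(pxx) + 1e-12)  # spectral centroid
    return np.array([dom_freq, centroid,
                     stats.kurtosis(trace), stats.skew(trace)])

rng = np.random.default_rng(1)
fs, n = 100, 400

def make_trace(is_event):
    """Toy data: band-limited decaying burst (event) vs. white noise."""
    x = rng.normal(0, 1, n)
    if is_event:
        t = np.arange(n) / fs
        x += 8 * np.exp(-3 * t) * np.sin(2 * np.pi * 5 * t)
    return x

X = np.array([physics_features(make_trace(i % 2 == 0), fs) for i in range(200)])
y = np.array([i % 2 == 0 for i in range(200)])
clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X[:150], y[:150])
acc = clf.score(X[150:], y[150:])
```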

End-to-End Neural Methods

The most robust recent solutions employ convolutional, recurrent, or residual deep neural networks that learn to extract and combine multiscale, hierarchical features from waveform segments or spectrograms (Meier et al., 2019, Mousavi et al., 2018, Li et al., 2022, Kharita et al., 27 Oct 2025).

Table: Typical State-of-the-Art Deep Discriminators and Performance

| Model        | Input          | Performance          | Reference                   |
|--------------|----------------|----------------------|-----------------------------|
| SeismicCNN2D | 3C spectrogram | Accuracy > 92%       | (Kharita et al., 27 Oct 2025) |
| QuakeXNet2D  | 3C spectrogram | F1 ≈ 95% (network)   | (Kharita et al., 27 Oct 2025) |
| CRED         | 3C spectrogram | F1 = 99.95%          | (Mousavi et al., 2018)      |
| ResNet-1D    | 3C time series | F1 = 98.8%           | (Li et al., 2022)           |
| CNN (LEN-DB) | 3C time series | Acc. = 93.2% (test)  | (Magrini et al., 2020)      |

End-to-end approaches often input continuous waveform windows (e.g., 3×400 for 4 s at 100 Hz) or spectrograms (e.g., 3×129×38 bins). Key architectural choices are:

  • Stacked convolutional layers with batch normalization and ReLU activations.
  • Residual (skip) connections to mitigate vanishing gradients in very deep networks.
  • Combination with bidirectional LSTM/GRU units for sequence modeling (Mousavi et al., 2018, Birnie et al., 2020).
  • Multi-branch heads for hierarchical tasks (binary detection + phase classification) (Li et al., 2022).
  • Lightweight architectures (≤100k parameters) for embedded, ultra-fast inference (Kharita et al., 27 Oct 2025).

Output is typically a softmax or sigmoid probability; argmax or threshold-based classification is used, with thresholds tunable to operational FPR/TPR requirements.
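The decision stage described here can be sketched in a few lines; the class names and the τ=0.5 default are illustrative, standing in for whatever a deployed network emits.

```python
import numpy as np

def softmax(logits):
    z = logits - logits.max(axis=-1, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def classify(logits, classes=("noise", "earthquake"), tau=0.5):
    """Threshold-based decision on the event-class probability.

    tau is the operational knob: raise it to cut false positives,
    lower it to cut missed detections.
    """
    p = softmax(logits)
    return [classes[1] if pe >= tau else classes[0] for pe in p[:, 1]], p

# Two windows: low event logit -> noise; high event logit -> earthquake.
labels, p = classify(np.array([[2.0, 0.5], [-1.0, 3.0]]))
```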

3. Input Representations and Data Preprocessing

Input to these systems varies according to the deployment context:

  • Direct raw waveform slices (Z/N/E; typically 1–3 s windows) (Meier et al., 2019, Magrini et al., 2020).
  • Spectrogram or STFT matrices, with parameters (window length, hop, frequency range) selected for seismic bandwidth, e.g., STFT with Hann window, 256 samples, 50% overlap for 3C data (Kharita et al., 27 Oct 2025).
  • Filter bank/MFCC features (analogous to speech recognition) to encode spectral content in short frames, improving resilience to non-stationary noise (Mukherjee et al., 2021).
  • STA/LTA and frequency sub-band energy ratios for shallow feedforward networks (Paap et al., 2020).

All approaches require careful normalization: unit standard deviation, maximum absolute amplitude, or z-scoring by channel. Filtering (0.1–20 Hz bandpass) and windowing (tapered edges) are standard. Large, curated training sets with explicit event/noise labeling and cross-geographical splits are typical; for rare or new monitoring configurations, synthetic data generation and transfer learning are both used (Magrini et al., 2020, Solda et al., 31 Aug 2025).
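A minimal preprocessing sketch combining the steps above (bandpass filtering, edge tapering, per-channel z-scoring) might look like this; the filter order and taper fraction are assumptions, and in practice every choice must match the regime the model was trained on.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt, windows

def preprocess(trace, fs=100, band=(0.1, 20.0)):
    """Bandpass, taper, and z-score one channel of a detection window."""
    sos = butter(4, band, btype="bandpass", fs=fs, output="sos")
    x = sosfiltfilt(sos, trace.astype(float))    # zero-phase bandpass
    x *= windows.tukey(len(x), alpha=0.1)        # taper window edges
    return (x - x.mean()) / (x.std() + 1e-12)    # z-score per channel

rng = np.random.default_rng(2)
w = preprocess(rng.normal(0, 5, 400) + 3.0)      # offset, unscaled raw window
```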

4. Training, Decision Logic, and Real-Time Inference

Training Regimes

Supervised training is standard, using binary cross-entropy or categorical cross-entropy loss, Adam optimizer, and early stopping on validation loss. Batches of hundreds to thousands of windows per iteration are typical, with data balancing strategies (class weights, stratified sampling) to address event/noise imbalance (Paap et al., 2020, Magrini et al., 2020).

Augmentation protocols include random window shifting, phase perturbation, addition of Gaussian or field-like noise, and even synthetic composition of noise+signal overlays for coverage of extreme SNR regimes (Birnie et al., 2020, Mousavi et al., 2018).
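Two of these augmentations (random window shifting and Gaussian noise at a randomized SNR) can be sketched as below; the shift range and SNR bounds are illustrative, not values from the cited protocols.

```python
import numpy as np

rng = np.random.default_rng(3)

def augment(window, snr_db_range=(0.0, 20.0), max_shift=50):
    """Random circular shift plus Gaussian noise at a random SNR."""
    shift = int(rng.integers(-max_shift, max_shift + 1))
    x = np.roll(window, shift)                   # random window shifting
    snr_db = rng.uniform(*snr_db_range)
    sig_power = np.mean(x ** 2)
    noise_power = sig_power / 10 ** (snr_db / 10)
    return x + rng.normal(0.0, np.sqrt(noise_power), x.shape)

clean = np.sin(2 * np.pi * 5 * np.arange(400) / 100)  # toy 5 Hz "signal"
aug = augment(clean)
```

Drawing a fresh shift and SNR per epoch exposes the model to the low-SNR regimes it must handle in the field.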

Inference and Thresholding

For real-time operation, model outputs are thresholded (often τ=0.5) to control FPR/TPR. ROC or Precision-Recall analysis informs operational setting, and custom thresholds can be deployed to minimize missed detections or limit false positives under changing noise conditions (Paap et al., 2020, Mukherjee et al., 2021).

Sliding-window scanning with short (e.g., 0.5–4 s) overlaps ensures rapid response with low detection latency. Per-window inference time is sub-millisecond to tens of milliseconds, depending on architecture and hardware; even large models readily meet real-time scan rates at station or network scale (Mousavi et al., 2018, Kharita et al., 27 Oct 2025).
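The sliding-window scan can be sketched as follows; the stand-in "model" here is a plain energy threshold, not a trained network, and all window/hop/threshold values are illustrative.

```python
import numpy as np

def sliding_windows(stream, fs=100, win_s=4.0, hop_s=0.5):
    """Yield (start_time_s, window) pairs with overlapping hops."""
    n_win, n_hop = int(win_s * fs), int(hop_s * fs)
    for start in range(0, len(stream) - n_win + 1, n_hop):
        yield start / fs, stream[start:start + n_win]

def scan(stream, model, fs=100, tau=0.5):
    """Run a per-window classifier over a stream; return trigger times."""
    return [t0 for t0, w in sliding_windows(stream, fs) if model(w) >= tau]

# Stand-in model: mean window energy above a fixed level.
energy_model = lambda w: float(np.mean(w ** 2) > 4.0)

rng = np.random.default_rng(4)
x = rng.normal(0, 1, 60 * 100)
x[3000:3400] += rng.normal(0, 5, 400)   # high-energy burst at t = 30 s
triggers = scan(x, energy_model)
```

With a 0.5 s hop, a new decision is produced every half second, which is what keeps detection latency bounded by the hop rather than the window length.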

Edge and resource-constrained deployments use quantized, pruned, or highly compressed architectures (models of 1.2 MB or smaller; INT8 weights, sparse-convolution support) (Kharita et al., 27 Oct 2025).

Multi-Station Logic

Passing per-station detections to a network-level association algorithm (e.g., requiring ≥ K coincident triggers) helps suppress isolated noise-induced detections and achieves reliable network-wide event confirmation (Shaheen et al., 2021).
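A minimal coincidence-association sketch of the ≥ K rule above follows; the station names, coincidence window, and K are illustrative operational settings, not values from the cited study.

```python
def associate(detections, k=3, window_s=2.0):
    """Confirm events when >= k stations trigger within window_s of each other.

    detections: list of (station_id, trigger_time_s) pairs.
    """
    times = sorted(detections, key=lambda d: d[1])
    events, i = [], 0
    while i < len(times):
        # Stations whose triggers fall within window_s of detection i.
        group = {s for s, t in times if abs(t - times[i][1]) <= window_s}
        if len(group) >= k:
            events.append((times[i][1], sorted(group)))
            # Skip the rest of this coincidence window.
            while i < len(times) and times[i][1] <= events[-1][0] + window_s:
                i += 1
        else:
            i += 1
    return events

dets = [("STA1", 10.0), ("STA2", 10.4), ("STA3", 11.1),  # real event
        ("STA4", 55.0)]                                  # isolated noise trigger
events = associate(dets, k=3)
```

The isolated trigger at t = 55 s never reaches the K-station quorum, so only the clustered detections are confirmed as an event.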

5. Performance, Robustness, and Comparative Results

Machine learning and deep learning methods report superior discrimination relative to classical STA/LTA, OnSite, or template-matching baseline approaches across varied benchmarks:

  • Precision often exceeds 99%, recall 93–99%. Macro F1 for four-way discrimination (quake, explosion, surface event, noise) reaches 92–95% (Kharita et al., 27 Oct 2025).
  • False positive rates are reduced by orders of magnitude (0.48% vs. 45% for legacy OnSite on noise validation records; (Meier et al., 2019)).
  • In multi-level borehole arrays, CNNs exploiting moveout patterns yield precision 88.9% and recall 87%, with dramatic reduction of false alarms versus two-station STA/LTA (up to 917 false positives) (Shaheen et al., 2021).
  • Accuracy, robustness, and generalization remain high even on out-of-domain or geographically unseen test regions, provided the training set includes varied noise/event manifestations (Magrini et al., 2020, Kharita et al., 27 Oct 2025).
  • Classic feature-based classifiers plateau near 87–89% balanced accuracy and are less robust to SNR and source-type variability (Kharita et al., 27 Oct 2025).

Table: Comparative Model Performance (Selected Results)

| Model                 | Precision | Recall | F1     | Test Acc. | Reference                   |
|-----------------------|-----------|--------|--------|-----------|-----------------------------|
| CRED                  | 99.95%    | >99%   | 99.95% | 99.2%     | (Mousavi et al., 2018)      |
| RF (physics features) | ~88%      | ~87%   | 87%    | 89%       | (Kharita et al., 27 Oct 2025) |
| QuakeXNet2D           | ~95%      | ~95%   | 95%    | 92%       | (Kharita et al., 27 Oct 2025) |
| CNN (LEN-DB)          | n/a       | n/a    | n/a    | 93.2%     | (Magrini et al., 2020)      |

With carefully optimized pipelines and dedicated hardware, latency is universally sub-second.

6. Reliability, Failure Modes, and Practical Recommendations

Model failure modes are dominated by:

  • Impulsive teleseismic phases with local body-wave features (Meier et al., 2019).
  • Noise signatures not represented in training (e.g., new cultural/industrial sources).
  • SNR below trained detection range (SNR < 0 dB in some architectures).
  • Highly overlapping or multi-event trace windows producing ambiguous spectral/temporal signatures (Solda et al., 31 Aug 2025).

Mitigation strategies include dynamic threshold adaptation, continuous retraining or fine-tuning with newly acquired noise data, transfer-learning initialization from larger datasets, and model ensembles (CNN + LSTM + feature-based) (Mukherjee et al., 2021).

For embedded or ultra-low-latency deployment, architectures are pruned and quantized; for large-scale field networks, optimized software pipelines (e.g., PyTorch/SeisBench integration) and station-level edge inference are standard (Kharita et al., 27 Oct 2025).

Practical guidance highlights:

  • Routine normalization and filtering, matched to the regime of the training data.
  • Careful validation on new deployments, with performance monitoring via confusion matrices, ROC, and PR curves (Paap et al., 2020).
  • Modular post-processing (e.g., event association, phase picking, catalog updating) following initial discrimination.
  • Continuous monitoring for domain-drift: monitor mask or probabilistic output statistics to detect changing noise or instrument regimes (Zhu et al., 2018).
  • Iterative retraining and data expansion to capture low-SNR events or new signal classes.
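The domain-drift check suggested above (monitoring probabilistic output statistics) can be sketched as a simple z-score test on batch-averaged classifier outputs; the baseline statistics and alert threshold are illustrative assumptions.

```python
import numpy as np

def drift_alert(probs, baseline_mean, baseline_std, z_max=3.0):
    """Flag drift when the mean classifier output over a batch departs
    from its historical distribution (simple z-score check)."""
    m = float(np.mean(probs))
    z = (m - baseline_mean) / (baseline_std + 1e-12)
    return abs(z) > z_max, z

# Historical noise-window event probabilities averaged ~0.05 (+/- 0.02).
ok_batch = np.full(100, 0.06)
bad_batch = np.full(100, 0.40)   # a new noise source pushes outputs up
alert_ok, _ = drift_alert(ok_batch, 0.05, 0.02)
alert_bad, _ = drift_alert(bad_batch, 0.05, 0.02)
```

A sustained alert of this kind is the cue for the retraining or fine-tuning with newly acquired noise data described in the mitigation strategies above.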

7. Future Developments and Expansion

Current trajectories include:

  • Unified frameworks coupling discrimination, phase picking, and location as multi-task deep models (Li et al., 2022).
  • Explicit multi-class (beyond binary) discrimination—separating tectonic, explosion, surface, and noise sources under operational constraints (Kharita et al., 27 Oct 2025).
  • Integration of non-seismic proximate sensors (infrasound, accelerometer arrays, image/vision sensors) and feature fusion for improved reliability (Mukherjee et al., 2021).
  • Fully synthetic training regimes for new deployments—using physics-based and site-specific modeling to address data scarcity (Solda et al., 31 Aug 2025).
  • Expandable models for global networks: station-agnostic, noise-robust, and auto-adaptive architectures (Magrini et al., 2020).

The combination of discriminative deep learning, robust feature engineering, continuous retraining, and careful pipeline engineering establishes reliable real-time seismic signal/noise discrimination as a tractable and scalable problem, central to current and future operational seismology.
