QuakeFlow: Scalable ML Seismic Monitoring
- QuakeFlow is a scalable, machine-learning-based earthquake monitoring pipeline that supports both real-time detection and archival reprocessing of petabyte-scale seismic data.
- Its modular microservice architecture leverages Kubernetes, Kafka, and Spark to enable low-latency processing and dynamic autoscaling for enhanced detection sensitivity.
- The system integrates advanced models like PhaseNet and GaMMA to generate high-resolution seismic catalogs, validated by extensive benchmarks in regions such as Puerto Rico and Hawai‘i.
QuakeFlow refers primarily to a scalable, machine-learning-based earthquake monitoring workflow, formalized in the cloud-native QuakeFlow framework for seismic data mining and real-time detection. Distinctly, in mathematics the term denotes Thurston's earthquake flow on moduli spaces, and in computational mechanics it appears in models of rupture and seismic wave propagation in coupled solid–fluid media. The following focuses on the technical foundations, architecture, and performance of QuakeFlow as an earthquake monitoring pipeline, drawing precise distinctions from the mathematical and computational-modeling usages where needed.
1. Motivation and End-to-End Objectives
QuakeFlow addresses the challenge posed by petabyte-scale, continuous seismic waveform archives generated by modern seismic networks, which exceed the capacity of traditional earthquake-monitoring workflows for both detection sensitivity and computational scalability. Conventional cataloging methods detect only a fraction of small-magnitude events and are prohibitively slow for large retrospective analyses. QuakeFlow establishes two principal objectives:
- Archival mining: Scalable reprocessing of long-term, continuous waveform datasets with machine learning, using parallel, cloud-based infrastructure to produce high-resolution seismic catalogs within hours.
- Real-time monitoring: Ingestion of live waveform streams, low-latency phase picking and association, and sub-second catalog updates for operational earthquake monitoring (Zhu et al., 2022).
2. System Architecture and Data Flow
QuakeFlow is constructed as a modular, containerized microservice architecture orchestrated in Kubernetes. The framework comprises three interrelated processing pipelines:
- Archival (batch) pipeline: ObsPy fetches waveform data in time windows (e.g., daily) from seismic data centers; each window is submitted as a Kubernetes batch job to PhaseNet (a CNN-based phase picker). PhaseNet picks are passed to GaMMA (Gaussian Mixture Model Association) for event association and preliminary magnitude estimation. Events may optionally be relocated (e.g., with HypoDD) before being written to archival data stores (GCS, Azure Blob, BigQuery, MongoDB).
- Streaming (real-time) pipeline: Live waveforms are ingested continuously via SeedLink clients to Kafka topics; Spark Structured Streaming windows and pre-processes data; PhaseNet runs as a FastAPI microservice for phase picking, with picks passed to Kafka; GaMMA (FastAPI microservice) performs association and magnitude estimation, with events written to relational or document stores for immediate dashboard visualization.
- Model training pipeline (Kubeflow): Data preparation and distributed deep learning occur in Kubernetes pods utilizing GPU/TPU resources, culminating in containerized, versioned models published to a registry for pipeline integration.
Containerization enables all components (PhaseNet, GaMMA, Spark, Kafka, database writers, training scripts) to be versioned and orchestrated independently, supporting both rapid workflow evolution and fault tolerance (Zhu et al., 2022).
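The per-window fan-out of the archival pipeline can be sketched as follows. This is a minimal, stdlib-only illustration (the function and its parameters are hypothetical, not QuakeFlow's actual API) of how a long archive is split into daily windows that map one-to-one onto independent batch jobs:

```python
from datetime import datetime, timedelta

def daily_windows(start: datetime, end: datetime):
    """Yield (window_start, window_end) pairs covering [start, end)
    in one-day steps; each pair becomes one independent batch job."""
    t = start
    while t < end:
        yield t, min(t + timedelta(days=1), end)
        t += timedelta(days=1)

# Example: a one-week reprocessing run fans out into 7 daily jobs.
jobs = list(daily_windows(datetime(2021, 1, 1), datetime(2021, 1, 8)))
print(len(jobs))  # → 7
```

Because each window is processed independently, the job list can be submitted to any scheduler (Kubernetes jobs, in QuakeFlow's case) without inter-job coordination.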
3. Core Algorithmic Components
PhaseNet Deep Learning Phase Picker
PhaseNet is a fully convolutional encoder–decoder neural network with skip connections (U-Net-like). The input is a three-component waveform segment $x \in \mathbb{R}^{3 \times T}$ over a fixed time window of $T$ samples. The output at each time step $t$ is a probability triplet $\big(p_P(t), p_S(t), p_N(t)\big)$ for P-phase, S-phase, and noise, constrained to sum to 1. The model is trained on 700,000 manually picked arrivals and optimized with a multiclass cross-entropy loss
$$\mathcal{L} = -\sum_{t} \sum_{c \in \{P, S, N\}} y_c(t) \log p_c(t),$$
where $y_c(t)$ are the target label probabilities. Picks are extracted as local maxima in the $p_P$ and $p_S$ channels (Zhu et al., 2022).
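Pick extraction from the output probability traces can be sketched as a thresholded local-maximum search. The following is a minimal NumPy illustration; the function name, threshold, and gap parameters are illustrative, not PhaseNet's actual post-processing code:

```python
import numpy as np

def extract_picks(prob: np.ndarray, threshold: float = 0.5, min_gap: int = 100):
    """Return sample indices where `prob` has a local maximum above `threshold`.

    `prob` is one output channel (e.g. the P-probability trace) of a
    PhaseNet-style picker; `min_gap` suppresses duplicate picks closer
    than that many samples (a simple non-maximum suppression)."""
    # Local maximum: strictly greater than the previous sample and
    # at least as large as the next one.
    is_peak = (prob[1:-1] > prob[:-2]) & (prob[1:-1] >= prob[2:])
    candidates = np.flatnonzero(is_peak & (prob[1:-1] >= threshold)) + 1
    picks = []
    for idx in candidates:
        if not picks or idx - picks[-1] >= min_gap:
            picks.append(int(idx))
        elif prob[idx] > prob[picks[-1]]:
            picks[-1] = int(idx)  # keep the stronger of two nearby peaks
    return picks

# Synthetic probability trace with two well-separated Gaussian peaks.
t = np.arange(3000)
prob = 0.9 * np.exp(-0.5 * ((t - 500) / 20) ** 2) \
     + 0.7 * np.exp(-0.5 * ((t - 2000) / 20) ** 2)
print(extract_picks(prob))  # → [500, 2000]
```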
GaMMA Phase Association and Event Creation
GaMMA formulates association as a probabilistic clustering problem. Given a set of picks $\{t_i\}$ from multiple stations, a mixture-of-Gaussians model over pick arrival times is constructed:
$$p(t_i) = \sum_{k} \pi_k \, \mathcal{N}\!\big(t_i \mid \tau_k + T(x_i, \theta_k),\, \sigma_k^2\big),$$
where $\tau_k$ (origin time), $\theta_k$ (hypocenter), and the mixture weights $\pi_k$ and variances $\sigma_k^2$ are estimated using an expectation–maximization (EM) algorithm. The theoretical travel time $T(x_i, \theta_k)$ from hypocenter $\theta_k$ to station $x_i$ is computed using a 1D velocity model. Magnitude is approximated by
$$M \approx c_0 \log_{10} A + c_1 \log_{10} R + c_2,$$
with $c_0, c_1, c_2$ empirical constants, $A$ the largest pick amplitude, and $R$ the hypocentral distance (Zhu et al., 2022).
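As a worked illustration of the amplitude-based magnitude estimate, the sketch below evaluates an attenuation relation of the form above. The default constants are placeholders in the style of local-magnitude scales, not GaMMA's calibrated values:

```python
import math

def estimate_magnitude(amp: float, dist_km: float,
                       c0: float = 1.0, c1: float = 1.11,
                       c2: float = -2.09) -> float:
    """Approximate magnitude M ≈ c0*log10(A) + c1*log10(R) + c2,
    with A the largest pick amplitude and R the hypocentral distance (km).
    The default constants are illustrative placeholders only."""
    return c0 * math.log10(amp) + c1 * math.log10(dist_km) + c2

# Example: a unit-amplitude pick recorded at 100 km hypocentral distance.
print(round(estimate_magnitude(1.0, 100.0), 2))  # → 0.13
```

Doubling the distance at fixed amplitude raises the estimate by $c_1 \log_{10} 2 \approx 0.33$ magnitude units, reflecting amplitude attenuation with distance.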
4. Cloud-Native Implementation and Scalability
QuakeFlow is orchestrated using Kubernetes, supporting cloud-agnostic deployment on GCP, AWS, Azure, or on-premises clusters. Key architecture features include:
- Batch job orchestration: Separate Kubernetes jobs for each time window enable embarrassingly parallel scale-out.
- Autoscaling: The Horizontal Pod Autoscaler adjusts pod replication in response to CPU/memory demand, while cluster-level autoscaling dynamically adjusts node counts; in stress tests this yielded linear throughput scaling and a 70% wall-time reduction relative to statically sized clusters.
- Deployment practices: Component images are version-tagged, enabling rolling updates and minimal downtime; environment dependencies are centrally managed via requirements files (Zhu et al., 2022).
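The embarrassingly parallel scale-out can be mimicked locally with a worker pool standing in for independent Kubernetes jobs. This is a stdlib-only sketch in which the worker is a stub, not the real PhaseNet/GaMMA stages:

```python
from concurrent.futures import ThreadPoolExecutor

def process_window(day: int) -> dict:
    """Stub for one batch job: fetch, pick, and associate one day of data.
    Here it just returns a fake per-day event count."""
    return {"day": day, "n_events": day % 5}

# One month of daily windows, processed by a small worker pool; in the
# real system each window would be a separate Kubernetes job instead.
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(process_window, range(30)))

print(sum(r["n_events"] for r in results))  # → 60
```

Because the windows share no state, throughput grows roughly linearly with the number of workers until I/O or quota limits are hit, which is the behavior the Kubernetes autoscaling experiments exploit.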
5. Real-Time Streaming and Data Integration
Streaming data flow is implemented with Apache Kafka as the message bus, partitioned into topics for raw waveforms, PhaseNet picks, and GaMMA-associated events. Spark Structured Streaming processes incoming waveform fragments in micro-batches, performs de-duplication and grouping, and pushes batches to PhaseNet inference endpoints. Downstream, event data are distributed to storage and visualization platforms supporting operational dashboards.
This architecture achieves sub-second end-to-end processing latency, with dashboards subscribing directly to event topics for immediate catalog and shake map updates (Zhu et al., 2022).
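The windowing and de-duplication performed by the streaming layer can be sketched in plain Python. This stdlib-only illustration groups pick records into fixed time windows and drops exact duplicates, standing in for the Spark Structured Streaming stage (the record fields are hypothetical):

```python
from collections import defaultdict

def window_and_dedupe(records, window_s: float = 3.0):
    """Group (station, timestamp) pick records into fixed windows of
    `window_s` seconds, dropping exact duplicates within each window."""
    windows = defaultdict(set)
    for station, ts in records:
        key = int(ts // window_s)          # window index
        windows[key].add((station, ts))    # set membership de-duplicates
    return {k: sorted(v) for k, v in sorted(windows.items())}

picks = [("ST01", 0.4), ("ST02", 1.1), ("ST01", 0.4),  # duplicate record
         ("ST03", 3.7), ("ST01", 4.2)]
batches = window_and_dedupe(picks)
print({k: len(v) for k, v in batches.items()})  # → {0: 2, 1: 2}
```

Each completed window batch would then be posted to the PhaseNet or GaMMA inference endpoint; Spark additionally handles late-arriving data and fault tolerance, which this sketch omits.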
6. Empirical Performance and Case Studies
Significant empirical benchmarks include:
- Puerto Rico (2018–2021, 70 stations, 210 station-years): Acquisition and full reprocessing required 7 hours (scaling up to 60 nodes) at a cost of about $40 at GCP pricing. Detected event counts exceeded the published regional catalog by factors > 10, especially for M < 2.5, and the spatial event distributions mapped active faults at finer resolution.
- Hawai‘i (HVO network): Similar-scale deployments yielded substantially more events than the standard catalog, including deep mantle clusters and structures associated with magmatic pathways not previously resolved.
- Scaling: Throughput scaled linearly as node count increased; static clusters exhibited diminishing returns due to resource contention (Zhu et al., 2022).
7. Limitations, Extensions, and Best Practices
Current limitations include approximate magnitude estimation (improvable with amplitude-spectrum inversion), reliance on external relocation tools for high-precision event locations, and possible biases from using 1D velocity models in heterogeneous environments. Proposed extensions are seamless integration of alternative pickers (e.g., EQTransformer), deep-learning-based association modules, on-the-fly moment tensor estimation, and hybrid CPU/GPU/TPU acceleration.
Deployment recommendations emphasize semantic versioning of containers, separation of dev/prod namespaces, use of Infrastructure-as-Code, comprehensive resource/health monitoring, and data locality to minimize transfer costs. Rolling and canary deployment strategies are highlighted for robust pipeline updates (Zhu et al., 2022).
References:
- "QuakeFlow: A Scalable Machine-learning-based Earthquake Monitoring Workflow with Cloud Computing" (Zhu et al., 2022)
- Contextual distinction: see also mathematical earthquake flow (Wright, 2018); computational solid–fluid models (Roubíček et al., 2019).