QuakeFlow: Scalable ML Seismic Monitoring
- QuakeFlow is a scalable, machine-learning-based earthquake monitoring pipeline that supports both real-time detection and archival reprocessing of petabyte-scale seismic data.
- Its modular microservice architecture leverages Kubernetes, Kafka, and Spark to enable low-latency processing and dynamic autoscaling for enhanced detection sensitivity.
- The system integrates advanced models like PhaseNet and GaMMA to generate high-resolution seismic catalogs, validated by extensive benchmarks in regions such as Puerto Rico and Hawai‘i.
QuakeFlow refers primarily to a scalable, machine-learning-based earthquake monitoring workflow, formalized in the cloud-native QuakeFlow framework for seismic data mining and real-time detection. Distinctly, in mathematics the term denotes Thurston's earthquake flow on moduli spaces, and in computational mechanics it appears in models of rupture and seismic wave propagation in coupled solid–fluid media. The following focuses on the technical foundations, architecture, and performance of QuakeFlow as an earthquake monitoring pipeline, drawing precise distinctions from the mathematical and computational-modeling usages where needed.
1. Motivation and End-to-End Objectives
QuakeFlow addresses the challenge posed by petabyte-scale, continuous seismic waveform archives generated by modern seismic networks, which exceed the capacity of traditional earthquake-monitoring workflows for both detection sensitivity and computational scalability. Conventional cataloging methods detect only a fraction of small-magnitude events and are prohibitively slow for large retrospective analyses. QuakeFlow establishes two principal objectives:
- Archival mining: Scalable reprocessing of long-term, continuous waveform datasets with machine learning, using parallel, cloud-based infrastructure to produce high-resolution seismic catalogs within hours.
- Real-time monitoring: Ingestion of live waveform streams, low-latency phase picking and association, and sub-second catalog updates for operational earthquake monitoring (Zhu et al., 2022).
2. System Architecture and Data Flow
QuakeFlow is constructed as a modular, containerized microservice architecture orchestrated in Kubernetes. The framework comprises three interrelated processing pipelines:
- Archival (batch) pipeline: ObsPy fetches waveform data in time windows (e.g., daily) from seismic data centers; each window is submitted as a Kubernetes batch job to PhaseNet (a CNN-based phase picker). PhaseNet picks are passed to GaMMA (Gaussian Mixture Model Association) for event association and preliminary magnitude estimation. Events may optionally be relocated (e.g., with HypoDD) before being written to archival data stores (GCS, Azure Blob, BigQuery, MongoDB).
- Streaming (real-time) pipeline: Live waveforms are ingested continuously via SeedLink clients to Kafka topics; Spark Structured Streaming windows and pre-processes data; PhaseNet runs as a FastAPI microservice for phase picking, with picks passed to Kafka; GaMMA (FastAPI microservice) performs association and magnitude estimation, with events written to relational or document stores for immediate dashboard visualization.
- Model training pipeline (Kubeflow): Data preparation and distributed deep learning occur in Kubernetes pods utilizing GPU/TPU resources, culminating in containerized, versioned models published to a registry for pipeline integration.
Containerization enables all components (PhaseNet, GaMMA, Spark, Kafka, database writers, training scripts) to be versioned and orchestrated independently, supporting both rapid workflow evolution and fault tolerance (Zhu et al., 2022).
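The per-window fan-out of the archival pipeline can be sketched as follows. This is a minimal, stdlib-only illustration (the function and its parameters are hypothetical, not QuakeFlow's actual API) of how a long archive is split into daily windows that map one-to-one onto independent batch jobs:

```python
from datetime import datetime, timedelta

def daily_windows(start: datetime, end: datetime):
    """Yield (window_start, window_end) pairs covering [start, end)
    in one-day steps; each pair becomes one independent batch job."""
    t = start
    while t < end:
        yield t, min(t + timedelta(days=1), end)
        t += timedelta(days=1)

# Example: a one-week reprocessing run fans out into 7 daily jobs.
jobs = list(daily_windows(datetime(2021, 1, 1), datetime(2021, 1, 8)))
print(len(jobs))  # → 7
```

Because each window is processed independently, the job list can be submitted to any scheduler (Kubernetes jobs, in QuakeFlow's case) without inter-job coordination.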
3. Core Algorithmic Components
PhaseNet Deep Learning Phase Picker
PhaseNet is a fully convolutional encoder–decoder neural network with skip connections (U-Net-like). The input is a three-component waveform segment $x \in \mathbb{R}^{3 \times T}$ over a fixed time window of $T$ samples. The output at each time step $t$ is a probability triplet $\big(p_P(t), p_S(t), p_N(t)\big)$ for P-phase, S-phase, and noise, constrained to sum to 1. The model is trained on 700,000 manually picked arrivals and optimized with a multiclass cross-entropy loss
$$\mathcal{L} = -\sum_{t} \sum_{c \in \{P, S, N\}} y_c(t) \log p_c(t),$$
where $y_c(t)$ are the target label probabilities. Picks are extracted as local maxima in the $p_P$ and $p_S$ channels (Zhu et al., 2022).
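Pick extraction from the output probability traces can be sketched as a thresholded local-maximum search. The following is a minimal NumPy illustration; the function name, threshold, and gap parameters are illustrative, not PhaseNet's actual post-processing code:

```python
import numpy as np

def extract_picks(prob: np.ndarray, threshold: float = 0.5, min_gap: int = 100):
    """Return sample indices where `prob` has a local maximum above `threshold`.

    `prob` is one output channel (e.g. the P-probability trace) of a
    PhaseNet-style picker; `min_gap` suppresses duplicate picks closer
    than that many samples (a simple non-maximum suppression)."""
    # Local maximum: strictly greater than the previous sample and
    # at least as large as the next one.
    is_peak = (prob[1:-1] > prob[:-2]) & (prob[1:-1] >= prob[2:])
    candidates = np.flatnonzero(is_peak & (prob[1:-1] >= threshold)) + 1
    picks = []
    for idx in candidates:
        if not picks or idx - picks[-1] >= min_gap:
            picks.append(int(idx))
        elif prob[idx] > prob[picks[-1]]:
            picks[-1] = int(idx)  # keep the stronger of two nearby peaks
    return picks

# Synthetic probability trace with two well-separated Gaussian peaks.
t = np.arange(3000)
prob = 0.9 * np.exp(-0.5 * ((t - 500) / 20) ** 2) \
     + 0.7 * np.exp(-0.5 * ((t - 2000) / 20) ** 2)
print(extract_picks(prob))  # → [500, 2000]
```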
GaMMA Phase Association and Event Creation
GaMMA formulates association as a probabilistic clustering problem. Given a set of picks $\{t_i\}$ from multiple stations, a mixture-of-Gaussians model over pick arrival times is constructed:
$$p(t_i) = \sum_{k} \pi_k \, \mathcal{N}\!\big(t_i \mid \tau_k + T(x_i, \theta_k),\, \sigma_k^2\big),$$
where $\tau_k$ (origin time), $\theta_k$ (hypocenter), and the mixture weights $\pi_k$ and variances $\sigma_k^2$ are estimated using an expectation–maximization (EM) algorithm. The theoretical travel time $T(x_i, \theta_k)$ from hypocenter $\theta_k$ to station $x_i$ is computed using a 1D velocity model. Magnitude is approximated by
$$M \approx c_0 \log_{10} A + c_1 \log_{10} R + c_2,$$
with $c_0, c_1, c_2$ empirical constants, $A$ the largest pick amplitude, and $R$ the hypocentral distance (Zhu et al., 2022).
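As a worked illustration of the amplitude-based magnitude estimate, the sketch below evaluates an attenuation relation of the form above. The default constants are placeholders in the style of local-magnitude scales, not GaMMA's calibrated values:

```python
import math

def estimate_magnitude(amp: float, dist_km: float,
                       c0: float = 1.0, c1: float = 1.11,
                       c2: float = -2.09) -> float:
    """Approximate magnitude M ≈ c0*log10(A) + c1*log10(R) + c2,
    with A the largest pick amplitude and R the hypocentral distance (km).
    The default constants are illustrative placeholders only."""
    return c0 * math.log10(amp) + c1 * math.log10(dist_km) + c2

# Example: a unit-amplitude pick recorded at 100 km hypocentral distance.
print(round(estimate_magnitude(1.0, 100.0), 2))  # → 0.13
```

Doubling the distance at fixed amplitude raises the estimate by $c_1 \log_{10} 2 \approx 0.33$ magnitude units, reflecting amplitude attenuation with distance.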
4. Cloud-Native Implementation and Scalability
QuakeFlow is orchestrated using Kubernetes, supporting cloud-agnostic deployment on GCP, AWS, Azure, or on-premises clusters. Key architecture features include:
- Batch job orchestration: Separate Kubernetes jobs for each time window enable embarrassingly parallel scale-out.
- Autoscaling: The Horizontal Pod Autoscaler adjusts pod replication in response to CPU/memory demand, while cluster-level autoscaling dynamically adjusts node counts; in stress tests this yielded linear throughput scaling and a 70% wall-time reduction relative to statically sized clusters.
- Deployment practices: Component images are version-tagged, enabling rolling updates and minimal downtime; environment dependencies are centrally managed via requirements files (Zhu et al., 2022).
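The embarrassingly parallel scale-out can be mimicked locally with a worker pool standing in for independent Kubernetes jobs. This is a stdlib-only sketch in which the worker is a stub, not the real PhaseNet/GaMMA stages:

```python
from concurrent.futures import ThreadPoolExecutor

def process_window(day: int) -> dict:
    """Stub for one batch job: fetch, pick, and associate one day of data.
    Here it just returns a fake per-day event count."""
    return {"day": day, "n_events": day % 5}

# One month of daily windows, processed by a small worker pool; in the
# real system each window would be a separate Kubernetes job instead.
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(process_window, range(30)))

print(sum(r["n_events"] for r in results))  # → 60
```

Because the windows share no state, throughput grows roughly linearly with the number of workers until I/O or quota limits are hit, which is the behavior the Kubernetes autoscaling experiments exploit.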
5. Real-Time Streaming and Data Integration
Streaming data flow is implemented with Apache Kafka as the message bus, partitioned into topics for raw waveforms, PhaseNet picks, and GaMMA-associated events. Spark Structured Streaming processes incoming waveform fragments in micro-batches, performs de-duplication and grouping, and pushes batches to PhaseNet inference endpoints. Downstream, event data are distributed to storage and visualization platforms supporting operational dashboards.
This architecture achieves sub-second end-to-end processing latency, with dashboards subscribing directly to event topics for immediate catalog and shake map updates (Zhu et al., 2022).
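The windowing and de-duplication performed by the streaming layer can be sketched in plain Python. This stdlib-only illustration groups pick records into fixed time windows and drops exact duplicates, standing in for the Spark Structured Streaming stage (the record fields are hypothetical):

```python
from collections import defaultdict

def window_and_dedupe(records, window_s: float = 3.0):
    """Group (station, timestamp) pick records into fixed windows of
    `window_s` seconds, dropping exact duplicates within each window."""
    windows = defaultdict(set)
    for station, ts in records:
        key = int(ts // window_s)          # window index
        windows[key].add((station, ts))    # set membership de-duplicates
    return {k: sorted(v) for k, v in sorted(windows.items())}

picks = [("ST01", 0.4), ("ST02", 1.1), ("ST01", 0.4),  # duplicate record
         ("ST03", 3.7), ("ST01", 4.2)]
batches = window_and_dedupe(picks)
print({k: len(v) for k, v in batches.items()})  # → {0: 2, 1: 2}
```

Each completed window batch would then be posted to the PhaseNet or GaMMA inference endpoint; Spark additionally handles late-arriving data and fault tolerance, which this sketch omits.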
6. Empirical Performance and Case Studies
Significant empirical benchmarks include:
- Puerto Rico (2018–2021, 70 stations, 210 station-years): Acquisition and full reprocessing required 7 hours (scaling up to 60 nodes) at a cost of about $40 at GCP pricing. Detected event counts exceeded the published regional catalog by factors > 10, especially for M < 2.5, and the spatial event distributions mapped active faults at finer resolution.
- Hawai‘i (HVO network): Similar-scale deployments yielded substantially more events than the standard catalog, including deep mantle clusters and structures associated with magmatic pathways not previously resolved.
- Scaling: Throughput scaled linearly as node count increased; static clusters exhibited diminishing returns due to resource contention (Zhu et al., 2022).
7. Limitations, Extensions, and Best Practices
Current limitations include approximate magnitude estimation (improvable with amplitude-spectrum inversion), reliance on external relocation tools for high-precision event locations, and possible biases from using 1D velocity models in heterogeneous environments. Proposed extensions are seamless integration of alternative pickers (e.g., EQTransformer), deep-learning-based association modules, on-the-fly moment tensor estimation, and hybrid CPU/GPU/TPU acceleration.
Deployment recommendations emphasize semantic versioning of containers, separation of dev/prod namespaces, use of Infrastructure-as-Code, comprehensive resource/health monitoring, and data locality to minimize transfer costs. Rolling and canary deployment strategies are highlighted for robust pipeline updates (Zhu et al., 2022).
References:
- "QuakeFlow: A Scalable Machine-learning-based Earthquake Monitoring Workflow with Cloud Computing" (Zhu et al., 2022)
- Contextual distinction: see also mathematical earthquake flow (Wright, 2018); computational solid–fluid models (Roubíček et al., 2019).