A Benchmark Dataset for Tornado Detection and Prediction using Full-Resolution Polarimetric Weather Radar Data

Published 26 Jan 2024 in physics.ao-ph and cs.LG | (2401.16437v1)

Abstract: Weather radar is the primary tool used by forecasters to detect and warn for tornadoes in near-real time. In order to assist forecasters in warning the public, several algorithms have been developed to automatically detect tornadic signatures in weather radar observations. Recently, Machine Learning (ML) algorithms, which learn directly from large amounts of labeled data, have been shown to be highly effective for this purpose. Since tornadoes are extremely rare events within the corpus of all available radar observations, the selection and design of training datasets for ML applications is critical for the performance, robustness, and ultimate acceptance of ML algorithms. This study introduces a new benchmark dataset, TorNet, to support development of ML algorithms in tornado detection and prediction. TorNet contains full-resolution, polarimetric, Level-II WSR-88D data sampled from 10 years of reported storm events. A number of ML baselines for tornado detection are developed and compared, including a novel deep learning (DL) architecture capable of processing raw radar imagery without the need for manual feature extraction required for existing ML algorithms. Despite not benefiting from manual feature engineering or other preprocessing, the DL model shows increased detection performance compared to non-DL and operational baselines. The TorNet dataset, as well as source code and model weights of the DL baseline trained in this work, are made freely available.

Summary

The paper presents the TorNet dataset, a benchmark composed of over 200,000 full-resolution radar samples capturing diverse storm conditions for tornado detection.
The study evaluates several ML models, including a novel CNN with CoordConv, showing improved accuracy, precision, and AUC over traditional methods.
The research facilitates reproducible evaluations and advances ML-driven prediction, setting the stage for future innovations in tornado forecasting.

An Overview of TorNet: A Benchmark Dataset for Tornado Detection and Prediction

The development of datasets is a foundational task in designing robust ML algorithms, particularly in domains characterized by rare phenomena, such as tornado detection using meteorological radar data. The paper "A Benchmark Dataset for Tornado Detection and Prediction using Full-Resolution Polarimetric Weather Radar Data" introduces TorNet, a meticulously curated benchmark dataset specifically designed for advancing ML applications in detecting and predicting tornadoes. This essay provides an in-depth examination of the dataset, the baseline models introduced within the study, and the implications for future research.

TorNet Dataset Composition

TorNet draws on 10 years of full-resolution, polarimetric Level-II Weather Surveillance Radar-1988 Doppler (WSR-88D) data to form a dataset for tornado-related ML research. Comprising over 200,000 samples, TorNet includes instances of active tornado-producing storms, non-tornadic rotating storms, severe non-rotating storms, and benign weather conditions, so that classification algorithms are trained and evaluated against realistic non-tornadic cases as well as tornadoes.

The dataset is structured into three main categories: confirmed tornadoes, non-tornadic storms with tornado warnings, and non-warned random storms. These categories were carefully selected to encompass a wide range of storm intensities and morphological characteristics, providing a broad platform for algorithmic evaluation and development.

ML Baselines and Performance

The study compares several machine learning models to assess their performance on the TorNet dataset, establishing baselines for future research. Four primary models are considered:

Tornado Vortex Signature (TVS): This operational algorithm uses radar parameters such as radial velocity to infer tornado presence. Although it is used operationally, it is static and lacks the adaptability offered by ML models.
Logistic Regression and Random Forest: Utilizing predictive features derived from azimuthal shear and other radar variables, these models provide a foundational performance metric. While the Random Forest model slightly outperforms logistic regression, both exhibit significant improvements over the TVS algorithm.
Convolutional Neural Network (CNN): This deep learning model ingests raw radar data directly and shows the strongest performance across numerous metrics. Its architecture incorporates CoordConv operations, which accommodate the range-angle geometry of radar data. Compared with the other baselines, it achieves higher accuracy, precision, and Area Under Curve (AUC) scores, indicating potential for significant advancements in real-time tornado warning systems.
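To illustrate the CoordConv idea mentioned above, the following is a minimal sketch, not the paper's actual layer. It assumes a radar chip stored as a list of 2D grids indexed as [azimuth][range], and it appends two extra channels holding each pixel's normalized azimuth and range position, so that a later convolution can tell where in the sweep a pixel sits. The function names, the [-1, 1] scaling, and the toy values are all assumptions made for illustration.

```python
def linspace(start, stop, n):
    """Return n evenly spaced values from start to stop, inclusive."""
    if n == 1:
        return [start]
    step = (stop - start) / (n - 1)
    return [start + i * step for i in range(n)]


def add_coord_channels(chip):
    """Append normalized azimuth and range coordinate channels to a chip.

    chip: list of 2D grids (lists of rows), each shaped [n_azimuth][n_range].
    Returns a new list with two extra grids of the same shape.
    Hypothetical helper; the scaling to [-1, 1] is an assumed convention.
    """
    n_az = len(chip[0])
    n_rng = len(chip[0][0])
    az_coords = linspace(-1.0, 1.0, n_az)
    rng_coords = linspace(-1.0, 1.0, n_rng)
    # Azimuth channel: constant along each row (one row per azimuth).
    az_channel = [[az_coords[i]] * n_rng for i in range(n_az)]
    # Range channel: the same ramp repeated for every azimuth.
    rng_channel = [list(rng_coords) for _ in range(n_az)]
    return chip + [az_channel, rng_channel]


# Toy 2-azimuth by 3-range chip with made-up values.
reflectivity = [[10.0, 20.0, 30.0], [15.0, 25.0, 35.0]]
velocity = [[-5.0, 0.0, 5.0], [-4.0, 1.0, 6.0]]
augmented = add_coord_channels([reflectivity, velocity])
print(len(augmented))  # number of channels after augmentation
```

A convolution applied to the augmented chip sees the coordinate channels alongside the radar variables, which is what lets its filters depend on position in the range-angle grid.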

Implications for Tornado Detection and Prediction

The introduction of the TorNet dataset marks a significant step in the integration of advanced ML techniques within operational meteorology. The availability of such a benchmark enables reproducibility and facilitates the fair comparison of emerging methodologies. This is crucial in advancing automated tornado detection systems, potentially leading to reduced false alarm rates and improved lead times for public warnings.
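As a rough illustration of what a fair comparison on a shared benchmark involves, the sketch below computes precision, the false alarm ratio used in forecast verification (false positives divided by all positive predictions, i.e. one minus precision), and a rank-based AUC. The labels, scores, and threshold are made-up toy values, and the function names are hypothetical; none of this is taken from the paper's evaluation code.

```python
def precision_and_far(labels, predictions):
    """Return (precision, false alarm ratio) for binary labels and predictions.

    Returns (None, None) when there are no positive predictions.
    """
    tp = sum(1 for y, p in zip(labels, predictions) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(labels, predictions) if y == 0 and p == 1)
    if tp + fp == 0:
        return None, None
    precision = tp / (tp + fp)
    return precision, 1.0 - precision


def roc_auc(labels, scores):
    """AUC as the probability that a random positive outscores a random
    negative, counting ties as one half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy data: 1 = tornadic, 0 = non-tornadic (values are invented).
labels = [1, 0, 1, 0, 0, 1]
scores = [0.9, 0.2, 0.6, 0.7, 0.1, 0.8]
predictions = [1 if s >= 0.5 else 0 for s in scores]  # assumed threshold

precision, far = precision_and_far(labels, predictions)
auc = roc_auc(labels, scores)
print(precision, far, auc)
```

Because tornadoes are rare, threshold-dependent scores such as precision and the false alarm ratio can move considerably with the chosen threshold, while AUC summarizes ranking quality across all thresholds.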

Beyond detection, the potential application of TorNet extends to the prediction of tornado genesis, a task requiring the forecasting of tornado formation from storm precursors. The inclusion of radar data capturing pre-tornadic conditions provides vital information for training models capable of predicting these high-impact events.
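One way to picture the difference between detection and prediction is in how labels are assigned to radar frames. The sketch below is a hypothetical labeling scheme, not one described in the paper: a frame is marked positive if it falls within an assumed lead window before the tornado begins, rather than while the tornado is in progress. The timestamps, the 10-minute window, and the function name are all invented for illustration.

```python
from datetime import datetime, timedelta


def label_for_prediction(frame_times, tornado_start, lead_window):
    """Label frames 1 if they fall in [tornado_start - lead_window,
    tornado_start), else 0. Hypothetical scheme for illustration only."""
    window_open = tornado_start - lead_window
    return [1 if window_open <= t < tornado_start else 0 for t in frame_times]


# Made-up sequence: six frames at 5-minute spacing, 12:00 through 12:25.
first_frame = datetime(2020, 1, 1, 12, 0)
frames = [first_frame + timedelta(minutes=5 * i) for i in range(6)]
tornado_start = datetime(2020, 1, 1, 12, 20)  # assumed tornado onset

lead_labels = label_for_prediction(frames, tornado_start, timedelta(minutes=10))
print(lead_labels)  # frames at 12:10 and 12:15 are the positives
```

Under a scheme like this, a model is rewarded for recognizing precursors in the minutes before onset, which is the behavior a prediction task asks for.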

Future Directions

While the results are promising, several avenues for future research are suggested. These include extending the dataset to incorporate additional radar tilts and time periods, as well as multi-modal data fusion involving satellite imagery, lightning information, and output from numerical weather prediction (NWP) models. Such proposals aim to enhance the robustness and accuracy of detection and prediction models.

Additionally, the paper underscores the potential for expanding research into the temporal dynamics of tornado detection and prediction. Using TorNet to train predictive models that anticipate tornado development with lead times exceeding current capabilities is an important direction for future work.

In conclusion, the study presents TorNet as a valuable resource for ML-driven tornado detection and prediction research. The dataset's design and open accessibility are poised to accelerate advancements in weather forecasting science, consequently offering societal benefits in the form of improved forecasting accuracy and effectiveness.
