Can AI weather models predict out-of-distribution gray swan tropical cyclones? (2410.14932v3)

Published 19 Oct 2024 in physics.ao-ph and cs.LG

Abstract: Predicting gray swan weather extremes, which are possible but so rare that they are absent from the training dataset, is a major concern for AI weather models and long-term climate emulators. An important open question is whether AI models can extrapolate from weaker weather events present in the training set to stronger, unseen weather extremes. To test this, we train independent versions of the AI model FourCastNet on the 1979-2015 ERA5 dataset with all data, or with Category 3-5 tropical cyclones (TCs) removed, either globally or only over the North Atlantic or Western Pacific basin. We then test these versions of FourCastNet on 2018-2023 Category 5 TCs (gray swans). All versions yield similar accuracy for global weather, but the one trained without Category 3-5 TCs cannot accurately forecast Category 5 TCs, indicating that these models cannot extrapolate from weaker storms. The versions trained without Category 3-5 TCs in one basin show some skill forecasting Category 5 TCs in that basin, suggesting that FourCastNet can generalize across tropical basins. This is encouraging and surprising because regional information is implicitly encoded in inputs. Given that current state-of-the-art AI weather and climate models have similar learning strategies, we expect our findings to apply to other models. Other types of weather extremes need to be similarly investigated. Our work demonstrates that novel learning strategies are needed for AI models to reliably provide early warning or estimated statistics for the rarest, most impactful TCs, and, possibly, other weather extremes.

Summary

The paper demonstrates that AI weather models, exemplified by FourCastNet, fail to accurately forecast Category 5 tropical cyclones when Category 3-5 tropical cyclones are removed from the training data.
It shows that versions trained without Category 3-5 cyclones in only one basin (North Atlantic or Western Pacific) still have some skill forecasting Category 5 cyclones in that basin, which suggests the model generalizes across basins.
The study underscores the need to integrate physical laws and synthetic data to enhance AI predictions for unprecedented extreme weather events.

AI Weather Models and Gray Swan Prediction: Analyzing Extrapolation Limitations in Tropical Cyclones

The paper "Can AI weather models predict out-of-distribution gray swan tropical cyclones?" investigates whether AI-driven weather models can predict unprecedented extreme weather events, specifically Category 5 tropical cyclones (TCs) whose intensity is not represented in the training data. It uses FourCastNet, a state-of-the-art AI weather model, as a test case for whether such models can extrapolate to these rare, high-impact events, known as "gray swans."

Methodology Overview

The authors train on the ERA5 dataset for 1979 to 2015 and test on Category 5 TCs from 2018 to 2023. They train independent versions of FourCastNet, each on a differently configured training set:

- Full Dataset: Includes all weather events.
- noTC Dataset: Excludes samples with Category 3-5 TCs globally.
- Rand Dataset: Matches the size of noTC but retains all Category 3-5 TCs, randomly removing other samples.
- noWP and noNA Datasets: Remove Category 3-5 TCs only from the Western Pacific and North Atlantic basins, respectively.

The models are then evaluated on their ability to forecast Category 5 TCs during the testing period.
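
The five training configurations above can be sketched as a filtering step. This is a minimal illustration, not the authors' pipeline: the `Sample` record, the basin codes "WP" and "NA", and the function names are hypothetical, and it assumes that removal happens at the level of whole time steps that contain a Category 3-5 TC.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Sample:
    """One training time step (hypothetical record, not the paper's data format)."""
    time_index: int
    # (basin, Saffir-Simpson category) for each TC active at this time step
    active_tcs: tuple = ()


def has_intense_tc(sample, basins=None, min_category=3):
    """True if the sample holds a TC of at least min_category, optionally only in `basins`."""
    return any(
        category >= min_category and (basins is None or basin in basins)
        for basin, category in sample.active_tcs
    )


def build_training_subsets(samples, seed=0):
    """Return the Full, noTC, Rand, noWP, and noNA subsets as lists of samples."""
    full = list(samples)
    intense = [s for s in full if has_intense_tc(s)]
    no_tc = [s for s in full if not has_intense_tc(s)]
    no_wp = [s for s in full if not has_intense_tc(s, basins={"WP"})]
    no_na = [s for s in full if not has_intense_tc(s, basins={"NA"})]

    # Rand keeps every intense-TC sample and keeps only as many randomly
    # chosen other samples as needed to match the size of noTC.
    n_other_kept = len(no_tc) - len(intense)
    if n_other_kept < 0:
        raise ValueError("too few non-TC samples to match the noTC size")
    rng = random.Random(seed)
    rand = sorted(intense + rng.sample(no_tc, n_other_kept), key=lambda s: s.time_index)

    return {"Full": full, "noTC": no_tc, "Rand": rand, "noWP": no_wp, "noNA": no_na}


# Toy data: three intense-TC time steps, one weak-TC step, six TC-free steps.
samples = [
    Sample(0, (("WP", 4),)),
    Sample(1, (("WP", 5), ("NA", 1))),
    Sample(2, (("NA", 5),)),
    Sample(3, (("WP", 2),)),
] + [Sample(i) for i in range(4, 10)]

subsets = build_training_subsets(samples)
print({name: len(subset) for name, subset in subsets.items()})
```

The Rand subset acts as a control: it is as small as noTC, so any skill gap between the two can be attributed to the missing intense TCs rather than to the smaller training set.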

Key Results

Extrapolation Limitations:
- FourCastNet models trained without Category 3-5 TCs (noTC) fail to predict the intensity of Category 5 TCs accurately. Their forecasts show almost no deepening of the mean sea-level pressure (mslp) minimum, so the predicted storms are much weaker than observed.
Cross-basin Generalization:
- Models trained without Category 3-5 TCs from one basin (noWP or noNA) surprisingly exhibit some skill in forecasting Category 5 TCs in that same basin. This suggests that the model can transfer what it learns from intense TCs in other basins, presumably because the storm dynamics are similar across oceans.
Physical Consistency:
- None of the versions satisfy gradient-wind balance, a critical physical constraint for TCs. This physical inconsistency persists across all training configurations, which points to a key limitation in how AI models represent the underlying physics of extreme events.
Global Weather Forecast Performance:
- All versions display comparable accuracy for global weather prediction, which shows that common aggregate performance metrics can mask failures on extreme events.
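
The gradient-wind balance mentioned under Physical Consistency relates the tangential wind V at radius r to the radial pressure gradient: V^2/r + f V = (1/rho) dp/dr, where f is the Coriolis parameter and rho is air density. The sketch below shows one way such a check could be set up; it is not the paper's diagnostic. It assumes an axisymmetric vortex, a constant density of 1.15 kg/m^3, and an idealized exponential pressure profile, and all function names are hypothetical.

```python
import math

OMEGA = 7.2921e-5  # Earth's rotation rate in rad/s
RHO = 1.15         # assumed constant near-surface air density in kg/m^3


def coriolis_magnitude(lat_deg):
    """|f| = |2 * Omega * sin(latitude)|, so results are cyclonic speeds in either hemisphere."""
    return abs(2.0 * OMEGA * math.sin(math.radians(lat_deg)))


def radial_gradient(values, radius):
    """Finite-difference d(values)/d(radius): centered inside, one-sided at the ends."""
    n = len(values)
    grad = []
    for i in range(n):
        lo, hi = max(i - 1, 0), min(i + 1, n - 1)
        grad.append((values[hi] - values[lo]) / (radius[hi] - radius[lo]))
    return grad


def gradient_wind(radius, pressure, lat_deg, rho=RHO):
    """Cyclonic root of V^2/r + f*V = (1/rho) * dp/dr at each radius."""
    f = coriolis_magnitude(lat_deg)
    winds = []
    for r, dpdr in zip(radius, radial_gradient(pressure, radius)):
        half_fr = 0.5 * f * r
        # Clamp at zero where pressure falls outward and no balanced cyclonic root exists.
        disc = max(half_fr ** 2 + r * dpdr / rho, 0.0)
        winds.append(-half_fr + math.sqrt(disc))
    return winds


def balance_residual(radius, pressure, wind, lat_deg, rho=RHO):
    """Imbalance V^2/r + f*V - (1/rho) * dp/dr in m/s^2; zero means balanced."""
    f = coriolis_magnitude(lat_deg)
    dpdr = radial_gradient(pressure, radius)
    return [v * v / r + f * v - g / rho for r, v, g in zip(radius, wind, dpdr)]


# Idealized vortex: pressure rises outward from a low center (values in m and Pa).
radius = [10e3 + 5e3 * i for i in range(59)]
pressure = [101000.0 - 8000.0 * math.exp(-r / 60e3) for r in radius]

v_balanced = gradient_wind(radius, pressure, lat_deg=20.0)
v_forecast = [0.6 * v for v in v_balanced]  # stand-in for a wind field that is too weak

worst = max(abs(x) for x in balance_residual(radius, pressure, v_forecast, lat_deg=20.0))
print(f"largest imbalance: {worst:.3f} m/s^2")
```

In practice the wind and pressure profiles would both come from the forecast fields, and a residual that stays far from zero would indicate that the predicted vortex is not in gradient-wind balance.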

Implications and Future Directions

This analysis has critical implications for AI weather modeling. The demonstrated inability to extrapolate to out-of-distribution gray swan events presents a challenge for operational reliance on AI weather models in predicting unprecedented natural disasters. The paper suggests that AI models need innovative learning strategies to improve accuracy for rare, high-impact events.

Proposed Remedies and Future Work

To enhance AI weather models:

- Integrate physical laws and constraints into the models to ensure physical consistency and potentially enhance out-of-distribution generalization.
- Employ data augmentation techniques using synthetic data from theoretical models to fill the training set with diverse, extreme scenarios, aiding the model in learning dynamic principles.
- Combine AI models with numerical weather models for a hybrid approach that leverages strengths of both systems, potentially guided by rare-event simulations.

The paper calls for rigorous evaluation protocols for AI predictions of gray swans, emphasizing the need for metrics and test designs that measure model robustness on events outside the training distribution. The authors expect the findings to carry over to other AI weather and climate models with similar learning strategies, and possibly to other weather extremes, which they say need to be similarly investigated. With such strategies in place, AI weather models could better anticipate and quantify extreme and unprecedented weather, contributing to improved preparedness and response.

