CycloneMAE: A Scalable Multi-Task Learning Model for Global Tropical Cyclone Probabilistic Forecasting

Published 14 Apr 2026 in cs.LG and cs.AI | (2604.12180v1)

Abstract: Tropical cyclones (TCs) rank among the most destructive natural hazards, yet their forecasting faces fundamental trade-offs: numerical weather prediction (NWP) models are computationally prohibitive and struggle to leverage historical data, while existing deep learning (DL)-based intelligent models are variable-specific and deterministic, and fail to generalize across different forecasting variables. Here we present CycloneMAE, a scalable multi-task forecasting model that learns transferable TC representations from multi-modal data using a TC structure-aware masked autoencoder. By coupling a discrete probabilistic gridding mechanism with a pre-train/fine-tune paradigm, CycloneMAE simultaneously delivers deterministic forecasts and probability distributions. Evaluated across five global ocean basins, CycloneMAE outperforms leading NWP systems in pressure and wind forecasting up to 120 hours and in track forecasting up to 24 hours. Attribution analysis via integrated gradients reveals physically interpretable learning dynamics: short-term forecasts rely predominantly on the internal core convective structure from satellite imagery, whereas longer-term forecasts progressively shift attention to external environmental factors. Our framework establishes a scalable, probabilistic, and interpretable pathway for operational TC forecasting.


Authors (6)

Summary

The paper introduces CycloneMAE, a multi-task masked autoencoder that provides scalable, probabilistic tropical cyclone forecasts with significant error reductions compared to operational NWP models.
It employs a novel radial distance masking strategy and a hybrid training pipeline—combining pre-training and fine-tuning—to effectively capture multiscale cyclone dynamics.
The model offers uncertainty quantification through its discrete probabilistic outputs and physically interpretable attribution through integrated gradients, supporting risk assessment across forecast lead times.

CycloneMAE: Scalable Multi-Task Probabilistic Learning for Global Tropical Cyclone Forecasting

Introduction

Tropical cyclones (TCs) present complex multivariate prediction challenges due to their multi-scale dynamical nature and catastrophic impacts. Current operational approaches rely on numerical weather prediction (NWP) systems, which are computationally intensive and constrained in their data efficiency and uncertainty estimation capability. Recent advances in deep learning (DL) models have provided substantial improvements in deterministic prediction, but have not demonstrated strong generalizability across multiple forecast variables, nor provided interpretable uncertainty quantification critical for risk assessment.

CycloneMAE introduces a multi-task masked autoencoder (MAE) approach for TC prediction, exploiting multi-modal global data to provide deterministic and probabilistic forecasts with physical interpretability. The model leverages specialized architectural innovations and attribution analysis to address key generalization and operational challenges in global TC forecasting (2604.12180).

Model Architecture and Training Paradigm

CycloneMAE employs a pre-training/fine-tuning paradigm to construct transferable, structure-aware representations for TCs.

Figure 1: CycloneMAE’s training pipeline. The structure-aware MAE leverages multi-modal satellite and reanalysis fields in pre-training (left) and switches to fine-tuning with task-specific heads for probabilistic forecasting (right).

During pre-training, satellite IR/WV imagery, ERA5 reanalysis fields, and TC attributes are spatially aligned and split into non-overlapping patches. A novel radial distance masking strategy preferentially masks peripheral regions, forcing the model to internalize the core structural dynamics central to TC evolution. A separate ViT-based encoder for each modality processes the unmasked patches; the encoded features are then fused and decoded under a reconstruction objective. This process induces a physically consistent representation aligned with the heterogeneous spatial structure of TCs.
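The radial masking idea can be illustrated with a minimal sketch. It assumes that a patch's masking weight grows with its normalized distance from the TC centre and that patches are drawn without replacement in proportion to that weight; the function name `radial_mask`, the `sharpness` exponent, and the 14×14 patch grid are hypothetical choices for illustration, not details taken from the paper.

```python
import math
import random

def radial_mask(grid_size, mask_ratio, sharpness=2.0, seed=0):
    """Pick patch indices to mask, favouring patches far from the grid centre.

    Assumption: weight = (normalised distance) ** sharpness; the paper's
    actual weighting scheme may differ.
    """
    rng = random.Random(seed)
    centre = (grid_size - 1) / 2.0
    patches = [(r, c) for r in range(grid_size) for c in range(grid_size)]
    dist = {p: math.hypot(p[0] - centre, p[1] - centre) for p in patches}
    max_dist = max(dist.values())
    # Small offset keeps every patch maskable with non-zero probability.
    weight = {p: (dist[p] / max_dist) ** sharpness + 1e-3 for p in patches}
    n_mask = int(round(mask_ratio * len(patches)))
    # Weighted sampling without replacement (Efraimidis-Spirakis):
    # draw key = u ** (1 / w) per patch and keep the n_mask largest keys.
    key = {p: rng.random() ** (1.0 / weight[p]) for p in patches}
    return set(sorted(patches, key=key.get, reverse=True)[:n_mask])

masked = radial_mask(grid_size=14, mask_ratio=0.75)
visible = [(r, c) for r in range(14) for c in range(14) if (r, c) not in masked]
print(len(masked), "patches masked,", len(visible), "visible")
```

With a larger `sharpness`, the visible patches concentrate more tightly around the centre, so the encoder sees mostly the inner core and must reconstruct the periphery.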

Fine-tuning freezes the encoders and attaches variable-specific forecasting heads. Temporal context is injected by processing sequences of past observations for each variable (MSLP, MSW, track) via LSTMs, with probabilistic heads projecting into discretized output spaces and yielding both expected and distributional predictions. Gaussian label smoothing is applied to mitigate quantization error, and the loss is cross-entropy between target and predicted distributions.
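The discretized output with Gaussian label smoothing can be sketched as follows. This is a minimal illustration under stated assumptions: the 5 m/s bin spacing, the 10–90 m/s range, and `sigma = 5.0` are hypothetical values, and the helper names are not from the paper. It shows how a continuous target becomes a soft distribution over bins, how cross-entropy compares it with a predicted distribution, and how a deterministic forecast is recovered as the expected value.

```python
import math

def gaussian_soft_label(target, bin_centers, sigma):
    """Spread a scalar target over bins with a normalised Gaussian kernel."""
    w = [math.exp(-0.5 * ((c - target) / sigma) ** 2) for c in bin_centers]
    total = sum(w)
    return [x / total for x in w]

def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    e = [math.exp(z - m) for z in logits]
    s = sum(e)
    return [x / s for x in e]

def cross_entropy(soft_target, probs, eps=1e-12):
    """Cross-entropy between a soft target and a predicted distribution."""
    return -sum(t * math.log(p + eps) for t, p in zip(soft_target, probs))

def expected_value(probs, bin_centers):
    """Deterministic forecast as the mean of the predicted distribution."""
    return sum(p * c for p, c in zip(probs, bin_centers))

# Hypothetical MSW bins (m/s) and a target that falls between two bin centres.
bins = [10 + 5 * i for i in range(17)]           # 10, 15, ..., 90
soft = gaussian_soft_label(43.0, bins, sigma=5.0)
uniform = softmax([0.0] * len(bins))             # an uninformative prediction

print(round(expected_value(soft, bins), 3))      # close to 43, not snapped to 45
print(cross_entropy(soft, soft) < cross_entropy(soft, uniform))
```

A one-hot label would snap the 43 m/s target to the 45 m/s bin; the smoothed label keeps the sub-bin information, which is the quantization error the smoothing is meant to reduce.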

Data and Evaluation Protocol

CycloneMAE is trained on 20 years (2000–2019) of global TC records spanning five primary basins (WP, NA, EP, SI, SP) using GridSat-B1 satellite, ERA5 reanalysis, and best track datasets. Pre-training uses 2000–2014, and fine-tuning uses 2015–2019, with comprehensive spatiotemporal matching and normalizations performed across all inputs.

Evaluation is conducted on independent data from 2020–2024, using operational TIGGE analysis for environmental fields (to ensure parity with NWP baselines). The performance metric is mean absolute error for MSLP, MSW, and track across multiple lead times, consistent with operational TC forecast evaluation. In the results below, "MAE" denotes mean absolute error, not the masked autoencoder in the model's name.
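A minimal sketch of these metrics follows. It assumes that track error is measured as the great-circle distance between forecast and observed centre positions, which is common practice but is not spelled out in this summary; the function names and the spherical Earth radius of 6371 km are illustrative choices.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean radius, spherical approximation

def great_circle_km(lat1, lon1, lat2, lon2):
    """Haversine distance in km between two points given in degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))

def mean_absolute_error(pred, obs):
    """Mean absolute error for scalar targets such as MSLP or MSW."""
    return sum(abs(p - o) for p, o in zip(pred, obs)) / len(pred)

def mean_track_error_km(pred_pos, obs_pos):
    """Mean great-circle distance between (lat, lon) forecast/observed pairs."""
    d = [great_circle_km(p[0], p[1], o[0], o[1]) for p, o in zip(pred_pos, obs_pos)]
    return sum(d) / len(d)

# One degree of longitude on the equator is about 111.19 km on this sphere.
print(round(great_circle_km(0.0, 0.0, 0.0, 1.0), 2))
```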

Main Results

Comparative Performance with Operational NWP

CycloneMAE is consistently superior to ECMWF-IFS, NCEP-GFS, and CMA-GFS for intensity (MSLP/MSW) prediction up to 120 hours and for track prediction up to 24 hours, across all major basins.

Figure 2: Mean absolute error comparison in WP, EP, and NA between CycloneMAE and leading global NWPs for MSLP, MSW, and track.

Quantitatively, at the 120-hour lead time, CycloneMAE reduces MSLP error relative to ECMWF-IFS by 18.57% in WP, 10.68% in EP, and 13.19% in NA. MSW errors are similarly reduced (by up to 20.24%). For 24-hour track forecasts, CycloneMAE surpasses NCEP-GFS, with gains ranging from 9.13% in WP up to 13.83% in NA. For longer-term track forecasts, NWP models regain their advantage owing to their superior global dynamical coverage.
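Percentages of this kind are typically relative error reductions against a baseline. The sketch below assumes that definition, (baseline error − model error) / baseline error; the summary does not state the exact formula, and the numbers in the example are hypothetical, not values from the paper.

```python
def relative_improvement(model_error, baseline_error):
    """Percentage reduction in error relative to a baseline (assumed definition)."""
    return 100.0 * (baseline_error - model_error) / baseline_error

# Hypothetical errors: a baseline MAE of 10.0 hPa and a model MAE of 8.0 hPa.
print(relative_improvement(8.0, 10.0))
```

A negative value under this definition would mean the baseline is more accurate, as the text reports for track forecasts at longer lead times.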

Year-on-year analysis in the WP basin confirms temporal robustness and stability.

Figure 3: Year-wise MAE from 2020–2024 in WP for MSLP, MSW, and track, demonstrating CycloneMAE’s consistently low errors except for track at long leads.

In the Southern Hemisphere, performance remains consistent, with lower or comparable errors to leading NWPs for both MSW/MSLP and track, except that track skill deteriorates more rapidly beyond 48 hours.

Probabilistic Forecast Capability and Uncertainty Estimation

CycloneMAE’s discrete probabilistic output enables empirical uncertainty quantification analogous to ensemble NWP systems but at a significantly reduced computational cost.

Figure 4: Case study analysis of four major TCs showing deterministic evolution (top) and lead-time-dependent forecast distributions (bottom), illustrating model confidence and sharpness.

Full-cycle case analyses (e.g., In-fa, Doksuri, Earl, Teddy) illustrate high-fidelity tracking of intensity and structure, accurate capture of rapid intensification and decay, and well-calibrated uncertainty: prediction intervals expand with lead time and reflect both inherent and event-specific predictability. Slight underestimation of extreme peaks occurs only in isolated cases, consistent with other DL systems.
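One way to read prediction intervals off a discrete forecast distribution is through its cumulative sum. The sketch below is an assumption about how such intervals could be derived, not the paper's procedure; the bin values and both example distributions are hypothetical, standing in for a short-lead (sharp) and a long-lead (broad) forecast.

```python
def quantile_from_bins(probs, bin_centers, q):
    """Smallest bin centre whose cumulative probability reaches q."""
    cum = 0.0
    for p, c in zip(probs, bin_centers):
        cum += p
        if cum >= q:
            return c
    return bin_centers[-1]

def central_interval(probs, bin_centers, coverage=0.8):
    """Central interval holding roughly `coverage` of the probability mass."""
    tail = (1.0 - coverage) / 2.0
    return (quantile_from_bins(probs, bin_centers, tail),
            quantile_from_bins(probs, bin_centers, 1.0 - tail))

bins = [20, 25, 30, 35, 40, 45, 50]                   # hypothetical MSW bins (m/s)
short_lead = [0.0, 0.02, 0.13, 0.7, 0.13, 0.02, 0.0]  # sharp distribution
long_lead = [0.05, 0.1, 0.2, 0.3, 0.2, 0.1, 0.05]     # broad distribution

print(central_interval(short_lead, bins))  # (30, 40)
print(central_interval(long_lead, bins))   # (25, 45)
```

The wider interval for the broader distribution mirrors the behaviour described above, where intervals expand as lead time increases.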

Physically Interpretable Attribution Analysis

Integrated Gradients (IG) attribution quantifies the contribution of each input predictor over lead time for each forecast target.
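For readers unfamiliar with the method, integrated gradients attributes a prediction to each input by integrating the gradient along a straight path from a baseline to the actual input and scaling by the input difference. The sketch below is a minimal, framework-free illustration: it uses finite-difference gradients and a midpoint Riemann sum, and the three-input `toy_model` is a hypothetical stand-in, not the CycloneMAE network. In practice an autodiff framework would supply the gradients.

```python
def numerical_gradient(f, x, h=1e-5):
    """Central finite-difference gradient of f at point x."""
    grad = []
    for i in range(len(x)):
        up, dn = list(x), list(x)
        up[i] += h
        dn[i] -= h
        grad.append((f(up) - f(dn)) / (2 * h))
    return grad

def integrated_gradients(f, x, baseline, steps=50):
    """Approximate IG_i = (x_i - b_i) * integral over alpha of df/dx_i."""
    total = [0.0] * len(x)
    for k in range(steps):
        alpha = (k + 0.5) / steps  # midpoint rule on [0, 1]
        point = [b + alpha * (xi - b) for xi, b in zip(x, baseline)]
        g = numerical_gradient(f, point)
        total = [t + gi for t, gi in zip(total, g)]
    return [(xi - b) * t / steps for xi, b, t in zip(x, baseline, total)]

def toy_model(x):
    """Hypothetical model: an interaction term plus a linear term."""
    return x[0] * x[1] + 2.0 * x[2]

x, baseline = [1.0, 2.0, 3.0], [0.0, 0.0, 0.0]
attr = integrated_gradients(toy_model, x, baseline)
# Completeness: attributions sum to f(x) - f(baseline).
print([round(a, 4) for a in attr], round(sum(attr), 4))
```

The completeness property checked in the last line is what makes it meaningful to compare attribution shares across predictors and lead times.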

Figure 5: IG-based decomposition of predictor importance for MSW, MSLP, and track with evolving emphasis on core versus environmental variables.

Short-term (≤24h) intensity forecasts are dominated by satellite IR/WV and MSL predictors, highlighting the learned reliance on core convective structure and thermodynamic features, while track prediction always emphasizes large-scale environmental fields. For longer leads, sensitivity shifts systematically to upper-tropospheric predictors (e.g., Z200), consistent with dynamical theory of TC steering and large-scale modulation. This attribution provides mechanistic validation of CycloneMAE’s learning process and its alignment with physical principles.

Global Performance Visualization

Figure 6: Global map of observed (green) and forecasted (purple) tracks and MSW for all TCs 2020–2024 at 6-hour lead, demonstrating the high spatial agreement and forecast realism across all basins.

The spatial congruence between observed and forecasted tracks further attests to the operational feasibility and multiscale skill of CycloneMAE.

Discussion and Future Directions

CycloneMAE advances DL-based TC forecasting by unifying multi-task learning, probabilistic output, attribution analysis, and computationally efficient inference. Its probabilistic gridding framework reduces reliance on resource-intensive NWP ensembles, offering real-time uncertainty-aware guidance. The model’s physically coherent attributions suggest potential for integration with physics-based schemes, and its architecture facilitates extension to additional forecast variables and incorporation of new data sources.

A consistent limitation remains in long-range track prediction, reflecting the difficulty that architectures with local receptive fields have in capturing planetary-scale teleconnections. Future work should focus on hybrid systems that inject global context, potentially through coupling with global NWP or foundation models, to combine local physical resolution with global dynamical awareness.

Conclusion

CycloneMAE establishes a scalable, multi-modal, physically interpretable framework for global, multi-variable probabilistic TC forecasting. It improves upon leading operational NWP models for pressure and wind forecasts up to 120 hours and for track forecasts up to 24 hours, and offers built-in uncertainty estimation and diagnostics. These contributions represent a significant step towards next-generation, efficient, and robust disaster mitigation systems for extreme weather phenomena.

