Pangu-Weather Operational Insights

Updated 23 August 2025

Pangu-Weather Operational is a real-time deep learning weather forecasting system employing a 3D Earth-Specific Transformer to model global and regional dynamics.
Its workflow integrates deterministic and ensemble forecasts with rapid inference and comprehensive uncertainty quantification, achieving lower RMSE and higher ACC compared to conventional NWP models.
Despite competitive performance for common scenarios, challenges remain in accurately forecasting record-breaking extremes and detailed mesoscale structures, prompting hybrid model enhancements.

Pangu-Weather Operational refers to the real-time, production-ready implementation and evaluation of the Pangu-Weather model suite in global and regional weather forecasting. Pangu-Weather, initially introduced as a deep learning-based global weather prediction system with a 3D Earth-Specific Transformer (3DEST) architecture, rapidly advanced the state of data-driven meteorological models by outperforming leading numerical weather prediction (NWP) systems in short- and medium-range deterministic forecast skill at high spatiotemporal resolution. In operational settings, Pangu-Weather has demonstrated competitive performance for major forecast variables, real-time ensemble generation, process-based diagnostics, and integration with post-processing and uncertainty quantification frameworks. Nevertheless, limitations remain, especially in extrapolating beyond the training climatology and accurately forecasting record-breaking extremes or intricate mesoscale structures.

1. Model Architecture and Workflow

Pangu-Weather’s architecture is built around the 3D Earth-Specific Transformer (3DEST), which encodes the three-dimensional structure of the atmosphere by treating the stacked pressure levels as a spatial cube. Inputs consist of upper-air data (a tensor of 13 pressure levels × 1440 longitudes × 721 latitudes × 5 variables) and surface data, which are patch-embedded and concatenated along the height dimension. The core architecture is an encoder–decoder transformer with eight-layer stacks and an Earth-specific positional bias (ESB) learned from absolute geospatial coordinates. This bias accommodates the distortion introduced by projecting a sphere onto a regular grid and lets the transformer model height- and latitude-dependent relationships directly.
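As a rough illustration of the patch-embedding step, the sketch below computes the size of the token cube produced from the input dimensions quoted above. The patch sizes (2 × 4 × 4 for upper-air data, 4 × 4 for surface data) and the padding of each axis to a whole number of patches are assumptions based on the commonly reported Pangu-Weather configuration, not values stated in this article, and `patch_grid` is a hypothetical helper.

```python
import math

def patch_grid(shape, patch):
    """Token-grid shape after patch embedding, padding each axis up to a whole patch."""
    return tuple(math.ceil(n / p) for n, p in zip(shape, patch))

# Upper-air input: 13 pressure levels x 721 latitudes x 1440 longitudes (5 variables per point).
upper_tokens = patch_grid((13, 721, 1440), (2, 4, 4))   # assumed patch size
# Surface input: a single level on the same horizontal grid.
surface_tokens = patch_grid((1, 721, 1440), (1, 4, 4))  # assumed patch size

# Concatenate the two token grids along the height dimension, as described in the text.
cube = (upper_tokens[0] + surface_tokens[0], upper_tokens[1], upper_tokens[2])
print(cube)
```

Under these assumptions the transformer operates on a cube of 8 × 181 × 360 tokens rather than on the full-resolution grid, which is what makes attention over the global domain tractable.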

Operationally, a hierarchical temporal aggregation algorithm is applied: individual models are trained for different lead times (1 h, 3 h, 6 h, and 24 h) and then chained by a greedy algorithm that always applies the longest lead-time model that still fits the remaining forecast horizon, which keeps the number of iterative steps, and hence the accumulated error, small. This enables high-frequency (hourly) and extended (up to a week or longer) forecasts within a single, unified workflow. The complete operational stack can produce both deterministic and ensemble forecasts with inference times as low as 1–4 seconds per step on a single GPU, making it suitable for real-time and ensemble-based applications (Bi et al., 2022, Cheng et al., 2023).
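The greedy sequencing of lead-time models can be sketched as follows. This is a minimal illustration, not the operational code: `greedy_schedule` is a hypothetical name, and the only inputs taken from the text are the four lead times.

```python
def greedy_schedule(lead_hours, steps=(24, 6, 3, 1)):
    """Split a forecast horizon into model calls, always taking the largest step that fits."""
    if lead_hours < 0:
        raise ValueError("lead_hours must be non-negative")
    schedule = []
    remaining = lead_hours
    for step in sorted(steps, reverse=True):
        count, remaining = divmod(remaining, step)
        schedule.extend([step] * count)
    if remaining:
        raise ValueError("horizon cannot be reached with the available steps")
    return schedule

print(greedy_schedule(56))
```

For a 56-hour horizon this yields two 24 h steps, one 6 h step, and two 1 h steps: five model calls instead of the 56 that a purely hourly rollout would need. Because each lead time divides the next larger one, the greedy choice also gives the smallest possible number of calls for this particular set of steps.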

2. Accuracy, Skill Metrics, and Benchmarking

Pangu-Weather’s operational performance is routinely assessed with standard meteorological verification metrics:

Latitude-weighted Root Mean Square Error (RMSE):

${\rm RMSE}(v,t)=\sqrt{\frac{\sum_{i,j}L(i)(\hat{A}^{v}_{i,j,t}-A^{v}_{i,j,t})^2}{N_{\rm lat}\times N_{\rm lon}}}$

Anomaly Correlation Coefficient (ACC):

${\rm ACC}(v,t)=\frac{\sum_{i,j} L(i)\, \hat{A}^{\prime v}_{i,j,t} A^{\prime v}_{i,j,t}} {\sqrt{\left(\sum_{i,j} L(i)\,[\hat{A}^{\prime v}_{i,j,t}]^2\right) \left(\sum_{i,j} L(i)\,[A^{\prime v}_{i,j,t}]^2\right)}}$

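Here $\hat{A}^{v}_{i,j,t}$ and $A^{v}_{i,j,t}$ are the forecast and verifying values of variable $v$ at grid point $(i,j)$ and time $t$, primes denote anomalies relative to climatology, and $L(i)$ is the weight at latitude index $i$. These metrics are used for both global gridded and localized (e.g., station-level) evaluation.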
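A minimal sketch of both metrics is given below. It assumes the common convention that the latitude weight is proportional to the cosine of latitude and normalized to have a mean of one, $L(i) = N_{\rm lat}\cos\phi_i / \sum_{i'}\cos\phi_{i'}$; the article does not define $L(i)$, so this choice is an assumption. Plain Python lists are used for clarity, whereas a production pipeline would normally vectorize these loops.

```python
import math

def lat_weights(lats_deg):
    """Cosine-latitude weights normalized to mean 1 (assumed convention for L(i))."""
    cosines = [math.cos(math.radians(phi)) for phi in lats_deg]
    total = sum(cosines)
    return [len(cosines) * c / total for c in cosines]

def weighted_rmse(forecast, truth, lats_deg):
    """Latitude-weighted RMSE over a [lat][lon] grid."""
    w = lat_weights(lats_deg)
    n_lat, n_lon = len(forecast), len(forecast[0])
    squared = sum(
        w[i] * (forecast[i][j] - truth[i][j]) ** 2
        for i in range(n_lat)
        for j in range(n_lon)
    )
    return math.sqrt(squared / (n_lat * n_lon))

def weighted_acc(forecast, truth, climatology, lats_deg):
    """Latitude-weighted anomaly correlation over a [lat][lon] grid."""
    w = lat_weights(lats_deg)
    num = den_f = den_t = 0.0
    for i in range(len(forecast)):
        for j in range(len(forecast[0])):
            f_anom = forecast[i][j] - climatology[i][j]
            t_anom = truth[i][j] - climatology[i][j]
            num += w[i] * f_anom * t_anom
            den_f += w[i] * f_anom ** 2
            den_t += w[i] * t_anom ** 2
    return num / math.sqrt(den_f * den_t)
```

With mean-one weights, a forecast that is off by a constant amount everywhere has an RMSE equal to that amount, and a perfect forecast has an ACC of 1.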

Extensive benchmarking has demonstrated that Pangu-Weather achieves lower RMSE and higher ACC than operational IFS and other data-driven models (e.g., FourCastNet), particularly for standard mid-tropospheric and surface variables. At short to medium ranges (up to 7 days), deterministic accuracy is often significantly improved, with a mean RMSE reduction of over 10% in key variables, while maintaining high fidelity of synoptic-scale storm evolution, including cyclone tracks (Bi et al., 2022, Cheng et al., 2023, Feng et al., 29 Apr 2024, DeMaria et al., 8 Sep 2024).

Process-based evaluations reveal that Pangu-Weather has encoded substantial aspects of physical dynamics, reproducing realistic Matsuno-Gill responses, baroclinic development, geostrophic adjustment, and hurricane genesis in controlled experiments, thus confirming that the model is not merely pattern-matching but capturing dynamical relationships (Hakim et al., 2023).

3. Ensemble Forecasting and Uncertainty Quantification

Operational ensemble forecasting with Pangu-Weather is enabled through multiple strategies:

Initial condition (IC) perturbation ensembles: Input states are perturbed with Gaussian noise, random field differences, or ECMWF ensemble-derived ICs, which generates forecast spread directly from the deterministic model at low computational cost (Bülte et al., 20 Mar 2024).
Arnoldi Singular Vector (A-SV) perturbations: Adjoint-free, model-consistent perturbations are constructed by using a Krylov subspace to identify the directions of maximal forecast error growth in the full nonlinear model, producing physically relevant ensemble members that represent initial-condition uncertainty (Winkler et al., 13 Jun 2025).
Post-hoc UQ and lagged ensembles: Approaches such as isotonic regression (EasyUQ), distributional regression networks (DRN), and lagged deterministic ensembles are used to estimate the forecast PDF, evaluated by CRPS:

${\rm CRPS}(F, y) = \int_{-\infty}^{\infty} \left(F(x)-H(x-y)\right)^2 \, dx$

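Here $F$ is the forecast cumulative distribution function, $y$ is the verifying observation, and $H$ is the Heaviside step function. These approaches yield practical, well-calibrated probabilistic guidance: Pangu-Weather shows competitive or better probabilistic skill (CRPS) compared to the ECMWF ensemble and GraphCast, especially at short to medium lead times (Brenowitz et al., 27 Jan 2024, Bülte et al., 20 Mar 2024).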
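For a finite ensemble, the CRPS integral can be evaluated exactly by taking $F$ to be the empirical distribution of the members, which gives the well-known form $\frac{1}{m}\sum_i |x_i - y| - \frac{1}{2m^2}\sum_{i,j} |x_i - x_j|$. The sketch below implements that estimator for a single scalar forecast; the function name is illustrative, and the cited studies may use other estimators or parametric forms.

```python
def ensemble_crps(members, obs):
    """CRPS of the empirical distribution of `members` against a scalar observation."""
    m = len(members)
    if m == 0:
        raise ValueError("at least one ensemble member is required")
    # Mean absolute error of the members against the observation.
    accuracy = sum(abs(x - obs) for x in members) / m
    # Half the mean absolute difference between all member pairs (ensemble spread).
    spread = sum(abs(a - b) for a in members for b in members) / (2 * m * m)
    return accuracy - spread

print(ensemble_crps([1.0, 3.0], 2.0))  # 0.5
print(ensemble_crps([3.0], 5.0))       # 2.0, the absolute error of a single member
```

Lower values are better. With a single member the score reduces to the absolute error, which is why CRPS allows deterministic and ensemble forecasts to be compared on the same scale.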

Operational case studies with tropical cyclones demonstrate that AI-generated ensembles from Pangu-Weather can closely match the spatial uncertainty and probabilistic tracks of ECMWF ensembles, with rapid generation of thousands of scenarios for real-time risk analysis, a scale that traditional NWP computation cannot achieve (Feng et al., 29 Apr 2024).
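The Gaussian-noise variant of the initial-condition perturbation strategy can be sketched as below. This is a schematic under strong assumptions: `toy_step` is a hypothetical stand-in for one inference step and has nothing to do with the real model, the state is a short list rather than a gridded field, and the noise is uncorrelated, whereas operational perturbations are usually spatially structured.

```python
import random

def toy_step(state):
    """Hypothetical stand-in for one model inference step (not Pangu-Weather)."""
    return [0.9 * x + 1.0 for x in state]

def perturbed_ensemble(initial_state, step_fn, n_members, sigma, n_steps, seed=0):
    """Roll out n_members forecasts from Gaussian-perturbed copies of the initial state."""
    rng = random.Random(seed)  # seeded so the ensemble is reproducible
    members = []
    for _ in range(n_members):
        state = [x + rng.gauss(0.0, sigma) for x in initial_state]
        for _ in range(n_steps):
            state = step_fn(state)
        members.append(state)
    return members

ensemble = perturbed_ensemble([10.0, 20.0], toy_step, n_members=50, sigma=0.5, n_steps=3)
print(len(ensemble))
```

Because every member is just another forward pass of the deterministic model, the cost grows linearly with ensemble size, which is what makes very large ensembles feasible when a single step takes seconds.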

4. Applications and Integrations in Operational Forecasting

Pangu-Weather’s operational deployments span:

Real-time deterministic and ensemble global weather forecasting at 0.25° resolution, producing outputs for all major upper-level and surface variables and supporting daily to weekly guidance.
Extreme weather guidance, including tropical cyclone track forecasts: Pangu-Weather exhibits competitive skill in track prediction (high detection rates, consensus-based forecast improvements of up to 11%), although intensity forecasts are systematically too weak, which is attributed to training with a mean-squared-error loss and to biases in the ERA5 data (DeMaria et al., 8 Sep 2024).
Severe convective environment prediction: Medium-range forecasts of dynamically derived indices (e.g., CAPE, DLS) are on par with or exceed IFS skill, supporting rapid generation of outlooks for hazard-driven applications (Feldmann et al., 13 Jun 2024).
Regional and high-resolution adaptation: Variants of the architecture, with lower compute, have been deployed regionally (e.g., Indian monsoon) with robust skill (e.g., MAPE < 5%, FSS > 0.86 at short lead times) (Choudhury et al., 17 Mar 2025).
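The Fractions Skill Score (FSS) quoted for the regional variant is a neighborhood-based measure that compares the fraction of grid cells exceeding a threshold within a window around each point, so that small displacement errors are not penalized twice. The sketch below follows the standard definition; the threshold, the window size, and the zero-padding at domain edges are illustrative assumptions, since the article does not state the settings used in the cited study.

```python
def neighborhood_fractions(binary, half_width):
    """Fraction of exceedance cells in a square window around each grid cell.

    Cells outside the domain count as zero (an assumed boundary treatment).
    """
    n_rows, n_cols = len(binary), len(binary[0])
    window_size = (2 * half_width + 1) ** 2
    fractions = [[0.0] * n_cols for _ in range(n_rows)]
    for i in range(n_rows):
        for j in range(n_cols):
            hits = 0
            for ii in range(max(0, i - half_width), min(n_rows, i + half_width + 1)):
                for jj in range(max(0, j - half_width), min(n_cols, j + half_width + 1)):
                    hits += binary[ii][jj]
            fractions[i][j] = hits / window_size
    return fractions

def fss(forecast, observed, threshold, half_width):
    """Fractions Skill Score: 1 is perfect, 0 is no overlap at this scale."""
    f_bin = [[1 if v >= threshold else 0 for v in row] for row in forecast]
    o_bin = [[1 if v >= threshold else 0 for v in row] for row in observed]
    pf = neighborhood_fractions(f_bin, half_width)
    po = neighborhood_fractions(o_bin, half_width)
    num = sum((a - b) ** 2 for ra, rb in zip(pf, po) for a, b in zip(ra, rb))
    den = sum(a * a for row in pf for a in row) + sum(b * b for row in po for b in row)
    if den == 0:
        return float("nan")  # no exceedances in either field, score undefined
    return 1.0 - num / den
```

A forecast feature displaced by one grid cell scores 0 when compared point by point (`half_width=0`) but recovers partial credit once the window is widened, which is the behavior that makes FSS suitable for precipitation-like fields.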

Additional operational improvements are realized via transformer-based post-processing, e.g., decoder-only transformers operating on sequential lead times, which yield large gains in Brier Skill for severe weather, especially when initialized from high-resolution analysis data (e.g., HRES or ERA5) (Hua et al., 16 May 2025). Feature attribution analysis supports model interpretability and real-time forecaster confidence.
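The Brier Skill Score mentioned above measures how much a probabilistic forecast of a binary event improves on a reference forecast. The sketch below uses the sample base rate as the reference, which is a common but not universal choice and is an assumption here; the cited work may use a different baseline.

```python
def brier_score(probs, outcomes):
    """Mean squared difference between forecast probabilities and 0/1 outcomes."""
    if len(probs) != len(outcomes) or not probs:
        raise ValueError("probs and outcomes must be non-empty and of equal length")
    return sum((p - o) ** 2 for p, o in zip(probs, outcomes)) / len(probs)

def brier_skill_score(probs, outcomes, ref_probs=None):
    """BSS = 1 - BS / BS_ref; positive values beat the reference forecast."""
    if ref_probs is None:
        base_rate = sum(outcomes) / len(outcomes)  # assumed climatological reference
        ref_probs = [base_rate] * len(outcomes)
    bs_ref = brier_score(ref_probs, outcomes)
    if bs_ref == 0:
        raise ValueError("reference forecast is perfect, skill score undefined")
    return 1.0 - brier_score(probs, outcomes) / bs_ref

print(brier_skill_score([0.9, 0.1, 0.8, 0.2], [1, 0, 1, 0]))
```

In this toy case the forecast has a Brier score of 0.025 against a reference of 0.25, giving a skill score of about 0.9.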

5. Limitations and Challenges in Operational Use

Despite strengths, significant operational limitations remain:

Record-breaking extremes: In operational use, Pangu-Weather systematically underpredicts both the intensity and the frequency of out-of-sample, record-breaking heat, cold, and wind events. The bias grows monotonically with the record exceedance margin, resulting in higher RMSE, lower recall, and a ‘soft cap’ near the largest values seen in the training climatology (Zhang et al., 21 Aug 2025).
Intensity and mesoscale detail: For TCs, severe wind, and mesoscale frontal structures, the model tends to underpredict peak amplitudes and blurs sharp gradients compared to HRES and MEPS (Charlton-Perez et al., 2023, Xu et al., 22 Feb 2025, Bremnes et al., 2023). This is particularly acute in the context of extremes with complex tracks (e.g., sudden-turning typhoons) and high-impact storm surges.
Vertical/horizontal resolution tradeoffs: Constraints in vertical level representation limit fidelity for thermodynamic profiles (e.g., for CAPE estimation and near-surface heat extremes), with systematic biases propagating to forecasted environmental hazards (Feldmann et al., 13 Jun 2024, Ennis et al., 29 Apr 2025).
Extrapolation and physical constraints: AIWP models such as Pangu-Weather tend to interpolate within the range of their training data and lack robust out-of-distribution generalization, because extreme and unprecedented events lie outside the training envelope. The absence of built-in physical conservation laws in the data-driven framework can lead to systematic errors in high-impact regimes (Zhang et al., 21 Aug 2025).
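The dependence of bias on record exceedance margin, described in the first item above, can be diagnosed with a simple binning procedure. The sketch below is one possible way to organize such a diagnostic for record highs; the function name, the bin edges, and the numbers in the usage example are synthetic and purely illustrative, not results from the cited study.

```python
def bias_by_exceedance(forecasts, observations, prior_records, bin_edges):
    """Mean forecast-minus-observation bias, grouped by how far the observation
    exceeded the previous record. Returns None for bins with no samples."""
    n_bins = len(bin_edges) - 1
    sums = [0.0] * n_bins
    counts = [0] * n_bins
    for fcst, obs, record in zip(forecasts, observations, prior_records):
        margin = obs - record
        if margin <= 0:
            continue  # not a record-breaking event
        for k in range(n_bins):
            if bin_edges[k] <= margin < bin_edges[k + 1]:
                sums[k] += fcst - obs
                counts[k] += 1
                break
    return [s / c if c else None for s, c in zip(sums, counts)]

# Synthetic values for illustration only (e.g., temperatures in degrees Celsius).
biases = bias_by_exceedance(
    forecasts=[30.5, 31.0, 33.0, 29.0],
    observations=[31.0, 32.5, 36.0, 29.5],
    prior_records=[30.0, 30.0, 30.0, 30.0],
    bin_edges=[0.0, 2.0, 4.0, 8.0],
)
print(biases)  # [-0.5, -1.5, -3.0]
```

A bias that becomes more negative from the first bin to the last is the signature of the underprediction pattern described above.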

6. Advancements, Hybridization, and Future Directions

Several developments address these operational gaps:

Spherical grids and transformer adaptivity: Emerging models (e.g., PEAR, HEAL-ViT) and regional adaptations use equal-area spherical meshes (HEALPix) or hierarchical attention to remove unphysical biases and improve both computational efficiency and skill, providing a pathway forward for operational deployments where traditional latitude–longitude limitations apply (Linander et al., 23 May 2025, Ramavajjala, 14 Feb 2024).
Uncertainty quantification and data assimilation: Arnoldi Singular Vector methods and advanced real-time data assimilation (ensemble score filter, transformer surrogates) improve the representation of uncertainty and rapid update capability for data-driven models, making operational integration more robust in turbulent scenarios (Winkler et al., 13 Jun 2025, Yin et al., 16 Jul 2024).
Hybrid and long-range operational pipelines: Efforts such as AtmosMJ show that robust long-term skill (months to year scale) can be achieved with innovations like gated residual fusion; hybrid modeling, where AI modules replace selected parameterizations in physical models, is a prospective solution for better extrapolation and stability (Cheon, 11 Jun 2025, Zhang et al., 21 Aug 2025).
Post-processing and bias correction: Transformer-based post-processing, bias correction, and ensemble calibration (e.g., via distributional regression or EasyUQ) demonstrably improve operational forecast reliability, particularly for high-impact variables and meteorologically rare events (Bülte et al., 20 Mar 2024, Hua et al., 16 May 2025, Bremnes et al., 2023).
Community and multi-source evaluations: Pangu-Weather operational systems are being tested alongside NWP in various operational meteorological centers and research programs, frequently integrated into consensus forecasting (e.g., in NHC hurricane guidance, yielding multi-year advances in track skill), though always with caution about limitations in record-breaking or sudden-turning event representation (DeMaria et al., 8 Sep 2024, Xu et al., 22 Feb 2025).

7. Summary Table: Pangu-Weather Operational Characteristics

Feature	Description	Operational Impact
Architecture	3D Earth-Specific Transformer, hierarchical aggregation	Fast, scalable, flexible inference
Deterministic forecast skill	Outperforms IFS in RMSE/ACC for most variables	High synoptic fidelity
Probabilistic skill (CRPS, ensemble)	Competitive with ECMWF ensemble & GraphCast	Enables operational UQ
Extreme event representation	Systematic intensity/frequency underestimation	Limitation in high-stakes contexts
Severe weather/hazard skill	High for CAPE/DLS, TC tracks; suboptimal for peak wind	Suitable for rapid outlooks
Data assimilation and IC dependence	High sensitivity to initial condition quality	Requires robust operational DA
Integration with post-processing	Transformer-based improves forecast discrimination	Enhances reliability, interpretability
Resource/cost efficiency	Seconds per forecast, ensemble feasible	Suits real-time and large ensembles
High-stakes operational deployment	Not yet suitable as sole tool for record extremes	Requires hybrid/backup NWP

Concluding Remarks

Pangu-Weather operational systems have substantially changed medium-range meteorological forecasting, delivering rapid, accurate, and resource-efficient guidance for most common forecast situations. Ensemble production and advanced post-processing can further mitigate some systematic errors and provide actionable probabilistic information. However, thorough independent verification demonstrates persistent challenges in extrapolating to, and reliably characterizing, record-breaking and out-of-training-distribution extremes. For high-stakes decision-making, such as disaster preparedness and risk management, continued operational reliance on physics-based NWP or hybrid model frameworks remains warranted until AI forecast systems like Pangu-Weather can demonstrably match or exceed NWP performance in these critical regimes (Zhang et al., 21 Aug 2025, Charlton-Perez et al., 2023, Feng et al., 29 Apr 2024, Feldmann et al., 13 Jun 2024).