Identify the cause of degraded specific humidity (Q) performance under diversified pretraining configurations

Determine why specific humidity (Q) prediction accuracy degrades at certain atmospheric pressure levels when the Aurora foundation model is pretrained on the diversified dataset configurations C3 and C4, relative to the ERA5-only pretraining baseline (C1). C3 augments ERA5 with IFS ensemble data, IFS HRES forecasts, and the IFS ensemble mean; C4 provides broader atmospheric coverage.

Background

The paper investigates data diversity by pretraining Aurora on progressively richer dataset mixtures (C1–C4), ranging from ERA5-only (C1) to configurations that add climate simulations, IFS ensemble data, HRES forecasts, and broader atmospheric coverage. Most variables benefit from diversified pretraining, both in aggregate RMSE and on extremes. Specific humidity (Q) is an exception: at some pressure levels, C3 and C4 perform worse than the ERA5-only configuration (C1).

This degradation potentially explains instances where Aurora underperforms GraphCast on Q at particular lead times and levels. The authors do not identify the mechanism driving this effect, making it a concrete unresolved question about data mixture effects on Q prediction and inviting focused analysis of dataset alignment, training duration, normalization, or variable weighting factors.
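
A first step toward any of these analyses is to locate the pressure levels at which the degradation occurs. The following is a minimal sketch, not the authors' evaluation code: it computes a latitude-weighted RMSE per pressure level (a common convention in weather-model evaluation, assumed here) and the relative change of a configuration against the C1 baseline. The function names, array layout (level, lat, lon), and synthetic data are illustrative assumptions.

```python
import numpy as np


def lat_weighted_rmse(pred, target, lats_deg):
    """Per-level RMSE with cos(latitude) weights.

    pred, target: arrays of shape (level, lat, lon) (assumed layout).
    lats_deg: 1-D array of latitudes in degrees, length == lat dimension.
    Returns an array of shape (level,).
    """
    w = np.cos(np.deg2rad(lats_deg))
    w = w / w.mean()  # normalize so the weights average to 1
    sq_err = (pred - target) ** 2
    return np.sqrt((sq_err * w[None, :, None]).mean(axis=(1, 2)))


def relative_change(rmse_cfg, rmse_base):
    """Fractional RMSE change vs. the baseline; positive means worse."""
    return (rmse_cfg - rmse_base) / rmse_base


if __name__ == "__main__":
    # Synthetic stand-in data; real use would load Q forecasts and targets.
    rng = np.random.default_rng(0)
    lats = np.linspace(-90.0, 90.0, 19)
    truth = rng.normal(size=(3, 19, 36))
    pred_c1 = truth + rng.normal(scale=0.10, size=truth.shape)
    pred_c3 = truth + rng.normal(scale=0.12, size=truth.shape)

    rmse_c1 = lat_weighted_rmse(pred_c1, truth, lats)
    rmse_c3 = lat_weighted_rmse(pred_c3, truth, lats)
    print(relative_change(rmse_c3, rmse_c1))  # one value per level
```

Levels with a consistently positive relative change across lead times would be the natural starting point for checking normalization statistics or variable weighting for Q.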

References

One interesting case is specific humidity (Q), where for unknown reasons, C3 and C4 perform significantly worse than the ERA5-pretrained model on some levels.

— A Foundation Model for the Earth System (2405.13063 - Bodnar et al., 20 May 2024) in Supplementary Materials, Section 'Effects of data diversity'
