Identify the cause of degraded specific humidity (Q) performance under diversified pretraining configurations
Determine why specific humidity (Q) prediction accuracy degrades at certain atmospheric pressure levels when the Aurora foundation model is pretrained on the diversified dataset configurations C3 and C4, relative to the ERA5-only pretraining baseline (C1). C3 augments ERA5 with IFS ensemble data, IFS HRES forecasts, and the IFS ensemble mean; C4 provides broader atmospheric coverage.
References
One interesting case is specific humidity (Q), where for unknown reasons, C3 and C4 perform significantly worse than the ERA5-pretrained model on some levels.
Source: A Foundation Model for the Earth System (2405.13063 - Bodnar et al., 20 May 2024), Supplementary Materials, Section 'Effects of data diversity'