OmniCast: Unified Weather Forecasting
- OmniCast is a probabilistic deep learning model that unifies medium-range and subseasonal-to-seasonal weather forecasting using a low-dimensional VAE and masked latent diffusion within a transformer.
- It sidesteps the error accumulation of traditional autoregressive models by generating future frames jointly with an iterative unmasking algorithm; its latent-space design also enables 10–20× faster inference.
- Empirical results show state-of-the-art forecast skill, superior probabilistic metrics, and century-scale climatological stability.
OmniCast is a probabilistic deep learning model designed for unified weather forecasting across time scales, spanning both medium-range and subseasonal-to-seasonal (S2S) horizons. It addresses limitations of traditional autoregressive machine learning weather models, notably compounding error and scalability, by operating in a low-dimensional latent space and employing a masked latent diffusion scheme within a transformer-based architecture. OmniCast demonstrates competitive or state-of-the-art performance on accuracy, physics-based, and probabilistic metrics, while providing substantially faster inference and stable long-term forecasts (Nguyen et al., 20 Oct 2025).
1. Core Architecture and Mathematical Formulation
OmniCast comprises two principal components: a variational autoencoder (VAE) and a diffusion-based transformer.
VAE Encoder-Decoder:
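The VAE learns a per-frame mapping from raw weather fields $\mathbf{x} \in \mathbb{R}^{V \times H \times W}$, where $V$ is the variable dimension, to a latent map $\mathbf{z} \in \mathbb{R}^{D \times h \times w}$ with reduced spatial resolution ($h < H$, $w < W$). The latent dimension $D$ is typically set to 1024 (S2S) or 256 (medium-range). Training minimizes the negative evidence lower bound (ELBO):
$$\mathcal{L}_{\text{VAE}} = -\mathbb{E}_{q_\phi(\mathbf{z}\mid\mathbf{x})}\big[\log p_\theta(\mathbf{x}\mid\mathbf{z})\big] + D_{\mathrm{KL}}\big(q_\phi(\mathbf{z}\mid\mathbf{x}) \,\|\, p(\mathbf{z})\big)$$
A convolutional UNet serves as the backbone, and the VAE is trained with the Adam optimizer.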
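As a minimal sketch, the negative ELBO for a VAE with a diagonal Gaussian posterior and a standard normal prior can be computed as below. It assumes a unit-variance Gaussian decoder, so the reconstruction term reduces to a squared error (up to a constant); the function names and tensor sizes are hypothetical, and the UNet encoder and decoder are omitted.

```python
import numpy as np

def gaussian_kl(mu, logvar):
    """KL(N(mu, diag(exp(logvar))) || N(0, I)), summed per sample, averaged over the batch."""
    kl = 0.5 * (np.exp(logvar) + mu**2 - 1.0 - logvar)
    return kl.reshape(kl.shape[0], -1).sum(axis=1).mean()

def reparameterize(mu, logvar, rng):
    """Reparameterization trick z = mu + sigma * eps (keeps sampling differentiable under autodiff)."""
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * logvar) * eps

def negative_elbo(x, x_recon, mu, logvar):
    """Negative ELBO assuming a unit-variance Gaussian decoder (an assumption, not from the paper)."""
    recon = 0.5 * ((x - x_recon) ** 2).reshape(x.shape[0], -1).sum(axis=1).mean()
    return recon + gaussian_kl(mu, logvar)

# Hypothetical shapes: 2 samples, V=69 variables on a 32x64 grid, latent D=8 on a 4x8 map.
rng = np.random.default_rng(0)
x = rng.standard_normal((2, 69, 32, 64))
mu, logvar = np.zeros((2, 8, 4, 8)), np.zeros((2, 8, 4, 8))
z = reparameterize(mu, logvar, rng)
print(negative_elbo(x, x, mu, logvar))  # perfect reconstruction and posterior equal to the prior
```

The KL term pulls the posterior toward the prior, while the reconstruction term is what ultimately bounds forecast fidelity once forecasts are decoded back to physical fields.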
Diffusion-Based Transformer:
The transformer operates on sequences of future latent tokens $\mathbf{z}_{1:T} = (\mathbf{z}_1, \dots, \mathbf{z}_T)$, where $T$ is the forecast horizon. The VAE latent embedding of the initial frame, $\mathbf{z}_0$, serves as the conditioning context. A bidirectional encoder-decoder transformer (16 layers, 16 heads, 1024 hidden units, dropout 0.1) produces a contextual vector $\mathbf{c}$ for each future token.
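The token layout can be sketched as follows, assuming each spatial position of a $D \times h \times w$ latent frame becomes one token of width $D$ (the paper's exact patching may differ). The helper names are hypothetical; the conditioning frame $\mathbf{z}_0$ is prepended and flagged so that a model can distinguish context tokens from future tokens.

```python
import numpy as np

def frames_to_tokens(z):
    """(T, D, h, w) latent frames -> (T*h*w, D) tokens, ordered by frame, then row, then column."""
    T, D, h, w = z.shape
    return z.transpose(0, 2, 3, 1).reshape(T * h * w, D)

def tokens_to_frames(tokens, T, h, w):
    """Inverse of frames_to_tokens."""
    D = tokens.shape[-1]
    return tokens.reshape(T, h, w, D).transpose(0, 3, 1, 2)

def build_sequence(z0, z_future):
    """Prepend tokens of the conditioning frame z0 (D, h, w) to the future tokens (T, D, h, w)."""
    ctx = frames_to_tokens(z0[None])
    fut = frames_to_tokens(z_future)
    is_context = np.concatenate([np.ones(len(ctx), dtype=bool), np.zeros(len(fut), dtype=bool)])
    return np.concatenate([ctx, fut], axis=0), is_context

# Hypothetical sizes: D=8 latent channels on a 2x3 latent map, T=4 future frames.
rng = np.random.default_rng(0)
z0 = rng.standard_normal((8, 2, 3))
z_future = rng.standard_normal((4, 8, 2, 3))
seq, is_context = build_sequence(z0, z_future)
print(seq.shape, is_context.sum())
```

Because the transformer is bidirectional, every future token can attend to every other token in this sequence, across both space and time.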
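Each token $z$ is further modeled by a diffusion head conditioned on its contextual vector $\mathbf{c}$. The forward process is
$$z_s = \sqrt{\bar{\alpha}_s}\, z + \sqrt{1 - \bar{\alpha}_s}\, \boldsymbol{\epsilon}, \qquad \boldsymbol{\epsilon} \sim \mathcal{N}(\mathbf{0}, \mathbf{I}),$$
with $\bar{\alpha}_s$ defined by a linear noise schedule (using different numbers of diffusion steps for training and inference). The diffusion loss is
$$\mathcal{L}_{\text{diff}}(z, \mathbf{c}) = \mathbb{E}_{\boldsymbol{\epsilon}, s}\Big[\big\|\boldsymbol{\epsilon} - \boldsymbol{\epsilon}_\theta(z_s \mid s, \mathbf{c})\big\|^2\Big].$$
At inference, samples are drawn by reverse diffusion, with a temperature parameter $\tau$ controlling sample diversity.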
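A minimal NumPy sketch of the per-token diffusion head's math follows. The linear schedule endpoints (1e-4, 0.02) are DDPM-style defaults assumed here, `eps_model` is a hypothetical stand-in for the small network that predicts noise from $(z_s, s, \mathbf{c})$, and the temperature is applied by scaling the noise injected at each reverse step, a convention borrowed from masked autoregressive diffusion models rather than confirmed for OmniCast. For simplicity the sketch samples with the training schedule; using fewer inference steps would require respacing the schedule.

```python
import numpy as np

def linear_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Linear beta schedule; endpoints are assumed DDPM-style defaults."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    alphas = 1.0 - betas
    return betas, alphas, np.cumprod(alphas)

def q_sample(z, s, alpha_bars, eps):
    """Forward process: z_s = sqrt(abar_s) * z + sqrt(1 - abar_s) * eps."""
    ab = alpha_bars[s]
    return np.sqrt(ab) * z + np.sqrt(1.0 - ab) * eps

def diffusion_loss(z, cond, eps_model, alpha_bars, rng):
    """Monte Carlo estimate of E_{eps,s} ||eps - eps_theta(z_s | s, c)||^2 over a batch of tokens (B, D)."""
    s = rng.integers(0, len(alpha_bars), size=z.shape[0])
    eps = rng.standard_normal(z.shape)
    z_s = q_sample(z, s[:, None], alpha_bars, eps)
    return float(np.mean((eps - eps_model(z_s, s, cond)) ** 2))

def sample(cond, eps_model, betas, alphas, alpha_bars, dim, tau, rng):
    """DDPM ancestral sampling; tau scales the noise injected at each reverse step (assumed convention)."""
    z = rng.standard_normal((cond.shape[0], dim))
    for s in reversed(range(len(betas))):
        eps_hat = eps_model(z, np.full(z.shape[0], s), cond)
        mean = (z - betas[s] / np.sqrt(1.0 - alpha_bars[s]) * eps_hat) / np.sqrt(alphas[s])
        noise = rng.standard_normal(z.shape) if s > 0 else 0.0
        z = mean + tau * np.sqrt(betas[s]) * noise
    return z

# Hypothetical noise predictor that ignores its inputs; a real head would be a small MLP on (z_s, s, c).
zero_model = lambda z_s, s, c: np.zeros_like(z_s)
betas, alphas, alpha_bars = linear_schedule(100)
rng = np.random.default_rng(0)
cond = rng.standard_normal((4, 16))  # 4 tokens with hypothetical 16-dim context vectors
print(diffusion_loss(rng.standard_normal((4, 8)), cond, zero_model, alpha_bars, rng))
print(sample(cond, zero_model, betas, alphas, alpha_bars, dim=8, tau=0.9, rng=rng).shape)
```

Lowering $\tau$ below 1 shrinks the injected noise and makes samples less diverse; this is the knob the ablation on diffusion temperature tunes.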
2. Training Strategy and Loss Functions
Training involves random corruption of future latent tokens and a combined generative-deterministic objective.
Random Masking and Joint Diffusion:
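A mask ratio $r$ determines a Bernoulli mask $m_k \sim \mathrm{Bernoulli}(r)$ over the future tokens $z_k$, which corrupts the future sequence:
$$\tilde{z}_k = m_k\, \mathbf{e}_{[\text{MASK}]} + (1 - m_k)\, z_k.$$
The generative objective applies the diffusion loss at the masked positions:
$$\mathcal{L}_{\text{gen}} = \mathbb{E}\Big[\sum_k m_k\, \mathcal{L}_{\text{diff}}(z_k, \mathbf{c}_k)\Big].$$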
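The masking step can be sketched as below. It assumes a fixed mask ratio per call (how $r$ is drawn during training is not shown), a fixed vector standing in for the learned mask embedding, and per-token losses normalized by the number of masked tokens, a common choice; all names are hypothetical.

```python
import numpy as np

def random_mask(num_tokens, mask_ratio, rng):
    """Bernoulli mask: True (masked) with probability mask_ratio."""
    return rng.random(num_tokens) < mask_ratio

def corrupt(tokens, mask, mask_token):
    """Replace masked future tokens (N, D) with the mask embedding (D,)."""
    return np.where(mask[:, None], mask_token[None, :], tokens)

def masked_mean(per_token_loss, mask):
    """Average a per-token loss over masked positions only (0 if nothing is masked)."""
    n = int(mask.sum())
    return float((per_token_loss * mask).sum() / n) if n > 0 else 0.0

rng = np.random.default_rng(0)
tokens = rng.standard_normal((24, 8))  # hypothetical: 24 future tokens of width 8
mask_token = np.zeros(8)               # stand-in for a learned [MASK] embedding
mask = random_mask(len(tokens), mask_ratio=0.7, rng=rng)
corrupted = corrupt(tokens, mask, mask_token)
per_token = rng.random(len(tokens))    # stand-in for per-token diffusion losses
print(mask.mean(), masked_mean(per_token, mask))
```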
Auxiliary Deterministic Loss:
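An MSE head predicts the first ten future frames, with a weight that decays exponentially with lead time:
$$\mathcal{L}_{\text{MSE}} = \sum_{t=1}^{T} w_t\, \big\|\hat{\mathbf{z}}_t - \mathbf{z}_t\big\|^2,$$
where $w_t = \gamma^{\,t-1}$ (with $0 < \gamma < 1$) for frame index $t \le 10$, and $w_t = 0$ otherwise.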
Global Objective:
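$$\mathcal{L} = \mathcal{L}_{\text{gen}} + \mathcal{L}_{\text{MSE}},$$
the sum of the generative objective and the auxiliary deterministic loss.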
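The deterministic term and the global objective can be sketched as follows. The decay factor `decay=0.5` and the use of a per-frame mean squared error over latent frames are assumptions for illustration; only the ten-frame cutoff and the exponential decay come from the description above.

```python
import numpy as np

def mse_frame_weights(T, n_frames=10, decay=0.5):
    """w_t = decay**(t-1) for t <= n_frames and 0 otherwise; decay=0.5 is a hypothetical value."""
    t = np.arange(1, T + 1)
    return np.where(t <= n_frames, decay ** (t - 1.0), 0.0)

def deterministic_loss(z_pred, z_true, weights):
    """Weighted sum over frames of the per-frame MSE; z_* have shape (T, ...)."""
    per_frame = ((z_pred - z_true) ** 2).reshape(z_true.shape[0], -1).mean(axis=1)
    return float(np.sum(weights * per_frame))

def total_loss(gen_loss, z_pred, z_true, weights):
    """L = L_gen + L_MSE."""
    return gen_loss + deterministic_loss(z_pred, z_true, weights)

w = mse_frame_weights(44)  # one weight per frame for the 44 S2S lead times
print(np.round(w[:12], 4))
```

Frames beyond the tenth receive zero weight, so long-lead frames are shaped only by the generative objective.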
Training uses AdamW (with the learning rate, betas, and weight decay specified by the authors), batch size 32, and 100 epochs, with a 10-epoch warmup followed by cosine decay.
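The learning-rate schedule (10-epoch warmup, cosine decay over 100 epochs) can be sketched as a plain function of the step index. The linear warmup shape, `peak_lr`, `min_lr`, and `steps_per_epoch` are hypothetical; in practice the returned value would set the AdamW learning rate at each step.

```python
import math

def lr_at(step, total_steps, warmup_steps, peak_lr, min_lr=0.0):
    """Linear warmup to peak_lr, then cosine decay to min_lr at total_steps."""
    if step < warmup_steps:
        return peak_lr * (step + 1) / warmup_steps
    progress = min(1.0, (step - warmup_steps) / max(1, total_steps - warmup_steps))
    return min_lr + 0.5 * (peak_lr - min_lr) * (1.0 + math.cos(math.pi * progress))

# Hypothetical values: 500 steps per epoch, peak LR 5e-4.
steps_per_epoch = 500
total, warmup = 100 * steps_per_epoch, 10 * steps_per_epoch
for epoch in (0, 5, 10, 55, 100):
    print(epoch, lr_at(epoch * steps_per_epoch, total, warmup, peak_lr=5e-4))
```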
3. Inference Algorithm and Error Mitigation
Instead of standard autoregressive prediction, OmniCast samples future latent tokens jointly with masked latent diffusion. At inference, all spatial locations and time frames are generated through an iterative space–time unmasking algorithm.
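Iterative Unmasking:
Generation starts with all future tokens masked. Over a fixed number of iterations, a subset of the remaining masked tokens, chosen in random space–time order, is sampled by the diffusion head, conditioned on the initial frame and the tokens generated so far. Joint prediction avoids the monotonic error accumulation typical of autoregressive approaches. Randomized unmasking schedules add sample diversity and let each token draw on both past and future context, mitigating temporal drift.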
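The unmasking loop can be sketched as below. `predict_fn` is a hypothetical stand-in for the transformer plus diffusion head (a real one would also see the conditioning frame $\mathbf{z}_0$), the order is a single random permutation over all space–time positions, and the cosine schedule for how many tokens remain masked after each round is an assumption borrowed from masked generative models, not a confirmed OmniCast detail.

```python
import numpy as np

def iterative_unmask(num_tokens, dim, num_iters, predict_fn, rng):
    """Generate all future tokens over num_iters rounds in a fully random space-time order.

    predict_fn(tokens, known, targets) -> (len(targets), dim) array: hypothetical stand-in for the
    transformer + diffusion head, sampling the target tokens given the currently known ones.
    """
    tokens = np.zeros((num_tokens, dim))
    known = np.zeros(num_tokens, dtype=bool)
    order = rng.permutation(num_tokens)  # random order over all (frame, location) positions
    for it in range(num_iters):
        # Assumed cosine schedule: number of tokens still masked after this round.
        still_masked = int(np.floor(num_tokens * np.cos(np.pi / 2 * (it + 1) / num_iters)))
        revealed = int(known.sum())
        # Reveal at least one new token per round; the last round reveals everything.
        n_after = min(num_tokens, max(num_tokens - still_masked, revealed + 1))
        targets = order[revealed:n_after]
        if len(targets) == 0:
            continue
        tokens[targets] = predict_fn(tokens, known.copy(), targets)
        known[targets] = True
    return tokens

# Toy predictor that draws Gaussian noise; a real one would condition on z0 and the known tokens.
rng = np.random.default_rng(0)
noise_fn = lambda tokens, known, targets: rng.standard_normal((len(targets), tokens.shape[1]))
out = iterative_unmask(num_tokens=44 * 4, dim=8, num_iters=16, predict_fn=noise_fn, rng=rng)
print(out.shape)
```

Early rounds generate few tokens with little context; later rounds fill in many tokens that can condition on both earlier and later frames already sampled.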
4. Empirical Performance and Analysis
Subseasonal-to-Seasonal (S2S) Forecasting:
- Dataset: ERA5 reanalysis, 69 variables on a global grid, lead times 1–44 days, ensemble size 50.
- OmniCast outperforms ML baselines beyond day 10, matches ECMWF-ENS after day 10, and maintains near-zero bias.
- Demonstrates superior spectral consistency compared to GraphCast and PanguWeather.
- Excels in CRPS and spread/skill ratio metrics; performance surpasses ECMWF-ENS after day 15.
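For reference, the two probabilistic metrics can be computed from an ensemble as follows. This sketch uses the standard ensemble CRPS estimator and defines SSR as ensemble spread (root mean ensemble variance) divided by the RMSE of the ensemble mean; it omits latitude weighting and any finite-ensemble correction that benchmark implementations such as WeatherBench2 may apply. An SSR near 1 indicates a well-calibrated spread.

```python
import numpy as np

def ensemble_crps(ens, obs):
    """Ensemble CRPS, E|X - y| - 0.5 E|X - X'|, averaged over grid points. ens: (M, ...), obs: (...)."""
    skill = np.mean(np.abs(ens - obs[None]), axis=0)
    spread = 0.5 * np.mean(np.abs(ens[:, None] - ens[None, :]), axis=(0, 1))
    return float(np.mean(skill - spread))

def spread_skill_ratio(ens, obs):
    """Root mean ensemble variance divided by the RMSE of the ensemble mean (requires M >= 2)."""
    spread = np.sqrt(np.mean(np.var(ens, axis=0, ddof=1)))
    rmse = np.sqrt(np.mean((ens.mean(axis=0) - obs) ** 2))
    return float(spread / rmse)

# Synthetic example: truth and 50 members drawn from the same distribution, so SSR should be near 1.
rng = np.random.default_rng(0)
center = rng.standard_normal((16, 32))
ens = center[None] + rng.standard_normal((50, 16, 32))
truth = center + rng.standard_normal((16, 32))
print(ensemble_crps(ens, truth), spread_skill_ratio(ens, truth))
```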
Medium-Range Forecasting:
- WeatherBench2 data at 12 h intervals; the model predicts two steps per call and is rolled out autoregressively to 15 days.
- Results show competitive RMSE and CRPS, only slightly behind GenCast and on par with IFS-ENS.
Efficiency and Long-Term Stability:
- Training: 4 days on 32 A100 GPUs (OmniCast) vs. 5 days on 32 TPUv5e (GenCast).
- Inference: up to 29 seconds (OmniCast, A100) vs. 480 seconds (GenCast, TPUv5e), a 10–20× speedup attributed to the latent space and the efficient diffusion head.
- 100-year rollouts demonstrate climatological stability across multiple variables (T2m, U10, V10, MSLP, Z500, Q700) without drift.
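One simple way to check a long rollout for drift (a hypothetical diagnostic, not necessarily the one used in the paper) is to fit a linear trend to the annual means of a global-mean variable. `steps_per_year` depends on the rollout's output interval and is a placeholder.

```python
import numpy as np

def annual_means(series, steps_per_year):
    """Average a 1-D time series of global means over consecutive years (drops a partial final year)."""
    n_years = len(series) // steps_per_year
    return series[: n_years * steps_per_year].reshape(n_years, steps_per_year).mean(axis=1)

def trend_per_century(series, steps_per_year):
    """Least-squares linear trend of the annual means, in series units per 100 years."""
    yearly = annual_means(series, steps_per_year)
    slope = np.polyfit(np.arange(len(yearly)), yearly, 1)[0]
    return float(100.0 * slope)

# Synthetic 100-year daily series: seasonal cycle around 288 K with no drift.
steps = 365
t = np.arange(100 * steps)
t2m_global = 288.0 + 10.0 * np.sin(2 * np.pi * t / steps)
print(trend_per_century(t2m_global, steps))
```

Averaging over whole years removes the seasonal cycle, so a stable model should show a trend close to zero for each variable.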
Ablation Results:
- Best short-lead skill with deterministic loss applied only to first 10 frames.
- Optimal S2S performance when training on full 44-day sequences.
- Fully random space–time unmasking schedule outperforms alternative orders.
- The diffusion temperature $\tau$ must be tuned: the best-performing setting gives both the best prediction diversity (SSR) and the best forecast accuracy.
5. Strengths, Limitations, and Open Directions
Strengths:
- Unified probabilistic forecasting across time scales in a single model.
- Masked latent diffusion and a low-dimensional VAE latent space mitigate error propagation and accelerate inference.
- Demonstrates state-of-the-art S2S skill (deterministic, physical, probabilistic) and competitive medium-range performance.
- Roughly an order of magnitude (10–20×) faster inference than GenCast, and efficient training.
- Stable, century-scale forecast rollouts with realistic climatology.
Limitations and Future Work:
- Forecast skill is upper-bounded by VAE reconstruction quality, necessitating careful tradeoff between compression and meteorological fidelity.
- Hyperparameter sensitivity persists (mask schedules, diffusion temperature $\tau$, unmasking schedule and number of iterations).
- Prospective enhancements include improved VAE objectives (stronger ELBOs, spatio-temporal latent structure), use of flow-diffusion hybrids, and extending the architecture to higher spatial resolutions.
- Further work is required to couple the model with land/ocean boundary forcings for extended seasonal prediction.
6. Comparative Summary
| Feature/Metric | OmniCast | GenCast | ECMWF-ENS | GraphCast/PanguWeather |
|---|---|---|---|---|
| S2S Deterministic | SOTA beyond day 10 | Inferior | Matched | Inferior |
| Physics Consistency | SOTA on spectral | – | SOTA | Inferior |
| Medium-Range RMSE | Competitive | Slightly better | Competitive | – |
| Inference Speed | 10–20× faster | – | – | – |
| 100-Year Stability | Yes | – | – | – |
OmniCast represents a scalable approach for unified, skillful probabilistic weather forecasting, leveraging masked latent diffusion modeling to overcome key challenges in long-range prediction (Nguyen et al., 20 Oct 2025).