
OmniCast: Unified Weather Forecasting

Updated 3 July 2026
  • OmniCast is a probabilistic deep learning model that unifies medium-range and subseasonal-to-seasonal weather forecasting using a low-dimensional VAE and masked latent diffusion within a transformer.
  • It overcomes traditional autoregressive limitations with an iterative unmasking algorithm, reducing error accumulation and achieving 10–20× faster inference.
  • Empirical results show state-of-the-art forecast skill, superior probabilistic metrics, and century-scale climatological stability.

OmniCast is a probabilistic deep learning model designed for unified weather forecasting across time scales, spanning both medium-range and subseasonal-to-seasonal (S2S) horizons. It addresses limitations of traditional autoregressive machine learning weather models, notably compounding error and scalability, by operating in a low-dimensional latent space and employing a masked latent diffusion scheme within a transformer-based architecture. OmniCast demonstrates competitive or state-of-the-art performance on accuracy, physics-based, and probabilistic metrics, while providing substantially faster inference and stable long-term forecasts (Nguyen et al., 20 Oct 2025).

1. Core Architecture and Mathematical Formulation

OmniCast comprises two principal components: a variational autoencoder (VAE) and a diffusion-based transformer.

VAE Encoder-Decoder:

The VAE learns a per-frame mapping from raw weather fields $X \in \mathbb{R}^{V \times H \times W}$, where $V$ is the number of variables, to a latent map $z \in \mathbb{R}^{D \times h \times w}$ with 16× spatial downsampling ($h = H/16$, $w = W/16$). The latent channel dimension $D$ is typically set to 1024 (S2S) or 256 (medium-range). The VAE is trained by minimizing the negative evidence lower bound (ELBO):

$$\mathcal{L}_{\mathrm{ELBO}}(\phi) = \mathbb{E}_{q_{\phi}(z|X)}\bigl[-\log p_{\phi}(X|z)\bigr] + \mathrm{KL}\bigl(q_{\phi}(z|X)\,\|\,p(z)\bigr)$$

A convolutional UNet serves as the backbone, and optimization uses Adam with the hyperparameters specified in the paper.
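As a concrete reading of the objective above, the following sketch evaluates its two terms for a diagonal-Gaussian posterior, a standard-normal prior, and a unit-variance Gaussian likelihood. Those distributional choices and the helper name `negative_elbo` are assumptions made for illustration; the source does not state them.

```python
import numpy as np

def negative_elbo(x, x_recon, mu, logvar):
    """Reconstruction term plus KL term, summed over all elements.

    Assumes q(z|X) = N(mu, diag(exp(logvar))), p(z) = N(0, I), and a
    unit-variance Gaussian likelihood, so -log p(X|z) reduces to a
    squared error up to an additive constant (illustrative assumptions).
    """
    recon = 0.5 * np.sum((x - x_recon) ** 2)
    kl = 0.5 * np.sum(mu ** 2 + np.exp(logvar) - 1.0 - logvar)
    return float(recon + kl)

# A perfect reconstruction with a posterior equal to the prior gives zero loss.
x = np.ones((3, 4, 4))          # toy field with V=3, H=W=4
z_mu = np.zeros((2, 1, 1))      # toy latent mean
z_logvar = np.zeros((2, 1, 1))  # toy latent log-variance
loss_zero = negative_elbo(x, x, z_mu, z_logvar)
```

Heavier compression (smaller latent maps) typically raises the reconstruction term, which is the trade-off the limitations section refers to.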

Diffusion-Based Transformer:

The transformer operates on sequences of future latent tokens, indexed as $i = 1, \ldots, N$ with $N = T \times h \times w$, where $T$ is the forecast horizon. The VAE latent embedding of the initial frame serves as the conditioning context. A bidirectional encoder-decoder transformer (16 layers, 16 heads, 1024 hidden units, dropout 0.1) produces a contextual vector for each token.
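To make the sequence length concrete, the sketch below computes the latent grid and the token count for a hypothetical input grid and horizon. The values `H=128`, `W=256`, and `T=44` are illustrative assumptions, not figures taken from the source.

```python
def latent_grid(H, W, factor=16):
    """Spatial size of the latent map for a factor-16 downsampling VAE."""
    if H % factor or W % factor:
        raise ValueError("H and W must be divisible by the downsampling factor")
    return H // factor, W // factor

def num_future_tokens(T, h, w):
    """Sequence length N = T * h * w seen by the transformer."""
    return T * h * w

# Hypothetical grid size and horizon, chosen only to make the arithmetic concrete.
h, w = latent_grid(H=128, W=256)        # (8, 16)
N = num_future_tokens(T=44, h=h, w=w)   # 44 * 8 * 16 = 5632
```

Because the raw grid is reduced by a factor of 16 along each spatial axis, the token count is 256 times smaller than the number of grid points per frame.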

Each token is further modeled by a diffusion head. The forward process adds noise to the token according to a linear noise schedule, with separate settings for training and inference. The diffusion loss is defined per token, and sampling during inference proceeds via reverse diffusion, controlled by a temperature parameter.
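For orientation, here is a minimal sketch of a per-token diffusion process with a linear noise schedule, written in the standard DDPM noise-prediction form. This parameterization, the schedule endpoints, the 1000-step count, and the convention of scaling the injected sampling noise by the temperature are all assumptions; the source does not confirm that OmniCast uses exactly these choices.

```python
import numpy as np

def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=2e-2):
    """Linearly spaced noise variances (endpoints are assumed defaults)."""
    return np.linspace(beta_start, beta_end, num_steps)

def forward_noise(z0, t, alpha_bar, rng):
    """Sample z_t ~ N(sqrt(abar_t) * z0, (1 - abar_t) * I); return (z_t, eps)."""
    eps = rng.standard_normal(z0.shape)
    z_t = np.sqrt(alpha_bar[t]) * z0 + np.sqrt(1.0 - alpha_bar[t]) * eps
    return z_t, eps

def denoising_loss(eps, eps_pred):
    """Mean squared error between the true and the predicted noise."""
    return float(np.mean((eps - eps_pred) ** 2))

def reverse_step(z_t, t, eps_pred, betas, alpha_bar, rng, temperature=1.0):
    """One ancestral sampling step with temperature-scaled injected noise."""
    alpha_t = 1.0 - betas[t]
    mean = (z_t - betas[t] / np.sqrt(1.0 - alpha_bar[t]) * eps_pred) / np.sqrt(alpha_t)
    if t == 0:
        return mean  # no noise is added on the final step
    noise = rng.standard_normal(z_t.shape)
    return mean + temperature * np.sqrt(betas[t]) * noise

rng = np.random.default_rng(0)
betas = linear_beta_schedule(num_steps=1000)  # step count is hypothetical
alpha_bar = np.cumprod(1.0 - betas)

z0 = rng.standard_normal(16)                  # one toy latent token
z_t, eps = forward_noise(z0, t=500, alpha_bar=alpha_bar, rng=rng)

# With temperature 0 the reverse step is deterministic.
a = reverse_step(z_t, 500, eps, betas, alpha_bar, rng, temperature=0.0)
b = reverse_step(z_t, 500, eps, betas, alpha_bar, rng, temperature=0.0)
```

Under this convention, a lower temperature shrinks the sampling noise and therefore the ensemble spread, while a higher temperature increases diversity.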

2. Training Strategy and Loss Functions

Training involves random corruption of future latent tokens and a combined generative-deterministic objective.

Random Masking and Joint Diffusion:

A mask ratio determines a Bernoulli mask over the future latent tokens, which is used to corrupt the future sequence. The generative objective trains the model to generate the future sequence from this corrupted input through the diffusion head.
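The masking step can be sketched as below. The mask ratio of 0.7, the zero-valued mask embedding, and the helper names are hypothetical; in a trained model the mask embedding would normally be learned, and the source does not say how the ratio is chosen.

```python
import numpy as np

def bernoulli_mask(num_tokens, mask_ratio, rng):
    """Boolean mask; True marks a token that is hidden from the transformer."""
    return rng.random(num_tokens) < mask_ratio

def corrupt(tokens, mask, mask_token):
    """Replace masked rows of an (N, D) token array with a shared mask embedding."""
    return np.where(mask[:, None], mask_token, tokens)

rng = np.random.default_rng(0)
N, D = 512, 8                        # hypothetical sequence length and width
tokens = rng.standard_normal((N, D))
mask_token = np.zeros(D)             # stand-in for a learned mask embedding
mask = bernoulli_mask(N, mask_ratio=0.7, rng=rng)
corrupted = corrupt(tokens, mask, mask_token)
```

Each token is masked independently, so the realized fraction of masked tokens fluctuates around the mask ratio from sample to sample.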

Auxiliary Deterministic Loss:

An MSE head is defined for the first ten future frames, with a per-frame weight that decays exponentially with the frame index and is zero for all later frames.
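A minimal sketch of such a weighting follows. The decay base of 0.5 and the exact form `decay ** (t - 1)` are assumptions for illustration; the source only states that the weight decays exponentially over the first ten frames.

```python
import numpy as np

def frame_weights(num_frames, num_supervised=10, decay=0.5):
    """w_t = decay**(t-1) for t = 1..num_supervised, 0 afterwards (decay is hypothetical)."""
    t = np.arange(1, num_frames + 1)
    return np.where(t <= num_supervised, decay ** (t - 1.0), 0.0)

def weighted_mse(pred, target, weights):
    """pred/target: (T, ...) arrays; per-frame MSE, weighted and summed over frames."""
    per_frame = ((pred - target) ** 2).reshape(pred.shape[0], -1).mean(axis=1)
    return float(np.sum(weights * per_frame))

weights = frame_weights(num_frames=44)
target = np.zeros((44, 4, 4))
pred = target + 1.0                  # unit error in every frame
loss = weighted_mse(pred, target, weights)
```

With a unit error in every frame, the loss equals the sum of the weights, so frames beyond the tenth contribute nothing.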

Global Objective:

The global objective combines the generative diffusion loss with the weighted deterministic MSE loss.

Training uses AdamW (learning rate, betas, and weight decay as specified in the paper) with batch size 32 for 100 epochs, using a 10-epoch warmup followed by cosine decay.
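The warmup-then-cosine schedule can be written as a small function. The linear shape of the warmup, the per-epoch granularity, and the `base_lr` and `min_lr` values are assumptions; only the 10-epoch warmup, the cosine decay, and the 100-epoch length come from the text.

```python
import math

def lr_at_epoch(epoch, base_lr, warmup_epochs=10, total_epochs=100, min_lr=0.0):
    """Linear warmup to base_lr, then cosine decay towards min_lr."""
    if epoch < warmup_epochs:
        return base_lr * (epoch + 1) / warmup_epochs
    progress = (epoch - warmup_epochs) / max(1, total_epochs - warmup_epochs)
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))

# Hypothetical base learning rate, used only to show the shape of the curve.
schedule = [lr_at_epoch(e, base_lr=1.0) for e in range(100)]
```

The rate peaks at the end of warmup and then falls monotonically for the remaining 90 epochs.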

3. Inference Algorithm and Error Mitigation

OmniCast avoids standard autoregressive prediction by using masked latent diffusion joint sampling. At inference, all spatial locations and time frames are generated through an iterative space–time unmasking algorithm.

Iterative Unmasking:

Joint prediction prevents the monotonic error accumulation typical of autoregressive approaches. Randomized unmasking schedules confer diversity and allow the model to use both future and past context for each token, mitigating temporal drift.
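The iterative space-time unmasking described in this section can be sketched as follows. This is not the authors' pseudocode: the cosine schedule for how many tokens stay masked, the toy `toy_predictor`, and all names are assumptions, with `predict_tokens` standing in for the transformer plus diffusion head.

```python
import numpy as np

def cosine_keep_masked(step, num_iters, num_tokens):
    """Number of tokens still masked after `step` of `num_iters` iterations."""
    return int(np.floor(num_tokens * np.cos(0.5 * np.pi * step / num_iters)))

def iterative_unmask(num_tokens, dim, predict_tokens, num_iters, rng):
    """Fill all tokens in a random order over num_iters rounds.

    predict_tokens(tokens, known) -> (num_tokens, dim) array of samples.
    It is a hypothetical stand-in for the transformer and diffusion head.
    """
    tokens = np.zeros((num_tokens, dim))
    known = np.zeros(num_tokens, dtype=bool)
    order = rng.permutation(num_tokens)       # fully random space-time order
    done = 0
    for step in range(1, num_iters + 1):
        if step < num_iters:
            remaining = cosine_keep_masked(step, num_iters, num_tokens)
        else:
            remaining = 0                     # the last round reveals everything
        target_done = num_tokens - remaining
        chosen = order[done:target_done]
        if chosen.size:
            samples = predict_tokens(tokens, known)  # conditions on known tokens
            tokens[chosen] = samples[chosen]
            known[chosen] = True
        done = target_done
    return tokens, known

def toy_predictor(tokens, known):
    # Toy stand-in: every sample equals the number of tokens already known.
    return np.full(tokens.shape, float(known.sum()))

rng = np.random.default_rng(0)
tokens, known = iterative_unmask(64, 4, toy_predictor, num_iters=8, rng=rng)
```

Because the order is a random permutation over all space-time positions, a token late in the forecast may be generated before an early one, and later rounds condition on context from both directions.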

4. Empirical Performance and Analysis

Subseasonal-to-Seasonal (S2S) Forecasting:

  • Dataset: ERA5, 69 variables, lead times 1–44 days, ensemble size 50.
  • OmniCast outperforms ML baselines beyond day 10, matches ECMWF-ENS after day 10, and maintains near-zero bias.
  • Demonstrates superior spectral consistency compared to GraphCast and PanguWeather.
  • Excels in CRPS and spread/skill ratio metrics; performance surpasses ECMWF-ENS after day 15.
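For readers unfamiliar with the probabilistic metrics named above, the sketch below gives common ensemble estimators of CRPS and the spread/skill ratio (SSR). These are standard textbook forms and are assumed here; the benchmark used in the source may apply additional details such as latitude weighting or a finite-ensemble correction factor.

```python
import numpy as np

def ensemble_crps(ens, obs):
    """ens: (M, ...) members, obs: (...). Mean CRPS over all points.

    Uses CRPS = E|X - y| - 0.5 * E|X - X'| with expectations over members.
    """
    skill = np.mean(np.abs(ens - obs), axis=0)
    spread = np.mean(np.abs(ens[:, None] - ens[None, :]), axis=(0, 1))
    return float(np.mean(skill - 0.5 * spread))

def spread_skill_ratio(ens, obs):
    """sqrt(mean ensemble variance) divided by the RMSE of the ensemble mean."""
    spread = np.sqrt(np.mean(np.var(ens, axis=0, ddof=1)))
    rmse = np.sqrt(np.mean((ens.mean(axis=0) - obs) ** 2))
    return float(spread / rmse)

# Two members at 0 and 2 with an observation of 1 at every point.
ens = np.stack([np.zeros(3), np.full(3, 2.0)])
obs = np.ones(3)
crps_value = ensemble_crps(ens, obs)  # 1.0 - 0.5 * 1.0 = 0.5
```

Lower CRPS is better, and an SSR close to 1 indicates that the ensemble spread matches the actual forecast error; values below 1 indicate an under-dispersive ensemble.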

Medium-Range Forecasting:

  • WeatherBench2, 12 h intervals, two-step model prediction, autoregressive rollout to 15 days.
  • Results show competitive RMSE and CRPS, only slightly behind GenCast, and performance on par with IFS-ENS.

Efficiency and Long-Term Stability:

  • Training: 4 days on 32 A100 GPUs (OmniCast) vs. 5 days on 32 TPUv5e (GenCast).
  • Inference: 29 seconds (OmniCast, A100) vs. 480 seconds (GenCast, TPUv5e), roughly 16.6× faster in this comparison; the reported 10–20× speedup is attributed to the latent space and the efficient diffusion head.
  • 100-year rollouts demonstrate climatological stability across multiple variables (T2m, U10, V10, MSLP, Z500, Q700) without drift.

Ablation Results:

  • Best short-lead skill with deterministic loss applied only to first 10 frames.
  • Optimal S2S performance when training on full 44-day sequences.
  • Fully random space–time unmasking schedule outperforms alternative orders.
  • A tuned diffusion temperature yields optimal prediction diversity (SSR) and forecast accuracy.

5. Strengths, Limitations, and Open Directions

Strengths:

  • Unified probabilistic forecasting across time scales in a single model.
  • Masked latent diffusion and a low-dimensional VAE mitigate error propagation and accelerate inference.
  • Demonstrates state-of-the-art S2S skill (deterministic, physical, probabilistic) and competitive medium-range performance.
  • Roughly an order of magnitude (10–20×) faster inference, with efficient training.
  • Stable, century-scale forecast rollouts with realistic climatology.

Limitations and Future Work:

  • Forecast skill is upper-bounded by VAE reconstruction quality, necessitating careful tradeoff between compression and meteorological fidelity.
  • Hyperparameter sensitivity persists (mask schedules, diffusion temperature, unmasking schedule and number of iterations).
  • Prospective enhancements include improved VAE objectives (stronger ELBOs, spatio-temporal latent structure), use of flow-diffusion hybrids, and extending the architecture to higher spatial resolutions.
  • Further work is required to couple the model with land/ocean boundary forcings for extended seasonal prediction.

6. Comparative Summary

| Feature/Metric | OmniCast | GenCast | ECMWF-ENS | GraphCast/PanguWeather |
| --- | --- | --- | --- | --- |
| S2S Deterministic | SOTA beyond day 10 | Inferior | Matched | Inferior |
| Physics Consistency | SOTA on spectral | - | SOTA | Inferior |
| Medium-Range RMSE | Competitive | Slightly better | Competitive | - |
| Inference Speed | 10–20× faster | - | - | - |
| 100-Year Stability | Yes | - | - | - |

OmniCast represents a scalable approach for unified, skillful probabilistic weather forecasting, leveraging masked latent diffusion modeling to overcome key challenges in long-range prediction (Nguyen et al., 20 Oct 2025).
