OmniCast: Unified Weather Forecasting

Updated 3 July 2026

OmniCast is a probabilistic deep learning model that unifies medium-range and subseasonal-to-seasonal weather forecasting using a low-dimensional VAE and masked latent diffusion within a transformer.
It overcomes traditional autoregressive limitations with an iterative unmasking algorithm, reducing error accumulation and achieving 10–20× faster inference.
Empirical results show state-of-the-art forecast skill, superior probabilistic metrics, and century-scale climatological stability.

OmniCast is a probabilistic deep learning model designed for unified weather forecasting across time scales, spanning both medium-range and subseasonal-to-seasonal (S2S) horizons. It addresses limitations of traditional autoregressive machine learning weather models, notably compounding error and scalability, by operating in a low-dimensional latent space and employing a masked latent diffusion scheme within a transformer-based architecture. OmniCast demonstrates competitive or state-of-the-art performance on accuracy, physics-based, and probabilistic metrics, while providing substantially faster inference and stable long-term forecasts (Nguyen et al., 20 Oct 2025).

1. Core Architecture and Mathematical Formulation

OmniCast comprises two principal components: a variational autoencoder (VAE) and a diffusion-based transformer.

VAE Encoder-Decoder:

The VAE learns a per-frame mapping from raw weather fields $X \in \mathbb{R}^{V \times H \times W}$ , where $V$ is the number of variables, to a latent map $z \in \mathbb{R}^{D \times h \times w}$ with a 16-fold spatial reduction along each axis ( $h = H/16$ , $w = W/16$ ). The latent dimension $D$ is typically set to 1024 (S2S) or 256 (medium-range). The VAE is trained by minimizing the negative evidence lower bound (ELBO):

$$\mathcal{L}_{\mathrm{ELBO}}(\phi) = \mathbb{E}_{q_{\phi}(z|X)}\Bigl[-\log p_{\phi}(X|z)\Bigr] + \mathrm{KL}\bigl(q_{\phi}(z| X)\,\|\,p(z)\bigr)$$

A convolutional UNet serves as the backbone, and optimization uses Adam with the hyperparameters specified by the authors.
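As a minimal sketch of the two terms in the objective above, the following Python computes the KL term in closed form for a diagonal-Gaussian posterior against a standard normal prior, and the reconstruction term under a fixed-variance Gaussian decoder. Both distributional choices and all function names are assumptions made for illustration; the source does not state which likelihood or posterior family the VAE uses.

```python
import math

def gaussian_kl(mu, logvar):
    """KL( N(mu, diag(exp(logvar))) || N(0, I) ), summed over latent dims."""
    return 0.5 * sum(m * m + math.exp(lv) - 1.0 - lv for m, lv in zip(mu, logvar))

def gaussian_nll(x, x_hat, sigma=1.0):
    """-log p(x | z) for a Gaussian decoder with fixed std `sigma` (assumed)."""
    n = len(x)
    sq_err = sum((a - b) ** 2 for a, b in zip(x, x_hat))
    return 0.5 * sq_err / sigma ** 2 + 0.5 * n * math.log(2.0 * math.pi * sigma ** 2)

def negative_elbo(x, x_hat, mu, logvar):
    """Reconstruction term plus KL term, i.e. the loss that is minimized."""
    return gaussian_nll(x, x_hat) + gaussian_kl(mu, logvar)
```

When the posterior equals the prior (zero mean, zero log-variance) the KL term vanishes, so the loss reduces to the reconstruction term alone.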

Diffusion-Based Transformer:

The transformer operates on sequences of future latent tokens, indexed as $i = 1, \ldots, N$ with $N = T \times h \times w$ , where $T$ is the number of forecast frames. The VAE latent embedding of the initial frame serves as the conditioning context. A bidirectional encoder-decoder transformer (16 layers, 16 heads, 1024 hidden units, dropout 0.1) produces a contextual vector for each token.
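The token count follows directly from the 16-fold spatial reduction and the number of forecast frames. The sketch below works through that arithmetic; the 128 × 256 grid in the example is a hypothetical input chosen only because it divides evenly by 16, not a resolution taken from the source.

```python
def latent_grid(H, W, factor=16):
    """Spatial size (h, w) of the latent map after downsampling by `factor`."""
    if H % factor or W % factor:
        raise ValueError("H and W must be divisible by the downsampling factor")
    return H // factor, W // factor

def num_future_tokens(T, H, W, factor=16):
    """N = T * h * w, the length of the future latent token sequence."""
    h, w = latent_grid(H, W, factor)
    return T * h * w

# Hypothetical example: 44 future frames on a 128 x 256 grid.
print(latent_grid(128, 256))            # (8, 16)
print(num_future_tokens(44, 128, 256))  # 5632
```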

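Each token is further modeled by a diffusion head. Writing $x_0$ for a clean latent token and $c$ for its contextual vector, the forward process takes the standard denoising-diffusion form $x_t = \sqrt{\bar{\alpha}_t}\, x_0 + \sqrt{1 - \bar{\alpha}_t}\, \epsilon$ with $\epsilon \sim \mathcal{N}(0, I)$ , where $\bar{\alpha}_t$ follows a linear noise schedule that uses different step counts for training and inference. The diffusion loss is the noise-prediction objective $\mathcal{L}_{\mathrm{diff}} = \mathbb{E}_{\epsilon, t}\bigl[\|\epsilon - \epsilon_{\theta}(x_t \mid t, c)\|^2\bigr]$ . Sampling during inference proceeds via reverse diffusion, controlled by a temperature parameter $\tau$ .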
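To make the per-token diffusion concrete, here is a sketch of a standard DDPM-style forward process with a linear noise schedule and a mean-squared noise-prediction loss. The schedule endpoints (1e-4 and 0.02), the 1000-step count, and every function name are assumptions borrowed from common DDPM practice, not values confirmed by the source.

```python
import math
import random

def linear_betas(num_steps, beta_start=1e-4, beta_end=0.02):
    """Linearly spaced noise variances (endpoints are assumed defaults)."""
    if num_steps == 1:
        return [beta_start]
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]

def alpha_bars(betas):
    """Cumulative products of (1 - beta_t)."""
    out, prod = [], 1.0
    for b in betas:
        prod *= 1.0 - b
        out.append(prod)
    return out

def q_sample(x0, t, abar, rng):
    """Draw x_t = sqrt(abar_t) * x0 + sqrt(1 - abar_t) * eps; returns (x_t, eps)."""
    eps = [rng.gauss(0.0, 1.0) for _ in x0]
    a, s = math.sqrt(abar[t]), math.sqrt(1.0 - abar[t])
    return [a * x + s * e for x, e in zip(x0, eps)], eps

def noise_prediction_loss(eps, eps_pred):
    """Mean squared error between true and predicted noise."""
    return sum((e - p) ** 2 for e, p in zip(eps, eps_pred)) / len(eps)
```

In a real model `eps_pred` would come from the diffusion head given the noisy token, the timestep, and the token's contextual vector.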

2. Training Strategy and Loss Functions

Training involves random corruption of future latent tokens and a combined generative-deterministic objective.

Random Masking and Joint Diffusion:

A mask ratio determines a Bernoulli mask over the future tokens, and the masked positions are hidden to form the corrupted future sequence. The generative objective trains the model to recover the masked tokens through the per-token diffusion loss.
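The random corruption step can be sketched as follows: each future token is masked independently with probability equal to the mask ratio. How the mask ratio itself is chosen per training sample is not recoverable from the text, so it is simply an input here, and the `None` placeholder and function names are illustrative assumptions.

```python
import random

def sample_mask(num_tokens, mask_ratio, rng):
    """Bernoulli mask: 1 means the token is masked, 0 means it stays visible."""
    return [1 if rng.random() < mask_ratio else 0 for _ in range(num_tokens)]

def corrupt(tokens, mask, mask_token=None):
    """Replace masked positions with a placeholder, keep the rest unchanged."""
    return [mask_token if m else tok for tok, m in zip(tokens, mask)]

rng = random.Random(0)
mask = sample_mask(8, 0.75, rng)
corrupted = corrupt(list(range(8)), mask)
```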

Auxiliary Deterministic Loss:

An MSE head is defined for the first ten future frames, with a weight that decays exponentially with the frame index and is zero for all later frames.
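A small sketch of such a weighting is shown below. The decay rate (`decay=0.5`) and the zero-based frame indexing are hypothetical choices for illustration; only the ten-frame cutoff and the exponential shape come from the text.

```python
import math

def frame_weights(num_frames, num_supervised=10, decay=0.5):
    """Exponentially decaying weights on the first `num_supervised` frames, zero after."""
    return [math.exp(-decay * t) if t < num_supervised else 0.0
            for t in range(num_frames)]

def weighted_mse(pred, target, weights):
    """Sum over frames of weight * per-frame mean squared error."""
    total = 0.0
    for p_frame, y_frame, w in zip(pred, target, weights):
        if w == 0.0:
            continue  # frames beyond the cutoff contribute nothing
        mse = sum((p - y) ** 2 for p, y in zip(p_frame, y_frame)) / len(p_frame)
        total += w * mse
    return total
```

Because the weights vanish after the cutoff, errors on later frames are left entirely to the generative objective.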

Global Objective:

The global objective combines the generative diffusion loss with the auxiliary deterministic MSE loss.

Training uses AdamW (learning rate, betas, and weight decay as specified by the authors) with a batch size of 32 for 100 epochs, using a 10-epoch warmup followed by cosine learning-rate decay.
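The warmup-then-cosine schedule can be written as a small function of the epoch index. This sketch assumes a linear warmup and a decay to `min_lr=0.0`; neither detail, nor the base learning rate, is given in the text, so all three are placeholders.

```python
import math

def lr_at_epoch(epoch, base_lr, total_epochs=100, warmup_epochs=10, min_lr=0.0):
    """Linear warmup for `warmup_epochs`, then cosine decay down to `min_lr`."""
    if epoch < warmup_epochs:
        return base_lr * (epoch + 1) / warmup_epochs
    progress = (epoch - warmup_epochs) / max(1, total_epochs - warmup_epochs)
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))
```

With these assumptions the rate reaches `base_lr` at the end of warmup, falls to half of it midway through the decay phase, and approaches `min_lr` at the final epoch.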

3. Inference Algorithm and Error Mitigation

OmniCast avoids standard autoregressive prediction by using masked latent diffusion joint sampling. At inference, all spatial locations and time frames are generated through an iterative space–time unmasking algorithm.

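Iterative Unmasking:

Generation starts from a fully masked future sequence conditioned on the initial frame. At each iteration, the transformer computes contextual vectors from the tokens known so far, the diffusion head samples a subset of the still-masked tokens, and those tokens are unmasked. The loop ends when no masked tokens remain.

Joint prediction prevents the monotonic error accumulation typical of autoregressive approaches. Randomized unmasking schedules confer diversity and let the model use both past and future context for each token, mitigating temporal drift.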
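The control flow of an iterative space-time unmasking loop can be sketched in a few lines. This is an illustration under stated assumptions, not the authors' algorithm: tokens are revealed in a uniformly random order, an equal share is revealed per iteration, and `sample_token` is a hypothetical stand-in for the transformer plus diffusion head.

```python
import random

def iterative_unmask(num_tokens, num_iters, sample_token, rng):
    """Fill a fully masked sequence over `num_iters` rounds in random order."""
    tokens = [None] * num_tokens      # None marks a still-masked token
    order = list(range(num_tokens))
    rng.shuffle(order)                # fully random space-time order
    for it in range(num_iters):
        start = it * num_tokens // num_iters
        stop = (it + 1) * num_tokens // num_iters
        context = list(tokens)        # snapshot: tokens known before this round
        for i in order[start:stop]:
            tokens[i] = sample_token(i, context)
    return tokens

# Toy sampler: record how many tokens were already known when this one was drawn.
known_count = lambda i, context: sum(v is not None for v in context)
result = iterative_unmask(6, 3, known_count, random.Random(0))
print(sorted(result))  # [0, 0, 2, 2, 4, 4]
```

The toy output shows that later rounds condition on more context, and because the order is random in both space and time, that context can include frames later than the token being generated.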

4. Empirical Performance and Analysis

Subseasonal-to-Seasonal (S2S) Forecasting:

Dataset: ERA5 with 69 variables, lead times of 1–44 days, ensemble size 50.
OmniCast outperforms ML baselines beyond day 10, matches ECMWF-ENS after day 10, and maintains near-zero bias.
Demonstrates superior spectral consistency compared to GraphCast and PanguWeather.
Excels in CRPS and spread/skill ratio metrics; performance surpasses ECMWF-ENS after day 15.
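For readers unfamiliar with the two probabilistic metrics named above, the sketch below gives common textbook estimators for a scalar forecast: the ensemble CRPS (mean absolute error minus half the mean pairwise member distance) and the spread/skill ratio (root mean ensemble variance over the RMSE of the ensemble mean). The source does not say which exact variants were used, so these definitions are assumptions; in particular, no finite-ensemble correction factor is applied.

```python
import math

def crps_ensemble(members, obs):
    """Ensemble CRPS estimate: E|X - y| - 0.5 * E|X - X'|. Lower is better."""
    m = len(members)
    abs_err = sum(abs(x - obs) for x in members) / m
    pairwise = sum(abs(a - b) for a in members for b in members) / (2.0 * m * m)
    return abs_err - pairwise

def spread_skill_ratio(ensembles, observations):
    """sqrt(mean ensemble variance) / RMSE of ensemble mean; near 1 is well calibrated."""
    var_sum, sq_err_sum = 0.0, 0.0
    for members, obs in zip(ensembles, observations):
        m = len(members)
        mean = sum(members) / m
        var_sum += sum((x - mean) ** 2 for x in members) / (m - 1)
        sq_err_sum += (mean - obs) ** 2
    return math.sqrt(var_sum / sq_err_sum)

print(crps_ensemble([1.0, 3.0], 2.0))  # 0.5
```

A ratio below 1 indicates an under-dispersive ensemble, and a ratio above 1 an over-dispersive one.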

Medium-Range Forecasting:

WeatherBench2, 12 h intervals, two-step model prediction, autoregressive rollout to 15 days.
Results show competitive RMSE and CRPS, only slightly behind GenCast and on par with IFS-ENS.
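Global forecast RMSE is commonly reported with latitude weighting so that the densely sampled polar rows of a latitude-longitude grid do not dominate the score. The sketch below uses cosine-of-latitude weights normalized to mean 1; this weighting is a common convention and an assumption here, since the text does not state which scheme the reported RMSE uses.

```python
import math

def latitude_weights(lats_deg):
    """cos(latitude) weights, normalized so that they average to 1."""
    w = [math.cos(math.radians(lat)) for lat in lats_deg]
    mean = sum(w) / len(w)
    return [x / mean for x in w]

def lat_weighted_rmse(pred, target, lats_deg):
    """RMSE over a [lat][lon] grid with one weight per latitude row."""
    weights = latitude_weights(lats_deg)
    total, count = 0.0, 0
    for row_p, row_t, w in zip(pred, target, weights):
        for p, t in zip(row_p, row_t):
            total += w * (p - t) ** 2
            count += 1
    return math.sqrt(total / count)
```

Because the weights average to 1, a spatially uniform error of magnitude e yields a weighted RMSE of exactly e.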

Efficiency and Long-Term Stability:

Training: 4 days on 32 A100 GPUs (OmniCast) vs. 5 days on 32 TPUv5e (GenCast).
Inference: 29 seconds (OmniCast, A100) vs. 480 seconds (GenCast, TPUv5e), a 10–20× speedup attributed to the latent space and the efficient diffusion head.
100-year rollouts demonstrate climatological stability across multiple variables (T2m, U10, V10, MSLP, Z500, Q700) without drift.
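As a quick check on the inference timings quoted above, the ratio of the two wall-clock figures can be computed directly. Note that the two numbers come from different hardware (A100 vs. TPUv5e), so the ratio is indicative rather than a controlled comparison.

```python
def speedup(baseline_seconds, model_seconds):
    """How many times faster the model is than the baseline."""
    return baseline_seconds / model_seconds

print(round(speedup(480, 29), 1))  # 16.6
```

The result falls inside the stated 10–20× range.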

Ablation Results:

Best short-lead skill with deterministic loss applied only to first 10 frames.
Optimal S2S performance when training on full 44-day sequences.
Fully random space–time unmasking schedule outperforms alternative orders.
A single diffusion temperature setting yields both the best prediction diversity (SSR) and the best forecast accuracy.

5. Strengths, Limitations, and Open Directions

Strengths:

Unified probabilistic forecasting across time scales in a single model.
Masked latent diffusion and a low-dimensional VAE mitigate error propagation and accelerate inference.
Demonstrates state-of-the-art S2S skill (deterministic, physical, probabilistic) and competitive medium-range performance.
Roughly an order of magnitude faster inference (10–20×) and efficient training.
Stable, century-scale forecast rollouts with realistic climatology.

Limitations and Future Work:

Forecast skill is upper-bounded by VAE reconstruction quality, necessitating careful tradeoff between compression and meteorological fidelity.
Hyperparameter sensitivity persists (mask schedules, diffusion temperature, unmasking schedule/iterations).
Prospective enhancements include improved VAE objectives (stronger ELBOs, spatio-temporal latent structure), use of flow-diffusion hybrids, and extending the architecture to higher spatial resolutions.
Further work is required to couple the model with land/ocean boundary forcings for extended seasonal prediction.

6. Comparative Summary

Feature/Metric	OmniCast	GenCast	ECMWF-ENS	GraphCast/PanguWeather
S2S Deterministic	SOTA beyond day 10	Inferior	Matched	Inferior
Physics Consistency	SOTA on spectral	–	SOTA	Inferior
Medium-Range RMSE	Competitive	Slightly better	Competitive	–
Inference Speed	10–20× faster	–	–	–
100-Year Stability	Yes	–	–	–

OmniCast represents a scalable approach for unified, skillful probabilistic weather forecasting, leveraging masked latent diffusion modeling to overcome key challenges in long-range prediction (Nguyen et al., 20 Oct 2025).

References (1)

Nguyen et al. (20 Oct 2025). OmniCast: A Masked Latent Diffusion Model for Weather Forecasting Across Time Scales.
