HRRRCast: Data-Driven Weather Emulator

Updated 2 July 2026

HRRRCast is a data-driven emulator that uses convolutional, graph-based, and diffusion models to produce high-resolution (6 km) weather forecasts over CONUS.
Multi-lead-time training and conditioning on future GFS states are used to improve forecast skill at longer lead times.
Its ResNet-based ResHRRR architecture is paired with DDIM sampling, yielding sharper, better-calibrated composite reflectivity forecasts and efficient ensemble generation; the graph-based GraphHRRR variant is currently deterministic.

HRRRCast is a data-driven emulator of the High-Resolution Rapid Refresh (HRRR) model, targeting regional, convection-allowing weather forecasting over the contiguous United States (CONUS) at 6 km resolution. Designed as a computationally efficient, machine-learning alternative to physics-based numerical weather prediction (NWP), HRRRCast incorporates both convolutional and graph-based architectures and uses a diffusion model for probabilistic ensemble forecasting. Notable extensions relative to prior work include full-CONUS spatial coverage, conditioning on future global model (GFS) states, multi-lead-time training, and the use of HRRR analysis fields, rather than post-analysis fields, as training targets (Abdi et al., 8 Jul 2025).

1. Problem Setting and Conceptual Advancements

HRRRCast is motivated by the computational expense of conventional convection-allowing models such as HRRR, a physics-based, 3 km regional NWP system used in operational weather forecasting. HRRRCast emulates HRRR at 6 km, building upon the StormCast diffusion model framework [Pathak et al. 2024], but with four principal innovations:

Full CONUS coverage at 6 km resolution: HRRRCast extends the spatial domain from a subdomain (used in StormCast) to the entire U.S., increasing the realism and generalizability of forecasts.
Multi-lead-time training and greedy rollout: The model learns to forecast at lead times of 1 h, 3 h, and 6 h, and applies a greedy rollout strategy at inference time to reach longer lead times, enabling robust long-range forecasting.
Use of true HRRR analysis fields as targets: In contrast to StormCast, which used a post-analysis offset, HRRRCast is trained directly on ground-truth analysis, improving target fidelity.
Conditioning on future GFS states: Incorporation of global model output at both current and future times ( $t+L$ ) transforms the emulator into a hybrid forecast-plus-downscaler, enhancing long-lead forecasting accuracy.

These modifications yield sharper, better-calibrated composite reflectivity forecasts, maintain fine spatial detail, and support fast ensemble generation via diffusion-based sampling (Abdi et al., 8 Jul 2025).

2. Model Architectures

2.1 ResHRRR (ResNet-based Model)

ResHRRR is a U-Net–style residual network comprising 30 Squeeze-and-Excitation (SE) ResNet blocks (23.5M parameters). Its input tensor combines HRRR analysis and GFS data at times $t$ and $t+L$ ( $x_0\in\mathbb{R}^{530\times900\times180}$ ). The architecture consists of:

Encoder: Two down-sampling SE-ResNet stages reduce dimensionality to $130\times225\times192$ .
Processor: 26 SE-ResNet blocks at fixed spatial resolution.
Decoder: Two up-sampling SE-ResNet stages reconstruct to $530\times900\times74$ .

Key architectural elements include:

SE blocks: Provide per-block channel-wise attention via learned gating vectors, dynamically re-weighting feature maps.
FiLM conditioning: Embeds lead time $L$ and diffusion step $\tau$ into scale–shift parameters $(\gamma, \beta)$ , modulating features according to

$\mathrm{FiLM}(h) = \gamma(L,\tau)\odot h + \beta(L,\tau)$

Trainable skip connections: Two learned scalars weight the clean and noised inputs during reverse diffusion, stabilizing training.
DDIM-based probabilistic sampling: Supports ensemble forecast generation (see Section 3).
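
The two conditioning mechanisms listed above can be illustrated with a minimal NumPy sketch. It is not the ResHRRR implementation: the weight matrices `w1`, `w2`, and `w_cond`, the reduction ratio `r`, and the use of a single linear map from $(L, \tau)$ to $(\gamma, \beta)$ are all assumptions made for illustration.

```python
import numpy as np

def squeeze_excite(h, w1, w2):
    """Channel attention on a feature map h of shape (H, W, C)."""
    s = h.mean(axis=(0, 1))                  # squeeze: one average per channel, (C,)
    z = np.maximum(w1 @ s, 0.0)              # bottleneck layer with ReLU, (C // r,)
    gate = 1.0 / (1.0 + np.exp(-(w2 @ z)))   # sigmoid gate in (0, 1), (C,)
    return h * gate                          # re-weight each channel

def film(h, gamma, beta):
    """FiLM(h) = gamma * h + beta, broadcast over the spatial axes."""
    return gamma * h + beta

rng = np.random.default_rng(0)
H, W, C, r = 8, 8, 16, 4                     # toy sizes, not the real grid
h = rng.normal(size=(H, W, C))
w1 = rng.normal(size=(C // r, C)) * 0.1      # hypothetical SE weights
w2 = rng.normal(size=(C, C // r)) * 0.1

# Hypothetical conditioning network: a linear map from (L, tau) to (gamma, beta).
cond = np.array([3.0, 0.5])                  # lead time in hours, normalized diffusion step
w_cond = rng.normal(size=(2 * C, 2)) * 0.1
gamma_beta = w_cond @ cond
gamma, beta = 1.0 + gamma_beta[:C], gamma_beta[C:]

out = film(squeeze_excite(h, w1, w2), gamma, beta)
```

Because the gate lies in (0, 1), the SE step can only attenuate channels, while FiLM can both rescale and shift them as a function of lead time and diffusion step.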

2.2 GraphHRRR (Graph Neural Network–based Model)

The GraphHRRR architecture adapts the hierarchical GNN (as in GraphCast [Lam et al. 2023]) to a Delaunay mesh representing the LCC grid. Grid cells are nodes, with message passing across both fine and coarse hierarchy levels. The system includes Dirichlet boundary conditions for global model state ingestion. The current instantiation (37M parameters) is deterministic, operates only at 1 h lead, and does not ingest synoptic-scale inputs; as a result, performance rapidly degrades for multi-lead rollouts, especially due to the lack of diffusion and multi-lead training (Abdi et al., 8 Jul 2025).

3. Diffusion Model for Probabilistic Forecasting

The core of HRRRCast’s ensemble capability is a denoising diffusion implicit model (DDIM). The forward process adds Gaussian noise to the input incrementally over a fixed number of diffusion steps, with the noise level at each step $\tau$ set by a cosine schedule.

Reverse sampling implements DDIM with either deterministic ( $\eta = 0$ ) or stochastic ( $\eta > 0$ ) steps. Inference uses 30–50 steps. Training minimizes a mean squared denoising loss.
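
The forward noising, cosine schedule, and DDIM reverse step described above can be sketched as follows. This is a minimal sketch under stated assumptions, not the paper's code: it assumes the common noise-prediction parameterization, a cosine schedule with a small offset `s`, and a diffusion time normalized to $[0, 1]$. All function names are hypothetical, and the network is replaced by an oracle that returns the true noise.

```python
import numpy as np

def cosine_alpha_bar(tau, s=0.008):
    """Cumulative signal level for normalized diffusion time tau in [0, 1]."""
    f = np.cos((tau + s) / (1 + s) * np.pi / 2) ** 2
    f0 = np.cos(s / (1 + s) * np.pi / 2) ** 2
    return np.clip(f / f0, 1e-5, 1.0)        # clip to avoid dividing by ~0 at tau = 1

def add_noise(x0, tau, eps):
    """Forward process: mix clean state x0 with Gaussian noise eps."""
    a = cosine_alpha_bar(tau)
    return np.sqrt(a) * x0 + np.sqrt(1 - a) * eps

def denoising_loss(eps, eps_hat):
    """Mean squared error between injected and predicted noise."""
    return float(np.mean((eps - eps_hat) ** 2))

def ddim_step(x_tau, eps_hat, tau, tau_prev, eta=0.0, rng=None):
    """One reverse step; eta = 0 is deterministic, eta > 0 adds fresh noise."""
    a, a_prev = cosine_alpha_bar(tau), cosine_alpha_bar(tau_prev)
    x0_hat = (x_tau - np.sqrt(1 - a) * eps_hat) / np.sqrt(a)
    sigma = eta * np.sqrt((1 - a_prev) / (1 - a)) * np.sqrt(1 - a / a_prev)
    if eta > 0:
        rng = np.random.default_rng() if rng is None else rng
        noise = rng.standard_normal(x_tau.shape)
    else:
        noise = 0.0
    direction = np.sqrt(max(1 - a_prev - sigma ** 2, 0.0)) * eps_hat
    return np.sqrt(a_prev) * x0_hat + direction + sigma * noise

rng = np.random.default_rng(0)
x0 = rng.standard_normal((4, 4))
eps = rng.standard_normal((4, 4))
taus = np.linspace(1.0, 0.0, 31)              # 30 reverse steps
x = add_noise(x0, taus[0], eps)
for tau, tau_prev in zip(taus[:-1], taus[1:]):
    x = ddim_step(x, eps, tau, tau_prev)      # oracle noise stands in for the network
```

With the oracle noise, the deterministic sampler recovers `x0` up to floating-point error, which is a convenient sanity check before plugging in a trained denoiser. Drawing different initial noise fields (or setting `eta > 0`) is what produces distinct ensemble members.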

4. Training Procedure, Data, and Rollout Strategies

Input/Output and Loss Functions

Inputs: HRRR analysis at time $t$ for six 3D variables (TMP, HGT, UGRD, VGRD, WGRD, SPFH) across 12 pressure levels, plus reflectivity, 2 m temperature, pressures, orography, and land mask; GFS at $t$ and $t+L$ with the same six variables at four levels, surface/SLP fields, and reflectivity.
Targets: HRRR analysis at $t+L$.
Losses:
- Diffusion denoising loss (see above)
- Weighted MSE on final outputs, with per-variable weights reflecting pressure level and variable importance (heavier weights near the surface)
Optimization: AdamW, batch size 2–4 (GPU-bound), with learning rate and weight decay as hyperparameters.
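
A per-variable weighted MSE of the kind listed above might look like the sketch below. The weight values, the channel layout, and the normalization by the sum of weights are assumptions chosen for illustration; the source does not give the actual weights.

```python
import numpy as np

def weighted_mse(pred, target, weights):
    """Weighted MSE for fields of shape (H, W, C) with per-channel weights (C,)."""
    per_channel = np.mean((pred - target) ** 2, axis=(0, 1))   # plain MSE per channel
    return float(np.sum(weights * per_channel) / np.sum(weights))

rng = np.random.default_rng(0)
target = rng.normal(size=(6, 6, 3))
pred = target.copy()
pred[..., 0] += 1.0                            # only channel 0 carries an error

uniform = np.ones(3)
surface_heavy = np.array([4.0, 1.0, 1.0])      # hypothetical: channel 0 is near-surface

loss_uniform = weighted_mse(pred, target, uniform)
loss_weighted = weighted_mse(pred, target, surface_heavy)
```

Here the error sits entirely in the up-weighted channel, so `loss_weighted` (2/3) is larger than `loss_uniform` (1/3). The converse also holds: a variable with a low weight contributes less to the loss and is penalized less for the same error.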

Multi-Lead-Time Training and Greedy Rollout

Each training batch samples the lead time $L \in \{1, 3, 6\}$ h uniformly; FiLM conditioning encodes $L$ and the diffusion step $\tau$. At inference, the model uses a greedy rollout to reach an arbitrary forecast horizon: at each step it chooses the largest trained lead time that does not exceed the remaining horizon, advances the state by that lead, and repeats until the horizon is reached.
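
The greedy rollout can be sketched in a few lines. The function names and the `model(state, gfs_now, gfs_future, lead)` call signature are hypothetical stand-ins for the trained network and its GFS conditioning, and the horizon is assumed to be a whole number of hours.

```python
def greedy_schedule(horizon_h, leads=(1, 3, 6)):
    """Split a forecast horizon (whole hours) into the largest trained leads."""
    if horizon_h < 0 or int(horizon_h) != horizon_h:
        raise ValueError("horizon must be a non-negative whole number of hours")
    steps, remaining = [], int(horizon_h)
    ordered = sorted(leads, reverse=True)
    while remaining > 0:
        step = next((l for l in ordered if l <= remaining), None)
        if step is None:
            raise ValueError("no trained lead fits the remaining horizon")
        steps.append(step)
        remaining -= step
    return steps

def rollout(model, state, gfs_at, t0, horizon_h):
    """Advance `state` from time t0 to t0 + horizon_h using the greedy schedule."""
    t = t0
    for lead in greedy_schedule(horizon_h):
        # Condition each step on GFS at the current and the target time.
        state = model(state, gfs_at(t), gfs_at(t + lead), lead)
        t += lead
    return state

print(greedy_schedule(8))    # [6, 1, 1]
print(greedy_schedule(10))   # [6, 3, 1]
```

A 48 h forecast therefore takes eight 6 h steps rather than forty-eight 1 h steps, which reduces the number of autoregressive applications over which errors can accumulate.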

5. Evaluation Metrics and Benchmarking

Several classes of metrics assess performance, with composite reflectivity evaluated over the full CONUS:

Contingency-based (grid), with $H$ hits, $M$ misses, and $F$ false alarms at a given threshold:
- Probability of Detection (POD): $\mathrm{POD} = \frac{H}{H+M}$
- Success Ratio (SR): $\mathrm{SR} = \frac{H}{H+F}$
- Critical Success Index (CSI): $\mathrm{CSI} = \frac{H}{H+M+F}$
- Frequency Bias (FB): $\mathrm{FB} = \frac{H+F}{H+M}$
Neighborhood-based (FSS): At pooling window $n$, the Fractions Skill Score is

$\mathrm{FSS} = 1 - \frac{\sum_i (P_{f,i} - P_{o,i})^2}{\sum_i P_{f,i}^2 + \sum_i P_{o,i}^2}$

with $P_f$ and $P_o$ the forecast and observed event fractions over a window.

Object-based (MODE): Storm object detection with centroid error, area, orientation, and contingency statistics on matched objects.
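Spectral (power spectrum): For a field $f$ with Fourier transform $\hat{f}$, the power at wavenumber $k$ is

$P(k) = |\hat{f}(k)|^2$

Standard reflectivity thresholds (in dBZ) define the events scored by the contingency- and neighborhood-based metrics.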
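
The contingency and neighborhood scores above can be computed as in the following NumPy sketch. The 20 dBZ threshold and the toy fields are illustrative values only, the zero-padded box window is one of several possible edge treatments, and the function names are hypothetical.

```python
import numpy as np

def contingency_scores(fcst, obs, thresh):
    """POD, SR, CSI, and FB from gridpoint exceedances of `thresh`."""
    f, o = fcst >= thresh, obs >= thresh
    hits = np.sum(f & o)
    misses = np.sum(~f & o)
    false_alarms = np.sum(f & ~o)
    # Assumes at least one forecast and one observed event (no zero denominators).
    return {
        "POD": hits / (hits + misses),
        "SR": hits / (hits + false_alarms),
        "CSI": hits / (hits + misses + false_alarms),
        "FB": (hits + false_alarms) / (hits + misses),
    }

def window_fraction(binary, n):
    """Event fraction in each n x n window (n odd, zero padding at the edges)."""
    padded = np.pad(binary.astype(float), n // 2)
    # Integral image with a leading row and column of zeros.
    c = np.pad(padded, ((1, 0), (1, 0))).cumsum(axis=0).cumsum(axis=1)
    total = c[n:, n:] - c[:-n, n:] - c[n:, :-n] + c[:-n, :-n]
    return total / (n * n)

def fss(fcst, obs, thresh, n):
    """Fractions Skill Score at window size n."""
    pf = window_fraction(fcst >= thresh, n)
    po = window_fraction(obs >= thresh, n)
    ref = np.sum(pf ** 2) + np.sum(po ** 2)
    return 1.0 - np.sum((pf - po) ** 2) / ref if ref > 0 else float("nan")

fcst = np.array([[25.0, 10.0], [35.0, 5.0]])
obs = np.array([[30.0, 22.0], [8.0, 3.0]])
scores = contingency_scores(fcst, obs, thresh=20.0)   # one hit, one miss, one false alarm

# A single storm cell forecast one grid point east of where it was observed.
obs_cell = np.zeros((7, 7)); obs_cell[2, 2] = 40.0
fcst_cell = np.zeros((7, 7)); fcst_cell[2, 3] = 40.0
fss_1 = fss(fcst_cell, obs_cell, 20.0, n=1)   # 0.0: no gridpoint overlap
fss_3 = fss(fcst_cell, obs_cell, 20.0, n=3)   # 2/3: neighborhoods overlap
```

The displaced-cell example shows why neighborhood scores are used at convection-allowing scales: a one-gridpoint displacement scores zero at the grid scale but receives partial credit once a 3 x 3 window is applied.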

6. Empirical Findings

Key results for composite reflectivity forecasting:

At the light reflectivity threshold, ResHRRR’s FSS exceeds HRRR’s at all lead times up to 48 h; even 3 ensemble members outperform HRRR’s deterministic forecasts, and skill gains from ensembling saturate beyond 5 members.
At the higher threshold, ResHRRR outperforms HRRR at shorter lead times and remains competitive with 10-member ensembles beyond that.
Mean RMSE for REFC (reflectivity) is lower for ResHRRR than HRRR after 6 h, while RMSE for 2 m temperature is higher—an artifact of lower loss weighting for this variable.
Power spectra analysis: diffusion-based ResHRRR preserves high-wavenumber energy and spatial sharpness better than deterministic models.
Spread–error ratio is below 1 (under-dispersive ensembles), suggesting that improved stochastic initial condition perturbations are required.
Full-CONUS and multi-lead training mitigate error accumulation for long-lead forecasts.
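
The spread–error diagnostic mentioned above can be illustrated on synthetic data. The sketch uses one common definition (root-mean ensemble variance divided by the RMSE of the ensemble mean); the source does not state which variant was used, and the random fields below are not forecast data.

```python
import numpy as np

def spread_error_ratio(ensemble, truth):
    """Ratio of ensemble spread to ensemble-mean RMSE.

    ensemble has shape (M, ...) with M members; truth has shape (...).
    """
    mean = ensemble.mean(axis=0)
    spread = np.sqrt(np.mean(ensemble.var(axis=0, ddof=1)))
    rmse = np.sqrt(np.mean((mean - truth) ** 2))
    return spread / rmse

rng = np.random.default_rng(0)
truth = rng.standard_normal((100, 100))
consistent = rng.standard_normal((10, 100, 100))   # members drawn like the truth
underdispersive = 0.3 * consistent                 # same errors, too little spread

ratio_consistent = spread_error_ratio(consistent, truth)
ratio_under = spread_error_ratio(underdispersive, truth)
```

For a statistically consistent 10-member ensemble this ratio sits slightly below 1 (about 0.95, because the ensemble mean itself has sampling error), whereas the shrunken ensemble gives roughly 0.3, the signature of under-dispersion.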

A summary of major architecture and benchmarking contrasts is provided below:

Feature	HRRRCast (ResHRRR)	HRRR (baseline)
Core engine	ML diffusion model	Physics-based
Lead times	1, 3, 6 h; rollout	All
Domain	Full CONUS, 6 km	Full CONUS, 3 km
Probabilistic/ensemble	Yes (DDIM)	Yes (initial-condition perturbations)
Ensemble skill (FSS)	≥ HRRR (light threshold)	Baseline

7. Limitations and Future Directions

Current GraphHRRR limitations: The model lacks diffusion and GFS conditioning and is trained at only a 1 h lead time; as a result, performance degrades rapidly as the lead time is extended.
Ensemble under-dispersion: Planned remedies include introducing stochastic sampling ( $\eta > 0$ in DDIM) and perturbing GFS/initial inputs with GEFS-based approaches to better match error growth.
Future research: Integrate diffusion (DDIM or EDM) within the GNN architecture for probabilistic graph-based forecasting; hybrid CNN/GNN encoders as in [Siddiqui et al. 2024]; domain adaptation techniques for rolling diffusion (ERDM) to fine-tune long-term skill.

The methodology, benchmark analysis, and future plans position HRRRCast as a data-driven alternative for regional weather prediction, providing ensemble capability and competitive forecast quality with significantly reduced computational overhead (Abdi et al., 8 Jul 2025).

References

HRRRCast: a data-driven emulator for regional weather forecasting at convection allowing scales (2025)
