Weather4Cast 2025 Competition

Updated 21 November 2025
  • Weather4Cast 2025 Competition is an international benchmarking initiative that advances short-term precipitation nowcasting using deep learning techniques with satellite and radar data.
  • Participants predict cumulative rainfall and extreme events over specific European regions using historical satellite IR imagery and ground radar-derived precipitation rates.
  • Evaluation metrics such as CRPS, RMSE, SSIM, and F1-score drive innovations in model architecture, transfer learning, and computational efficiency.

The Weather4Cast 2025 competition is an international benchmarking initiative focused on advancing short-term precipitation nowcasting using deep learning methods trained on geostationary satellite and ground-based radar data. Participants are tasked with predicting cumulative rainfall or rainfall events with high spatial and temporal resolution over specific European regions, using historical satellite radiance frames as input. The competition emphasizes model accuracy, computational efficiency, transferability across regions and years, and the use of robust, reproducible ML pipelines. Evaluation is primarily based on probabilistic skill scores, such as the Continuous Ranked Probability Score (CRPS), as well as standard detection metrics for event-based tasks. The competition serves as a testbed for recent developments in deep learning, recurrent architectures, transformers, and hybrid approaches in weather forecasting.

1. Competition Structure and Data Regime

Weather4Cast 2025 defines tasks centered on regional nowcasting of rainfall utilizing standardized satellite and radar datasets. Participants are provided with four consecutive 15-minute satellite frames (one hour of history) for each sample and are required to forecast the next four hours (16 future 15-minute intervals) of rainfall. The primary satellite modality is the SEVIRI IR 10.8 µm channel, though some methods incorporate the full 11-channel HRIT set. Ground truth comes from OPERA-compliant precipitation rates derived from ground radar networks. Competition tracks include cumulative rainfall regression (central region, 32 × 32 pixels) and extreme-event detection. Seven region-specific datasets, with varying spatial and temporal coverage, are provided for both core (training and validation) and transfer (test-only) regions. Input samples are spatially cropped and normalized per channel; typically, four spatially and temporally contextual crops per sample are used to enforce transferability and robustness (Bhuskute et al., 14 Nov 2025, Harris et al., 14 Nov 2025).
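
A minimal sketch of this preprocessing step, with illustrative normalization statistics, crop size, and random-crop policy (not the competition's official data pipeline):

```python
import numpy as np

def normalize_and_crop(frames, mean, std, crop=32, rng=None):
    """Standardize each satellite channel and take a random spatial crop.

    frames: array of shape (T, C, H, W); mean/std: per-channel statistics
    computed on the training split (placeholder values in this example).
    """
    rng = rng or np.random.default_rng()
    frames = (frames - mean[None, :, None, None]) / std[None, :, None, None]
    _, _, H, W = frames.shape
    y = int(rng.integers(0, H - crop + 1))
    x = int(rng.integers(0, W - crop + 1))
    return frames[:, :, y:y + crop, x:x + crop]

# Example: 4 input frames, 11 SEVIRI channels, an illustrative 252x252 native region
frames = np.random.rand(4, 11, 252, 252).astype(np.float32)
mean, std = frames.mean(axis=(0, 2, 3)), frames.std(axis=(0, 2, 3)) + 1e-6
patch = normalize_and_crop(frames, mean, std)  # -> (4, 11, 32, 32)
```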

Forecasting skill is assessed via CRPS for cumulative rainfall (probabilistic regression) and a combination of pixel-wise RMSE, SSIM, F1, and detection rates for event identification. The event task requires 3D spatial–temporal connected component analysis to identify contiguous precipitation clusters exceeding threshold rates.
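
For reference, the empirical CRPS of a finite ensemble can be estimated as E|X − y| − ½·E|X − X′|. The sketch below assumes sample-based forecasts; it is not the competition's official scoring code, which may operate on predicted distributions rather than ensemble members.

```python
import numpy as np

def ensemble_crps(samples, obs):
    """Empirical CRPS estimate: E|X - y| - 0.5 * E|X - X'|.

    samples: (n_members, H, W) ensemble of predicted cumulative rainfall
    obs:     (H, W) observed cumulative rainfall
    Returns the per-pixel CRPS; average it for a scalar score.
    """
    samples = np.asarray(samples, dtype=np.float64)
    term1 = np.mean(np.abs(samples - obs[None]), axis=0)
    term2 = 0.5 * np.mean(np.abs(samples[:, None] - samples[None, :]), axis=(0, 1))
    return term1 - term2

# Example: 16-member ensemble over a 32x32 region
crps_map = ensemble_crps(np.random.gamma(1.0, 2.0, (16, 32, 32)),
                         np.random.gamma(1.0, 2.0, (32, 32)))
print(crps_map.mean())
```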

2. Model Architectures and Methodological Innovations

Weather4Cast 2025 submissions display a wide range of architectural motifs, with leading approaches spanning:

  • ConvGRU Encoder–Decoder Pipelines: The 2nd-place cumulative rainfall solution utilizes a lightweight encoder–decoder with two stacked ConvGRU layers, leveraging recurrence to capture temporal dependencies in IR brightness-temperature sequences. Each lead time (1–4 hours) is handled by an independent network, mitigating autoregressive error accumulation and permitting fully parallel inference. Encoder layers use 3×3 convolutions with ReLU and batch normalization. The ConvGRU operates on 32-channel feature maps, and the decoder successively reduces to the output frame through residual blocks (Bhuskute et al., 14 Nov 2025). A minimal ConvGRU-cell sketch follows this list.
  • Transformer-Based Video Models: SaTformer, the 1st-place model, is a pure video transformer operating on 4 × 32 × 32 × 11 input tensors, using patch embedding (4×4 patches), a learnable class token, and full space–time self-attention across 12 layers. This allows global cross-frame interaction and flexible dependency modeling between spatial and temporal tokens, which is crucial for capturing localized convective events as well as broad synoptic systems (Harris et al., 14 Nov 2025).
  • U-Net Variants with Probabilistic Bottlenecks: Probabilistic models, e.g., the Variational U-Net, integrate a VAE-style bottleneck (a 512-D latent code) into a U-Net backbone. The encoder comprises multiple Dense Blocks (Conv2D + ELU + GroupNorm + Dropout) with max-pooling, and the decoder uses transposed convolutions for spatial upsampling; skip connections preserve fine detail. The approximate posterior q_φ(z|x) and the reconstruction decoder are trained jointly by maximizing an ELBO that incorporates a weighted KL-divergence term (Kwok et al., 2021).
  • Region-Conditioned 3D U-Nets: Region-specific modulation is achieved via auxiliary MLPs that output channel-wise scaling (γ) and bias (β), which are applied to high-level encoder activations before decoding. Orthogonal regularization is imposed on 1×1×1 convolution shortcuts to stabilize training and improve generalization, and FiLM adapters provide domain adaptation for each region/year pair (Kim et al., 2022).
  • Task-Specific Losses and Augmentations: Multi-level Dice loss leverages ordinal rainfall bins to capture the naturally ordered structure of precipitation intensity, outperforming standard categorical dice or cross-entropy. Temporal Frame Interpolation (TFI), a novel augmentation, stochastically blends adjacent input frames and corresponding target frames, generating synthetic temporally-interpolated samples to enhance temporal robustness (Han et al., 2023).
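
For concreteness, the recurrence at the heart of the ConvGRU pipeline (first bullet above) can be sketched as below; the channel widths, kernel size, and four-frame roll-out are illustrative choices rather than the authors' released implementation.

```python
import torch
import torch.nn as nn

class ConvGRUCell(nn.Module):
    """Convolutional GRU cell: update/reset gates computed with 3x3 convolutions (sketch)."""
    def __init__(self, in_ch, hid_ch, k=3):
        super().__init__()
        self.conv_zr = nn.Conv2d(in_ch + hid_ch, 2 * hid_ch, k, padding=k // 2)
        self.conv_h = nn.Conv2d(in_ch + hid_ch, hid_ch, k, padding=k // 2)

    def forward(self, x, h):
        z, r = torch.sigmoid(self.conv_zr(torch.cat([x, h], dim=1))).chunk(2, dim=1)
        h_new = torch.tanh(self.conv_h(torch.cat([x, r * h], dim=1)))
        return (1 - z) * h + z * h_new

# Roll the cell over four encoded input frames (B, C, H, W each); the final hidden
# state would then be decoded to a single lead-time forecast.
cell = ConvGRUCell(in_ch=32, hid_ch=32)
frames = [torch.randn(2, 32, 32, 32) for _ in range(4)]
h = torch.zeros(2, 32, 32, 32)
for x in frames:
    h = cell(x, h)
```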

3. Two-Stage and Hybrid Training Strategies

Cumulative rainfall solutions exhibit a common multi-stage strategy:

  • Stage 1: Satellite Brightness-Temperature Forecasting: Networks are first trained (using pixel-wise MSE loss) to predict normalized brightness temperatures at future frames. This exploits the high spatiotemporal correlation between cloud-top IR radiances and subsequent precipitation occurrence (Bhuskute et al., 14 Nov 2025).
  • Stage 2: Nonlinear IR-to-Rainfall Transformation: The predicted brightness temperature fields are empirically mapped to OPERA rainfall rates via a nonlinear power-law:

R(x, y, t) = \alpha \max(0,\, 300 - T(x, y, t))^{\beta},

where T is in Kelvin and α, β are calibrated to ground-radar truth.
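
A direct implementation of this mapping is compact; the α and β defaults below are placeholders, since the calibrated constants are not reported here.

```python
import numpy as np

def ir_to_rain_rate(T_kelvin, alpha=0.03, beta=1.6):
    """Empirical power-law IR-to-rainfall conversion R = alpha * max(0, 300 - T)^beta.

    alpha and beta are PLACEHOLDER values for illustration; in the competition
    pipeline they are calibrated against OPERA ground-radar rainfall rates.
    """
    return alpha * np.maximum(0.0, 300.0 - T_kelvin) ** beta

# Colder cloud tops (lower brightness temperature) map to higher rain rates
print(ir_to_rain_rate(np.array([290.0, 260.0, 220.0])))
```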

Similar hybrid strategies are found in transformer-based systems, where regression over continuous rainfall is reformulated as classification via binning (typically n = 64 bins, chosen to trade off resolution against sample sparsity). Class-weighted cross-entropy loss is employed to mitigate the heavy imbalance between dry and extreme rainfall events (Harris et al., 14 Nov 2025).
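
A minimal sketch of this regression-as-classification setup, assuming illustrative bin edges and inverse-frequency class weights rather than SaTformer's actual configuration:

```python
import torch
import torch.nn as nn

n_bins = 64
# Illustrative, roughly logarithmic bin edges in mm; the paper's edges are chosen
# to balance resolution against sample sparsity.
edges = torch.cat([torch.tensor([0.0]), torch.logspace(-2, 2, n_bins - 2)])

def rain_to_class(rain_mm):
    """Map continuous cumulative rainfall (mm) to one of n_bins ordinal classes."""
    return torch.bucketize(rain_mm, edges)

# Inverse-frequency class weights to counter the dominance of dry pixels (illustrative).
bin_counts = torch.ones(n_bins)          # replace with empirical counts from training data
weights = bin_counts.sum() / (n_bins * bin_counts)
criterion = nn.CrossEntropyLoss(weight=weights)

logits = torch.randn(8, n_bins, 32, 32)  # (B, n_bins, H, W) network output
target = rain_to_class(torch.rand(8, 32, 32) * 50.0)
loss = criterion(logits, target)
```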

Transfer learning schedules are implemented via initial pretraining on large, high-coverage regions, followed by fine-tuning on target domains with reduced learning rates and frozen feature extractors. FiLM parameters or region-specific adapters are tuned exclusively, minimizing overfitting to scarce test regions (Kim et al., 2022).
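
A minimal sketch of such a fine-tuning schedule, assuming a toy model with a pretrained backbone and per-region FiLM adapters; region-id embeddings here stand in for the auxiliary MLPs described earlier, and the learning rate and architecture are illustrative.

```python
import torch
import torch.nn as nn

class FiLM(nn.Module):
    """Per-region feature-wise modulation: channel-wise scale (gamma) and bias (beta)."""
    def __init__(self, n_regions, channels):
        super().__init__()
        self.gamma = nn.Embedding(n_regions, channels)
        self.beta = nn.Embedding(n_regions, channels)

    def forward(self, feats, region_id):
        g = self.gamma(region_id)[:, :, None, None]  # (B, C, 1, 1)
        b = self.beta(region_id)[:, :, None, None]
        return g * feats + b

class TinyNowcaster(nn.Module):
    """Toy backbone + FiLM adapter + head; stands in for the real 3D U-Net."""
    def __init__(self, n_regions=7, ch=32):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(1, ch, 3, padding=1), nn.ReLU(),
            nn.Conv2d(ch, ch, 3, padding=1), nn.ReLU())
        self.film = FiLM(n_regions, ch)
        self.head = nn.Conv2d(ch, 1, kernel_size=1)

    def forward(self, x, region_id):
        return self.head(self.film(self.backbone(x), region_id))

model = TinyNowcaster()
for p in model.backbone.parameters():      # freeze the pretrained feature extractor
    p.requires_grad = False
optimizer = torch.optim.Adam(              # tune only the adapters (and head) at a reduced LR
    [p for p in model.parameters() if p.requires_grad], lr=1e-4)

out = model(torch.randn(4, 1, 32, 32), torch.tensor([2, 2, 5, 5]))  # (4, 1, 32, 32)
```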

4. Evaluation Metrics and Competition Results

Summary of quantitative results from the Weather4Cast 2025 final leaderboard:

Model | CRPS (cumulative) | RMSE | SSIM | F1 @ 0.5 mm | Inference time | Rank
SaTformer | 3.135 | - | - | - | - | 1st
ConvGRU pipeline | 3.37 | 2.48 mm | 0.747 | 0.682 | < 150 ms/seq | 2nd
ConvLSTM baseline | - | 19.01 K | 0.754 | - | - | -
Persistence | - | 23.58 K | 0.650 | - | - | -

Key findings include:

  • The ConvGRU pipeline achieved a 4-hour average brightness-temperature RMSE of 19.28 K, outperforming the persistence baseline by roughly 4.3 K and coming within 0.27 K of the ConvLSTM baseline on average.
  • After IR-to-rainfall conversion, the ConvGRU model reached an F1-score of 0.682 at 0.5 mm threshold and SSIM of 0.747, indicating both quantitative performance and preservation of structural rainfall patterns (Bhuskute et al., 14 Nov 2025).
  • SaTformer’s classification strategy resulted in robust handling of heavy rainfall extremes, with ablation studies demonstrating a collapse to the no-rain class if class weighting was omitted.

Event detection tracks required 3D connected-component labeling on rain-rate volumes, computing features (maximum intensity, duration, footprint, centroid) and exporting the top five events per sequence. ConvGRU models matched baseline event detection performance (Bhuskute et al., 14 Nov 2025).
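
A sketch of this post-processing, assuming scipy's ndimage.label for the 3D connected components and an illustrative rain-rate threshold and feature set (the official thresholds and event schema may differ):

```python
import numpy as np
from scipy import ndimage

def extract_events(rain, thresh=5.0, top_k=5):
    """Label contiguous 3D (time, y, x) regions exceeding `thresh` mm/h and
    summarize the top_k events by peak intensity (threshold is illustrative)."""
    mask = rain > thresh
    labels, n = ndimage.label(mask, structure=np.ones((3, 3, 3)))
    events = []
    for lab in range(1, n + 1):
        idx = labels == lab
        t, y, x = np.nonzero(idx)
        events.append({
            "max_intensity": float(rain[idx].max()),
            "duration": int(t.max() - t.min() + 1),   # number of 15-min steps spanned
            "footprint": int(idx.any(axis=0).sum()),  # pixels touched at any time
            "centroid": tuple(np.round(ndimage.center_of_mass(idx), 2)),
        })
    events.sort(key=lambda e: e["max_intensity"], reverse=True)
    return events[:top_k]

print(extract_events(np.random.gamma(1.0, 3.0, (16, 32, 32))))
```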

5. Computational Efficiency and Implementation Considerations

High computational efficiency is a salient requirement for operational nowcasting. ConvGRU-based encoder–decoders achieve significant parameter savings (40% fewer than ConvLSTM baselines) by using a single IR channel and two modestly sized temporal layers (32 channels each). Independent lead-time models permit fully parallel inference, with end-to-end execution times in the sub-150 ms regime per sequence on contemporary GPU hardware (Bhuskute et al., 14 Nov 2025).

Transformer architectures, such as SaTformer, remain tractable due to the small number of spatiotemporal tokens (e.g., 4 frames × 8×8 patch grids + 1 class token), enabling full 3D self-attention without windowing or factorized attention schemes. These models require large batch sizes for stable optimization, with training performed at batch sizes up to 128 across multiple GPUs; no mixed-precision training or gradient clipping is reported (Harris et al., 14 Nov 2025).
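
The token budget can be made concrete with a small sketch of the patch embedding and full space-time attention; the embedding width, head count, and omission of positional/temporal embeddings are simplifications, not SaTformer's actual hyperparameters.

```python
import torch
import torch.nn as nn

# Illustrative space-time tokenization for 4 frames of 32x32, 11-channel imagery
# with 4x4 patches, mirroring the setup described above (not SaTformer's code).
B, T, C, H, W, P, dim = 2, 4, 11, 32, 32, 4, 256

patch_embed = nn.Conv2d(C, dim, kernel_size=P, stride=P)
cls_token = nn.Parameter(torch.zeros(1, 1, dim))
encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=dim, nhead=8, batch_first=True), num_layers=12)

x = torch.randn(B, T, C, H, W)
tok = patch_embed(x.flatten(0, 1))                             # (B*T, dim, 8, 8)
tok = tok.flatten(2).transpose(1, 2).reshape(B, T * 64, dim)   # (B, 256, dim) space-time tokens
tok = torch.cat([cls_token.expand(B, -1, -1), tok], dim=1)     # prepend class token -> 257 tokens
out = encoder(tok)                                             # full space-time self-attention
```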

Real-time feasibility is further enhanced in U-Net and SmaAt-U-Net families via depthwise-separable convolutions, shallow attention modules (CBAM), and aggressive parameter pruning, reducing both inference cost and energy footprint (Punjabi et al., 2023, Punjabi et al., 2021).
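
For reference, the depthwise-separable block behind these parameter savings can be sketched as follows (a generic building block, not the SmaAt-UNet source); the parameter-count comparison illustrates the saving.

```python
import torch.nn as nn

class DepthwiseSeparableConv(nn.Module):
    """Depthwise 3x3 convolution followed by a 1x1 pointwise convolution (generic sketch)."""
    def __init__(self, in_ch, out_ch, k=3):
        super().__init__()
        self.depthwise = nn.Conv2d(in_ch, in_ch, k, padding=k // 2, groups=in_ch)
        self.pointwise = nn.Conv2d(in_ch, out_ch, kernel_size=1)

    def forward(self, x):
        return self.pointwise(self.depthwise(x))

# Parameter comparison against a standard 3x3 convolution of the same width
standard = nn.Conv2d(64, 64, 3, padding=1)
separable = DepthwiseSeparableConv(64, 64)
count = lambda m: sum(p.numel() for p in m.parameters())
print(count(standard), count(separable))  # ~36.9k vs ~4.8k parameters
```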

6. Lessons Learned and Implications for Future Competitions

Best practices and design principles distilled from Weather4Cast 2025 include:

  • Reformulating regression targets (rainfall accumulations) as classification over adaptively chosen bins stabilizes training and leverages mature transformer video architectures while capturing long-tailed precipitation distributions effectively (Harris et al., 14 Nov 2025).
  • Empirically calibrated IR-to-rain mapping provides competitive rainfall rate estimation, but further improvements are likely with additional channels and learned parameterizations.
  • Transfer learning, region-specific adapters, and domain-informed augmentations (e.g., mixup, temporal jitter, TFI) enhance generalization to previously unseen regions and temporal shifts (Kim et al., 2022, Han et al., 2023).
  • Fast convergence (10–12 epochs for ConvGRU, 200 for transformers) and low-latency inference are achieved by architectural parsimony and parallel model scheduling.
  • Incorporating explicit region and year conditioning via FiLM adapters, orthogonality regularization, and self-distillation improves robustness under domain shift and facilitates rapid adaptation to operational regime changes (Kim et al., 2022).
  • Persistent limitations include underestimation of the heaviest precipitation tails and a reliance on empirical precipitation mapping, indicating a need for more physics-aware losses and uncertainty-calibrated evaluations in future iterations.

Future contests will likely amplify the focus on uncertainty quantification, multi-region adaptation, advanced physics-informed losses, and ensembling strategies, as established in the 2025 cohort (Kwok et al., 2021, Kim et al., 2022).

7. Bibliographic Reference Table

Model/Approach | Paper Title | Reference
ConvGRU Pipeline | Computationally-efficient deep learning models for nowcasting of precipitation... | Bhuskute et al., 14 Nov 2025
SaTformer (Transformer) | A Space-Time Transformer for Precipitation Forecasting | Harris et al., 14 Nov 2025
Variational U-Net | A Variational U-Net for Weather Forecasting | Kwok et al., 2021
Region-Conditioned 3D U-Net | Region-Conditioned Orthogonal 3D U-Net for Weather4Cast Competition | Kim et al., 2022
TFI / ML-Dice / U-Net | Learning Robust Precipitation Forecaster by Temporal Frame Interpolation | Han et al., 2023
SmaAt-U-Net Baseline | Efficient spatio-temporal weather forecasting using U-Net | Punjabi et al., 2021
Efficient Baseline | Efficient Baseline for Quantitative Precipitation Forecasting in Weather4cast 2023 | Punjabi et al., 2023

This compendium reflects a mature, data-driven landscape in mesoscale nowcasting, combining domain-specific knowledge, recurrent and attention-based methods, and computationally efficient implementations to approach operational readiness (Bhuskute et al., 14 Nov 2025, Harris et al., 14 Nov 2025, Kim et al., 2022, Han et al., 2023).
