ArchesClimate: Deep Learning for Decadal Climate

Updated 26 September 2025

ArchesClimate is an advanced deep learning-based climate model emulator that generates decadal simulation ensembles with high physical and statistical fidelity.
It employs a deterministic hierarchical vision transformer and a generative flow matching model to predict coupled ocean–atmosphere states in an autoregressive framework.
The emulator significantly reduces computational costs by replacing resource-intensive numerical models, enabling real-time uncertainty analysis and scenario evaluation.

ArchesClimate is an advanced deep learning-based climate model emulator developed for the probabilistic generation of decadal climate simulation ensembles. Its primary objective is to make climate projection ensemble generation computationally tractable while preserving the physical and statistical fidelity of full-complexity numerical models, specifically targeting the decadal evolution of the coupled ocean–atmosphere system. The model is trained on monthly hindcast outputs from the IPSL-CM6A-LR climate model at a spatial resolution of about 2.5° × 1.25°, and utilizes a flow matching framework adapted from ArchesWeatherGen to generate long-term, autoregressive climate trajectories.

1. Objective and Theoretical Framework

ArchesClimate is constructed to emulate ensembles of climate model outputs under varying initial conditions and forcings, which are the standard for quantifying uncertainty in future climate projections. Traditional generation of such ensembles is computationally demanding due to the complexity and resource requirements of coupled Earth system models. ArchesClimate is designed to address this bottleneck by learning mappings from climate model states and external forcings to future states using deep neural network architectures. The system generates single- or multi-variable monthly climate fields that are physically consistent and statistically interchangeable with IPSL-CM6A-LR output for lead times of up to 10 years.

The emulator uses two neural modules:

$f_\theta$: a deterministic hierarchical vision transformer, responsible for predicting the mean next-month climate state from previous states.
$g_\theta$: a generative flow matching model, responsible for mapping Gaussian noise realizations to physically plausible residual corrections, and thus for sampling from the conditional distribution of internal climate variability given the deterministic forecast and preceding states.

The full monthly state is generated via:

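$X_{t+\delta} = f_\theta(X_t) + \sigma \, g_\theta(\text{inputs})$

where $\sigma$ is a standard deviation scaling factor and $g_\theta$ operates by transforming noise into the normalized state-space residual. Because the residual is divided by $\sigma$ during training (Section 2), it is multiplied by $\sigma$ here to restore physical units.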

2. Model Architecture and Training

The deterministic component $f_\theta$ is a modified hierarchical vision transformer with increased embedding dimensionality relative to previous weather emulators, and without the axial attention or skip connections found in ArchesWeatherGen. These modifications allow the network to model cross-domain correlations between oceanic and atmospheric fields while managing computational cost.

The generative component $g_\theta$ employs flow matching rather than classical denoising diffusion techniques. At each step, the normalized residual correction $r_{t+\delta}$ is defined as:

$r_{t+\delta} = (X_{t+\delta} - f_\theta(X_t)) / \sigma$

where $g_\theta$ learns a continuous transport ("flow matching") from Gaussian noise $\epsilon$ to $r_{t+\delta}$. The training objective minimizes the mean-squared error between the network output and the regression target $r_{t+\delta} - \epsilon$, that is, the true normalized residual minus the noise sample:

$\mathcal{L} = \mathbb{E}_{s,\epsilon} [ \| g_\theta(\cdot) - (r_{t+\delta} - \epsilon) \|_2^2 ]$

$g_\theta$ receives as input the deterministic forecast, a history of preceding states, and a noised version of the true residual, with the noise level set by the flow matching schedule parameter $s$.
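The objective can be illustrated with a minimal NumPy sketch. It assumes a linear interpolation between noise and residual, which is a common flow matching choice and is consistent with the target $r_{t+\delta} - \epsilon$, but the source does not spell out the schedule. The function names and the toy `oracle` model are hypothetical, and the conditioning on the forecast and state history is omitted for brevity.

```python
import numpy as np

def flow_matching_loss(g_theta, residual, rng):
    """One-sample Monte Carlo estimate of the flow matching loss."""
    eps = rng.standard_normal(residual.shape)   # Gaussian noise sample
    s = rng.uniform()                           # schedule parameter in [0, 1)
    # Assumed linear path: pure noise at s = 0, true residual at s = 1.
    noised = (1.0 - s) * eps + s * residual
    target = residual - eps                     # constant velocity along that path
    pred = g_theta(noised, s)
    return float(np.mean((pred - target) ** 2))

rng = np.random.default_rng(0)
r = rng.standard_normal((4, 8))   # toy normalized residual on a 4x8 grid

def oracle(noised, s):
    """Hypothetical model that already knows r and returns the exact target."""
    eps = (noised - s * r) / (1.0 - s)
    return r - eps

def zero_model(noised, s):
    """Hypothetical untrained model that always predicts zero velocity."""
    return np.zeros_like(noised)

print(flow_matching_loss(oracle, r, np.random.default_rng(1)))      # near zero
print(flow_matching_loss(zero_model, r, np.random.default_rng(1)))  # clearly positive
```

The oracle reaches a loss of essentially zero only because it is given the true residual; a trained $g_\theta$ has to infer the velocity from the noised input and its conditioning.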

External climate forcings (CO$_2$, CH$_4$, CFC11eq, N$_2$O, solar irradiance) are embedded and injected via conditional layer normalization throughout the transformer blocks, ensuring that the model is sensitive to changes in anthropogenic and natural external drivers.
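As a rough illustration of conditional layer normalization, the sketch below normalizes each token and then applies a scale and shift computed from the forcing vector. The linear maps `w_scale` and `w_shift`, the dimensions, and the standardized forcing values are assumptions made for the example; the source does not describe the actual embedding network.

```python
import numpy as np

def conditional_layer_norm(x, forcing, w_scale, w_shift, eps=1e-5):
    """Layer norm over the last axis with forcing-dependent scale and shift."""
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    x_hat = (x - mean) / np.sqrt(var + eps)
    scale = 1.0 + forcing @ w_scale   # shape (dim,), equals 1 when forcing is zero
    shift = forcing @ w_shift         # shape (dim,), equals 0 when forcing is zero
    return x_hat * scale + shift

rng = np.random.default_rng(0)
tokens, dim, n_forcings = 6, 16, 5    # five forcings: CO2, CH4, CFC11eq, N2O, solar
x = rng.standard_normal((tokens, dim))
w_scale = 0.1 * rng.standard_normal((n_forcings, dim))   # hypothetical learned weights
w_shift = 0.1 * rng.standard_normal((n_forcings, dim))

baseline = np.zeros(n_forcings)                   # standardized forcings at reference level
perturbed = np.array([1.0, 0.0, 0.0, 0.0, 0.0])   # CO2 raised by one standard deviation

y0 = conditional_layer_norm(x, baseline, w_scale, w_shift)
y1 = conditional_layer_norm(x, perturbed, w_scale, w_shift)
print(np.abs(y1 - y0).max())   # nonzero: the activations respond to the forcing
```

With zero forcing the layer reduces to a plain layer norm, so any difference between `y0` and `y1` comes only from the forcing pathway.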

Training is two-stage: first, $f_\theta$ is trained to minimize deterministic prediction error; then, with $f_\theta$ fixed, $g_\theta$ is trained to generate residuals conditioned on forecasts, state history, and noise.

3. Prediction Workflow and Autoregressive Generation

Once trained, ArchesClimate produces climate evolutions as follows:

At time $t$, the previous climate state $X_t$ and the forcings are provided as input.
$f_\theta$ generates a mean prediction $f_\theta(X_t)$.
$g_\theta$ transforms a sampled noise vector into the residual, which is added to $f_\theta(X_t)$ to restore the internal variability that the mean prediction lacks.
The new state $X_{t+\delta}$ is constructed and fed back as input for the next time step, allowing for autoregressive generation over arbitrary time horizons.

The generative process requires $M$ flow-matching steps, akin to a progressive noise reduction, to yield a realistic ensemble member at each time step.
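The workflow above can be sketched as a short rollout loop. This assumes that the $M$ flow-matching steps are explicit Euler steps of size $1/M$ from $s = 0$ to $s = 1$, which the source does not specify. `f_theta` and `g_theta` are hypothetical placeholders rather than the trained networks, and `SIGMA`, `M`, and the grid shape are arbitrary.

```python
import numpy as np

SIGMA = 0.5   # assumed residual scale
M = 8         # number of flow matching (Euler) steps, assumed

def f_theta(x):
    """Placeholder deterministic forecast: damped persistence."""
    return 0.9 * x

def g_theta(z, s, forecast):
    """Placeholder velocity field; the real model is a conditioned transformer."""
    return -z

def sample_residual(forecast, rng):
    z = rng.standard_normal(forecast.shape)   # start from Gaussian noise at s = 0
    for k in range(M):
        s = k / M
        z = z + g_theta(z, s, forecast) / M   # one explicit Euler step of size 1/M
    return z                                  # normalized residual at s = 1

def rollout(x0, n_months, rng):
    states = [x0]
    for _ in range(n_months):
        forecast = f_theta(states[-1])
        residual = sample_residual(forecast, rng)
        # Multiply by SIGMA because r = (X - f) / sigma during training.
        states.append(forecast + SIGMA * residual)
    return np.stack(states)

x0 = np.ones((4, 8))
ensemble = np.stack([rollout(x0, 120, np.random.default_rng(seed)) for seed in range(3)])
print(ensemble.shape)   # (3, 121, 4, 8)
```

Each seed yields a different 10-year (120-month) trajectory from the same initial state, which is how an ensemble is built from a single trained model.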

4. Physical Consistency and Stability

ArchesClimate maintains multivariate physical consistency and temporal stability for lead times up to 10 years, as demonstrated by several diagnostic metrics:

Rank histograms and the continuous ranked probability score (CRPS) show that ArchesClimate ensembles reproduce the distribution and spread of IPSL-CM6A-LR outputs for climate variables.
Pearson correlation and power spectrum analysis confirm that key variables (e.g., surface temperature, precipitation, sea surface temperature, net surface heat flux, cloud cover) are spatially and temporally coherent between emulator and reference model.
Physical constraints (mass and energy conservation, causal relationships between ocean and atmosphere) are not imposed explicitly; they are preserved implicitly through the network’s architecture and its training on full spatiotemporal fields.
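For readers unfamiliar with the first two diagnostics, the sketch below computes an empirical ensemble CRPS and a rank histogram on synthetic data. It uses the standard estimator $\mathrm{E}|X - y| - \tfrac{1}{2}\mathrm{E}|X - X'|$; the toy Gaussian samples stand in for emulator and reference values and are not results from the paper.

```python
import numpy as np

def ensemble_crps(members, obs):
    """Empirical CRPS: E|X - y| - 0.5 * E|X - X'| over ensemble members."""
    members = np.asarray(members, dtype=float)
    term_obs = np.mean(np.abs(members - obs))
    term_spread = np.mean(np.abs(members[:, None] - members[None, :]))
    return term_obs - 0.5 * term_spread

def rank_of_reference(members, obs):
    """Number of members strictly below the reference value (0..len(members))."""
    return int(np.sum(np.asarray(members) < obs))

rng = np.random.default_rng(0)
n_cases, n_members = 2000, 10
emulated = rng.standard_normal((n_cases, n_members))   # toy emulator ensemble
reference = rng.standard_normal(n_cases)               # toy reference, same distribution

ranks = [rank_of_reference(e, y) for e, y in zip(emulated, reference)]
hist = np.bincount(ranks, minlength=n_members + 1)
print(hist)   # roughly flat when ensemble and reference share a distribution

print(ensemble_crps([1.0, 2.0, 3.0], 2.0))   # 2/9, about 0.222
```

A flat rank histogram indicates that the reference value behaves like one more ensemble member; a U shape would indicate too little ensemble spread and a dome shape too much.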

5. Coverage of Climate Variables and Interchangeability

The system emulates a wide variety of coupled climate fields at monthly resolution:

Variable Type	Examples	Treatment / Fidelity to IPSL-CM6A-LR
Atmospheric Fields	Surface/pressure-level temperature, SLP, wind (u,v), cloud cover, precipitation, evaporation	Statistically interchangeable
Oceanic Fields	Sea surface temperature, ocean heat content, mixed-layer depth	Statistically interchangeable
Forcing Variables	CO$_2$, CH$_4$, CFC11eq, N$_2$O, solar forcing	Handled via conditional normalization

For several climate metrics, ArchesClimate’s output is effectively indistinguishable from that of the reference climate model (in terms of probability density functions, moments, and power spectra), supporting the interchangeability claim.
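A comparison of this kind can be sketched for a single grid-point time series as follows. The two synthetic series (an annual cycle plus independent noise) are stand-ins for reference and emulated output, and the `summary` helper is a hypothetical name; the source does not describe its exact evaluation code.

```python
import numpy as np

def summary(series):
    """Mean, standard deviation, and periodogram of a monthly time series."""
    series = np.asarray(series, dtype=float)
    anomalies = series - series.mean()
    power = np.abs(np.fft.rfft(anomalies)) ** 2 / series.size
    freqs = np.fft.rfftfreq(series.size, d=1.0)   # cycles per month
    return series.mean(), series.std(), freqs, power

months = np.arange(120)   # ten years of monthly values
rng = np.random.default_rng(0)
seasonal = np.sin(2 * np.pi * months / 12)
reference = seasonal + 0.3 * rng.standard_normal(120)   # toy reference series
emulated = seasonal + 0.3 * rng.standard_normal(120)    # toy emulated series

mean_ref, std_ref, freqs, p_ref = summary(reference)
mean_emu, std_emu, _, p_emu = summary(emulated)

peak_ref = freqs[np.argmax(p_ref)]
peak_emu = freqs[np.argmax(p_emu)]
print(mean_ref, mean_emu)
print(std_ref, std_emu)
print(peak_ref, peak_emu)   # both peak at the annual cycle, 1/12 cycles per month
```

Matching moments and spectral peaks is a necessary but not sufficient check; the distributional diagnostics in Section 4 test the ensemble spread as well.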

6. Computational Implications and Ensemble Generation

By replacing the computationally intensive forward integrations of the physical model (which require extensive HPC resources and hundreds of hours per ensemble member) with a trained deep network, ArchesClimate enables real-time (or near real-time) generation of decadal ensembles on consumer-grade GPU hardware. This acceleration facilitates large-scale uncertainty analysis, probabilistic risk assessments, and scenario evaluation for climate impacts.

7. Limitations and Future Directions

While ArchesClimate provides stable and physically consistent emulations for up to 10 years and a broad set of variables, some limitations remain:

The accuracy of out-of-sample predictions is bounded by the diversity and coverage of the training hindcasts.
Some fine-scale or highly nonlinear phenomena (e.g., rare extremes, abrupt regime shifts) may be underrepresented compared to high-resolution physics-based models.
Physical interpretability is statistical rather than mechanistic, although coupling to external forcings is explicit.

A plausible implication is that further scaling and training on multi-model, multi-scenario datasets could extend the model’s utility for probabilistic decadal forecasting in operational and research settings.

In summary, ArchesClimate (Clyne et al., 19 Sep 2025) defines a new class of deep learning-based climate emulators that integrate transformer architectures, flow matching generative models, and conditional normalization to emulate physically and statistically faithful decadal climate ensembles at minimal computational cost. Its architecture and training enable efficient, stable, and actionable climate projections for a large set of coupled variables, substantially lowering the barrier to ensemble-based climate risk analysis.

References

Clyne et al. (2025). ArchesClimate: Probabilistic Decadal Ensemble Generation With Flow Matching.
