LibTorch-based Coupling for Weather Prediction
- The method integrates TorchScript models directly into legacy Fortran codes using a C++ shared library and ISO_C_BINDING, bypassing Python overhead.
- It optimizes data exchange and memory management by reshaping Fortran arrays and reusing buffers, achieving up to 8× speedup in operational settings.
- Stability safeguards such as output renormalization, clipping, and a flux–heating-rate consistency check maintain physical realism, ensuring reliable forecasts during extended integrations.
A LibTorch-based coupling method enables direct and efficient integration of TorchScript-serialized deep learning models into high-performance scientific software ecosystems, particularly those implemented in Fortran. In the context of operational numerical weather prediction, such methods facilitate the replacement of computational bottlenecks—such as physical radiation schemes—with neural network emulators without spawning extraneous Python processes. This approach is exemplified in the embedding of a deep-learning-based radiation parameterization within the China Meteorological Administration’s Global Forecast System (CMA-GFS), yielding significant computational acceleration while maintaining accuracy and long-term stability (Jing et al., 20 Jan 2026).
1. Architectural Overview and Software Integration
The LibTorch-based coupling method for CMA-GFS addresses the intrinsic challenges of integrating modern ML components—trained and archived as TorchScript modules—into predominantly Fortran-based legacy codes. The workflow begins with off-line ML model training, archiving the inference graph using TorchScript. The serialized model is then compiled into a C++ shared library utilizing LibTorch, exposing C-ABI interface functions for initialization and inference. These functions are invoked from the Fortran host physics via ISO_C_BINDING, completely bypassing Python and minimizing runtime dependencies.
The directory structure is as follows:
| Directory | Purpose | Key Files |
|---|---|---|
| include/ | C++–Fortran glue-interface headers | torch_adapter.hpp |
| src/ | C++ routines for ML inference | torch_adapter.cpp, ml_inference.cpp |
| fortran/ | Fortran stubs, physics wrappers | rrtmg_ml.f90 |
| lib/TorchScript/ | Serialized TorchScript archive | rrtmg_ml.pt |
| build/ | Out-of-source CMake build | (build artifacts) |
CMake configuration includes linking against Torch libraries and integrating interface headers. The Fortran build further links against the ML library and includes necessary module interfaces for robust and portable coupling (Jing et al., 20 Jan 2026).
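A minimal CMake sketch of this linkage might look as follows; target names and paths are illustrative and follow the directory table above, while `find_package(Torch)` and `${TORCH_LIBRARIES}` are the standard hooks LibTorch ships in its `TorchConfig.cmake`:

```cmake
cmake_minimum_required(VERSION 3.18)
project(rrtmg_ml_coupling LANGUAGES CXX Fortran)

# LibTorch provides TorchConfig.cmake; point CMAKE_PREFIX_PATH at its install.
find_package(Torch REQUIRED)

# C++ adapter: the only component that links against Torch directly.
add_library(torch_adapter SHARED src/torch_adapter.cpp src/ml_inference.cpp)
target_include_directories(torch_adapter PUBLIC include)
target_link_libraries(torch_adapter PRIVATE ${TORCH_LIBRARIES})
target_compile_features(torch_adapter PRIVATE cxx_std_17)

# Fortran side links only against the adapter, keeping Torch out of the
# host model's link line.
add_library(rrtmg_ml fortran/rrtmg_ml.f90)
target_link_libraries(rrtmg_ml PRIVATE torch_adapter)
```

Keeping Torch confined to the adapter target is what lets the Fortran build remain free of Python and of most LibTorch build flags.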
2. Data Exchange, Array Layout, and Memory Management
The numerical weather prediction model dispatches batches of vertical atmospheric columns per physics step to the ML emulator. Each column is characterized by inputs stacked into 22 (shortwave, SW) or 20 (longwave, LW) channels across 89 vertical levels:
- For SW: 22 × 89 = 1958 input floats per column (for LW: 20 × 89 = 1780).
- Outputs (fluxes/heating rates): a fixed-size float vector per column.
Given Fortran’s column-major array layout, raw data is reshaped into two-dimensional arrays. These are passed to C++ as contiguous float32 memory buffers without explicit copy or transpose, matched via torch::from_blob with the appropriate strides.
For inference, buffer reuse is enforced: host memory for inputs and outputs is allocated once and maintained throughout multi-day integrations. When GPU inference is selected at initialization, paired CUDA tensors are created, and host–device transfer proceeds via pinned memory to minimize copy latency, achieving zero per-timestep heap allocation (Jing et al., 20 Jan 2026).
3. C++/Fortran Interface and Inference Control
The bridge between Fortran and C++ is realized via pure C interface definitions in torch_adapter.hpp:
```cpp
extern "C" {
    void rrtmg_ml_init(const char* model_path, int use_gpu_flag);
    void rrtmg_ml_infer(int ncol, const float* features, float* outputs);
}
```
The associated Fortran module binds these procedures with type-safe signatures via ISO_C_BINDING. Initialization is performed at host-model startup, specifying the device (CPU/CUDA). For every physics step, three calls manage the workflow:
```fortran
call rrtmg_ml_pack_inputs(ncol, model_state, features)
call rrtmg_ml_infer(ncol, features, outputs)
call rrtmg_ml_unpack_outputs(ncol, outputs, rad_state)
```
If the ML-based radiation scheme is enabled, the traditional RRTMG Fortran call is bypassed.
Within the C++ backend, inference disables autograd with torch::NoGradGuard, and input data is wrapped, forwarded through module.forward, and output tensors are copied back to host memory by direct pointer transfer. GPU-mode inference leverages resource pre-initialization and multi-threading to maximize throughput (Jing et al., 20 Jan 2026).
4. Stability Safeguards and Physical Constraints
To guarantee consistency and prevent unphysical outputs during long-term integration, each output vector from the ML emulator is renormalized and clipped in physical space:

$$y_c = \operatorname{clip}\!\left(\sigma_c\,\hat{y}_c + \mu_c,\; y_c^{\min},\; y_c^{\max}\right),$$

where $\sigma_c\,\hat{y}_c + \mu_c$ denotes re-scaling of the raw network output $\hat{y}_c$ by the per-channel standard deviations $\sigma_c$ and means $\mu_c$ from training. Each channel $c$ is constrained to its interval $[y_c^{\min}, y_c^{\max}]$, maintaining positive flux directions and bounding maximum values to suppress outliers.
A secondary consistency check recomputes the heating rate from the predicted flux divergence in pressure coordinates:

$$\mathrm{HR}_{\mathrm{derived}} = \frac{g}{c_p}\,\frac{\partial F_{\mathrm{net}}}{\partial p}$$

If the discrepancy between this derived value and the network-predicted heating rate exceeds a 10% relative tolerance, the physically derived value overwrites the network output. This safeguard reduced the crash rate from approximately 18% to zero over ten-day integrations, demonstrating its critical importance for operational stability (Jing et al., 20 Jan 2026).
5. Performance Optimization and Computational Profiling
The coupling method is engineered for computational efficiency. Batch inference is performed on up to 512 columns per call, amortizing per-call dispatch overhead and improving device utilization. CPU threading (torch::set_num_threads) is aligned with the host’s OpenMP settings; in GPU mode, pinned host memory (allocated via LibTorch’s pinned-memory tensor option) decreases host–device transfer time by approximately 30%.
Profiling on a 12-core Intel Xeon node with a single V100 GPU (10-day, 12.5-km grid, 60 TB I/O) yields the following operation breakdown:
| Step | % of Time |
|---|---|
| Data packing (Fortran→C) | 8 |
| Host–Device (H2D/D2H) transfer | 12 |
| Inference (module.forward+eval) | 68 |
| Unpacking & clipping | 12 |
Empirically, the ML-based emulator attains a speedup of approximately 8× over standard Fortran RRTMG, with end-to-end coupling overhead kept modest. Device context is pre-warmed at launch, and all memory allocations occur before the forecast time loop (Jing et al., 20 Jan 2026).
6. Operational Robustness and Integration Outcomes
This LibTorch-based coupling strategy provides a fully encapsulated Fortran interface, abstracting C++ interoperation, array-stride adaptation, and device-management entirely from the host model. The resulting system functions as a “drop-in” replacement for the RRTMG radiation scheme, supporting extended (10-day) integrations with zero runtime crashes and no degradation in physical forecast realism. The hybrid Fortran–C++–TorchScript design has proven compatible with operational requirements for real-time reforecasting, supporting stringent production constraints and large-scale data throughput. The approach is broadly extensible to other physics parameterizations and numerical model architectures using similar Fortran-centric codebases (Jing et al., 20 Jan 2026).