- The paper introduces the Temporal Neural Operator (TNO), which achieves accurate, resolution-invariant long-term predictions for time-dependent PDEs.
- It employs a dual-branch architecture that combines a dedicated temporal branch with U-Net spatial processing, efficiently handling variable grid resolutions.
- Experimental results in weather forecasting, climate modeling, and geologic carbon sequestration demonstrate superior accuracy and computational efficiency.
Neural Operators (NOs) like DeepONet and the Fourier Neural Operator have shown promise in learning mappings between function spaces for solving Partial Differential Equations (PDEs), particularly for problems dominated by spatial variation. However, applying them effectively to time-dependent PDEs, especially for long-term predictions beyond the training horizon, has been challenging due to limitations in capturing temporal dynamics and to error accumulation during autoregressive rollouts. The paper "Temporal Neural Operator for Modeling Time-Dependent Physical Phenomena" (2504.20249) introduces the Temporal Neural Operator (TNO), a novel architecture specifically designed to address these limitations, enabling accurate and efficient spatio-temporal operator learning with strong temporal extrapolation capabilities and resolution invariance.
The core idea behind TNO is to learn the time-evolution operator $\mathcal{G}_{\Delta t}$ that maps a solution state at time $t$, $u(t,\cdot)$, to a future state $u(t+\Delta t,\cdot)$. This is extended to map a history of $L$ past states $\mathbf{U}_{\text{hist}}(t)=\{u(t-(l-1)\Delta t,\cdot)\}_{l=1}^{L}$ to a bundle of $K$ future states $\mathbf{U}_{\text{fut}}(t)=\{u(t+k\Delta t,\cdot)\}_{k=1}^{K}$ using a learned operator $\mathcal{G}_{\Delta t}^{L\to K}$. The paper emphasizes that choosing $L=1$ (Markov assumption) or $L>1$ (memory) can be problem-specific, but using a temporal bundle of $K>1$ predictions in a single forward pass is generally beneficial for efficiency and stability. The shape conventions are sketched below.
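To make the interface concrete, here is a minimal sketch (not from the paper) of the bundled operator's shape conventions in PyTorch, with a trivial persistence baseline standing in for the trained model:

```python
import torch

# Shape conventions for the bundled operator G_{Δt}^{L→K}: it maps L past
# states of a 2D field to K future states in a single forward pass.
B, L, K, H, W = 8, 1, 4, 64, 64          # batch, history, bundle size, grid

u_hist = torch.randn(B, L, H, W)         # {u(t-(l-1)Δt, ·)}_{l=1..L}

def persistence_operator(u_hist: torch.Tensor, K: int) -> torch.Tensor:
    """Trivial stand-in for the learned operator: repeat the most recent
    state K times. The trained TNO replaces this (B,L,H,W)->(B,K,H,W) map."""
    return u_hist[:, -1:].repeat(1, K, 1, 1)

u_fut = persistence_operator(u_hist, K)  # {u(t+kΔt, ·)}_{k=1..K}
assert u_fut.shape == (B, K, H, W)
```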
The TNO architecture builds upon the DeepONet framework by introducing a dedicated temporal branch (t-branch) alongside the traditional branch and trunk networks.
The architecture is structured as follows:
- Input Lifting: Input functions or parameters $v$ are processed by a linear encoder $P_b$ in the branch, and the history of solution states $\mathbf{U}_{\text{hist}}(t)$ is processed by a separate linear encoder $P_{tb}$ in the temporal branch. Both map their inputs into a latent space of dimension $p$.
- Spatial Feature Processing: The latent representations from the branch ($h_b(v)$) and t-branch ($h_{tb}(\mathbf{U}_{\text{hist}}(t))$) are fed into separate U-Net architectures. To handle varying spatial resolutions, adaptive average pooling is applied before the U-Net and bilinear upsampling after, restoring the original spatial resolution (see the wrapper sketch after the U-Net details below). This allows the model to generalize to different grid sizes without retraining.
- Coordinate Encoding (Trunk): The trunk network takes the spatio-temporal coordinates $(t,\mathbf{x})$ as input and processes them through a Feed-Forward Neural Network (FFN) to produce coordinate-dependent features in the same latent space $\mathbb{R}^p$.
- Combination and Projection: The outputs of the spatially processed branch features ($\tilde{U}_b$), t-branch features ($\tilde{U}_{tb}$), and trunk features ($t_i$) are combined using an element-wise (Hadamard) product along the latent dimension $p$. This combined representation is then passed through a shared MLP decoder $G$ applied pointwise over the spatial domain to produce the final output $\widehat{\mathbf{U}}_{\text{fut}}(t)$, the predicted sequence of $K$ future solution states over the spatial grid (a minimal wiring sketch follows the equation). The overall operation is:
$$\widehat{\mathbf{U}}_{\text{fut}}(t)(\mathbf{x}) = G\left( \tilde{U}_b(\mathbf{x}, t) \odot \tilde{U}_{tb}(\mathbf{x}, t) \odot t_i(\mathbf{x}, t) \right)$$
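The following PyTorch sketch shows this wiring under stated assumptions: the U-Nets are reduced to a single down/up level, layer sizes and the decoder are illustrative, and the paper's exact configuration (Appendix A) may differ:

```python
import torch
import torch.nn as nn

class TinyUNet(nn.Module):
    """One-level stand-in for the paper's U-Nets: strided conv down,
    transposed conv up, one skip connection; channel count p is preserved."""
    def __init__(self, p):
        super().__init__()
        self.down = nn.Sequential(nn.Conv2d(p, 2 * p, 3, 2, 1), nn.LeakyReLU())
        self.up = nn.Sequential(nn.ConvTranspose2d(2 * p, p, 4, 2, 1), nn.LeakyReLU())
        self.mix = nn.Conv2d(2 * p, p, 1)            # fuse the skip connection

    def forward(self, h):
        return self.mix(torch.cat([h, self.up(self.down(h))], dim=1))

class TNOSketch(nn.Module):
    """Branch, t-branch, and trunk combined by a Hadamard product over the
    latent dimension p, then decoded pointwise; shapes follow the text."""
    def __init__(self, c_v, L, p=32):
        super().__init__()
        self.P_b = nn.Conv2d(c_v, p, 1)              # linear lifting of v
        self.P_tb = nn.Conv2d(L, p, 1)               # linear lifting of U_hist
        self.unet_b, self.unet_tb = TinyUNet(p), TinyUNet(p)
        self.trunk = nn.Sequential(nn.Linear(3, p), nn.SiLU(), nn.Linear(p, p))
        self.G = nn.Sequential(nn.Linear(p, p), nn.SiLU(), nn.Linear(p, 1))

    def forward(self, v, u_hist, coords):
        # v: (B, c_v, H, W); u_hist: (B, L, H, W); coords: (K, H, W, 3) = (t, x, y)
        U_b = self.unet_b(self.P_b(v))               # (B, p, H, W)
        U_tb = self.unet_tb(self.P_tb(u_hist))       # (B, p, H, W)
        t_i = self.trunk(coords).permute(0, 3, 1, 2) # (K, p, H, W)
        z = U_b.unsqueeze(1) * U_tb.unsqueeze(1) * t_i.unsqueeze(0)  # Hadamard
        return self.G(z.permute(0, 1, 3, 4, 2)).squeeze(-1)          # (B, K, H, W)

model = TNOSketch(c_v=2, L=1)
v, u_hist = torch.randn(8, 2, 64, 64), torch.randn(8, 1, 64, 64)
t = torch.linspace(0, 1, 4).view(4, 1, 1, 1).expand(4, 64, 64, 1)
xy = torch.stack(torch.meshgrid(torch.linspace(0, 1, 64),
                                torch.linspace(0, 1, 64), indexing="ij"), dim=-1)
coords = torch.cat([t, xy.expand(4, 64, 64, 2)], dim=-1)             # (K, H, W, 3)
print(model(v, u_hist, coords).shape)                # torch.Size([8, 4, 64, 64])
```

Because the trunk consumes coordinates pointwise, the output can in principle be queried at arbitrary spatio-temporal locations, which is what makes the design an operator rather than a fixed-grid network.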
The U-Net architecture details, including convolutional and transposed convolutional layers, kernel sizes, strides, activations (Leaky ReLU or SiLU), and batch normalization, are provided in Appendix A of the paper. Skip connections are used in the U-Nets to aid feature propagation.
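The resolution-handling trick described above (adaptive pooling in, bilinear upsampling out) can be sketched as a thin wrapper around any U-Net; the fixed internal grid size here is an illustrative assumption:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ResolutionInvariantUNet(nn.Module):
    """Pool any incoming grid to a fixed internal resolution, apply the
    U-Net there, then bilinearly upsample back to the original grid."""
    def __init__(self, unet: nn.Module, internal_size=(64, 64)):
        super().__init__()
        self.unet, self.internal_size = unet, internal_size

    def forward(self, h):                              # h: (B, p, H, W)
        H, W = h.shape[-2:]
        z = F.adaptive_avg_pool2d(h, self.internal_size)
        z = self.unet(z)
        return F.interpolate(z, size=(H, W), mode="bilinear", align_corners=False)

# The same weights then apply unchanged to coarser and finer grids,
# e.g. a 0.25° training grid and a 0.1° test grid:
wrap = ResolutionInvariantUNet(nn.Conv2d(8, 8, 3, padding=1))  # stand-in U-Net
print(wrap(torch.randn(1, 8, 120, 232)).shape)   # torch.Size([1, 8, 120, 232])
print(wrap(torch.randn(1, 8, 300, 580)).shape)   # torch.Size([1, 8, 300, 580])
```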
For practical implementation, the paper highlights training strategies such as temporal bundling ($K>1$ predictions per forward pass) to improve efficiency and stability, and teacher forcing during training, which conditions future predictions on ground-truth data to mitigate error accumulation. During inference or "rollout," the model can use its own predictions autoregressively. Both strategies are sketched below.
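A minimal sketch of both, assuming the `TNOSketch` interface above: `rollout` feeds the last $L$ predictions back in, while the teacher-forced step always conditions on ground truth.

```python
import torch
import torch.nn.functional as F

def rollout(model, v, u_hist, coords, n_steps, L):
    """Autoregressive inference: each forward pass emits a bundle of K
    states; the last L of them seed the next call. (In a real rollout the
    time coordinates in `coords` would also advance each step.)"""
    preds = []
    for _ in range(n_steps):
        u_fut = model(v, u_hist, coords)   # (B, K, H, W)
        preds.append(u_fut)
        u_hist = u_fut[:, -L:]             # condition on own predictions
    return torch.cat(preds, dim=1)         # (B, K * n_steps, H, W)

def teacher_forced_loss(model, v, u_hist_true, u_fut_true, coords):
    """Training step with teacher forcing: the history is always ground
    truth, so rollout errors do not leak into the gradients."""
    return F.mse_loss(model(v, u_hist_true, coords), u_fut_true)
```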
The paper demonstrates the TNO's capabilities across three real-world time-dependent physical phenomena:
- European Air Temperature Forecast (Weather Forecasting):
- Problem: Forecasting daily air temperature over Europe using observational data from the E-OBS dataset [cornes2018ensemble], which includes noise, gaps, and different resolutions (0.25° and 0.1°).
- Implementation: Trained with $L=1$, $K=4$ (predicting 4 days ahead based on the current day) using daily temperature and pressure as inputs. Training used a combination of teacher-forcing and fine-tuning phases. Masking handled missing data (a masked-loss sketch follows this example).
- Application/Results: TNO achieved low MAE and RMSE on a 2023 test set, showing minimal error accumulation during rollouts (predicting 8 days ahead in two steps). Crucially, it demonstrated zero-shot super-resolution by maintaining high accuracy on the 0.1° test grid despite being trained only on the 0.25° grid. This showcases the resolution invariance enabled by the U-Net and adaptive pooling design. Ablation studies confirmed the critical contribution of the t-branch and U-Net blocks to performance and resolution invariance. Compared to DeepONet and Fourier-DeepONet variants, TNO showed superior accuracy and robustness on both coarse and fine grids (Figure 4). Computational performance is efficient, with training time around 3.63 seconds per epoch and 12.2 GB GPU memory usage (Table 1).
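Missing observations can be excluded from both the error and the gradients with a masked loss; a minimal sketch (the paper's exact masking scheme may differ):

```python
import torch

def masked_mae(pred, target, mask):
    """MAE over valid cells only (mask == 1), so gaps in the observational
    grid contribute neither error nor gradient."""
    err = (pred - target).abs() * mask
    return err.sum() / mask.sum().clamp_min(1)

# e.g. mask = (~torch.isnan(target)).float(); NaNs must then be zeroed:
# loss = masked_mae(pred, torch.nan_to_num(target), mask)
```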
- Global Air Temperature (Climate Modeling):
- Problem: Modeling 3D global air temperature evolution across atmospheric pressure levels efficiently.
- Implementation: Used the NCEP/NCAR Reanalysis 1 dataset [kalnay2018ncep]. Addressed the 3D problem by treating it as a series of 2D spatial slices conditioned on the pressure level (sketched after this example): the temporal branch received the temperature history, the branch received the pressure level (as an input variable), and the trunk received the spatio-temporal coordinates. Trained with $L=K=365$ (predicting one year ahead) across a subset of pressure levels (12 out of 16). Levels were batched during training to improve efficiency and generalization.
- Application/Results: Demonstrated long-term temporal extrapolation by forecasting global temperatures over a 5-year horizon (2019-2023) not seen in training (2010-2015). Achieved a mean relative L2 error of 0.016. Also showed generalization across vertical pressure levels, making accurate predictions on levels held out during training. The approach is computationally efficient, modeling 3D dynamics using 2D operations: per Table 1, training took about 39 seconds per epoch (the paper does not explicitly report this model's GPU memory usage, though it is likely comparable to or below the 7.5 GB of the GCS pressure model).
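The slicing strategy can be sketched as reshaping the 3D field into a batch of 2D problems, with the pressure level broadcast to a constant conditioning field for the branch (shapes illustrative, not from the paper):

```python
import torch

# A 3D temperature field over P pressure levels becomes B*P independent 2D
# slices; the level value is expanded to a constant field for the branch.
B, P, L, H, W = 2, 12, 365, 73, 144
temp_hist = torch.randn(B, P, L, H, W)        # per-level temperature history
levels_hpa = torch.linspace(1000.0, 10.0, P)  # pressure levels in hPa

u_hist = temp_hist.reshape(B * P, L, H, W)    # 2D slices batched together
v = levels_hpa.repeat(B).view(B * P, 1, 1, 1).expand(-1, 1, H, W)
# model(v, u_hist, coords) then predicts a K=365-step bundle per slice, and
# held-out levels can be queried at inference simply by changing `v`.
```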
- Geologic Carbon Sequestration (GCS):
- Problem: Predicting CO2 plume migration (saturation) and pressure buildup in subsurface formations, involving coupled multiphase flow under heterogeneous geological conditions.
- Implementation: Used a benchmark GCS dataset [Wen2022U-FNOAnFlow]. Trained two separate TNOs (one for saturation, one for pressure). The branch processed input reservoir properties (permeability fields, porosity, scalar parameters), the t-branch processed the initial condition (saturation or pressure field), and the trunk processed the temporal grid. The spatial coordinates were included as features in the branch input. Trained on the first 1.8 years of simulation data (16 time steps) with $L=1$, $K=3$. Tested on time snapshots up to 30 years (beyond the training horizon) on unseen geological realizations.
- Application/Results: Showcased simultaneous generalization to new geological parameters and long-term temporal extrapolation (up to 30 years from a 1.8-year training horizon). Achieved low MAE for both saturation (within the plume area) and pressure buildup, with only slight degradation towards the end of the extrapolation period (Figure 5). This highlights TNO's ability to learn complex, coupled multiphysics dynamics and generalize under varying conditions and time scales. Memory usage was 4.3 GB for the saturation model and 7.5 GB for the pressure buildup model, with epoch times of 211s and 447s respectively (Table 1).
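The plume-restricted error can be sketched with a mask derived from the reference saturation; the threshold defining the plume is an assumption, not taken from the paper:

```python
import torch

def plume_mae(pred_sat, true_sat, threshold=0.01):
    """MAE restricted to the CO2 plume, defined here as cells where the
    reference saturation exceeds a small threshold (assumed value)."""
    plume = (true_sat > threshold).float()
    err = (pred_sat - true_sat).abs() * plume
    return err.sum() / plume.sum().clamp_min(1)
```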
In summary, the TNO introduces a powerful and efficient neural operator architecture for time-dependent PDEs by incorporating a temporal branch, leveraging resolution-invariant U-Net spatial processing, and employing effective training strategies like temporal bundling and teacher forcing. Its demonstrated performance across diverse problems in weather, climate, and subsurface flow indicates its potential as a practical tool for rapid, accurate, and generalizable simulations in scientific and engineering applications, particularly where long-term forecasting and handling varied data resolutions are critical. The architecture is designed to be lightweight and trainable on readily available GPU hardware, making it accessible for practitioners. Limitations include possible error accumulation over extremely long, unassisted rollouts and the need for careful problem-specific configuration of hyperparameters ($L$, $K$, network dimensions, etc.).