Temporal Neural Operator for Modeling Time-Dependent Physical Phenomena (2504.20249v1)

Published 28 Apr 2025 in cs.LG

Abstract: Neural Operators (NOs) are machine learning models designed to solve partial differential equations (PDEs) by learning to map between function spaces. Neural Operators such as the Deep Operator Network (DeepONet) and the Fourier Neural Operator (FNO) have demonstrated excellent generalization properties when mapping between spatial function spaces. However, they struggle in mapping the temporal dynamics of time-dependent PDEs, especially for time steps not explicitly seen during training. This limits their temporal accuracy as they do not leverage these dynamics in the training process. In addition, most NOs tend to be prohibitively costly to train, especially for higher-dimensional PDEs. In this paper, we propose the Temporal Neural Operator (TNO), an efficient neural operator specifically designed for spatio-temporal operator learning for time-dependent PDEs. TNO achieves this by introducing a temporal-branch to the DeepONet framework, leveraging the best architectural design choices from several other NOs, and a combination of training strategies including Markov assumption, teacher forcing, temporal bundling, and the flexibility to condition the output on the current state or past states. Through extensive benchmarking and an ablation study on a diverse set of example problems we demonstrate the TNO long range temporal extrapolation capabilities, robustness to error accumulation, resolution invariance, and flexibility to handle multiple input functions.

Summary

  • The paper introduces TNO, achieving accurate and resolution-invariant long-term predictions for time-dependent PDEs.
  • It employs a dual-branch architecture that combines a temporal branch with U-Net spatial processing to efficiently manage variable grid resolutions.
  • Experimental results in weather forecasting, climate modeling, and geologic carbon sequestration demonstrate superior accuracy and computational efficiency.

Neural Operators (NOs) like DeepONet and Fourier Neural Operator have shown promise in learning mappings between function spaces for solving Partial Differential Equations (PDEs), particularly for spatial variations. However, applying them effectively to time-dependent PDEs, especially for long-term predictions beyond the training data, has been challenging due to limitations in capturing temporal dynamics and error accumulation during autoregressive rollouts. The paper "Temporal Neural Operator for Modeling Time-Dependent Physical Phenomena" (2504.20249) introduces the Temporal Neural Operator (TNO) as a novel architecture specifically designed to address these limitations, enabling accurate and efficient spatio-temporal operator learning with strong temporal extrapolation capabilities and resolution invariance.

The core idea behind TNO is to learn the time evolution operator $\mathcal{G}_{\Delta t}$ that maps a solution state at time $t$, $u(t, \cdot)$, to a future state $u(t + \Delta t, \cdot)$. This is extended to map a history of $L$ past states $\mathbf{U}_{hist}(t) = \{u(t - (l-1)\Delta t, \cdot)\}_{l=1}^{L}$ to a bundle of $K$ future states $\mathbf{U}_{fut}(t) = \{u(t + k\Delta t, \cdot)\}_{k=1}^{K}$ using a learned operator $\mathcal{G}_{\Delta t}^{L \rightarrow K}$. The paper emphasizes that choosing $L=1$ (Markov assumption) or $L>1$ (memory) can be problem-specific, but using a temporal bundle of $K>1$ predictions in a single forward pass is generally beneficial for efficiency and stability.
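To make the shapes concrete, here is a minimal PyTorch sketch of how an $L \rightarrow K$ operator consumes a history of states on a spatial grid and emits a bundle of future states. The tensor names, dimensions, and the persistence-style stand-in for the learned operator are illustrative assumptions, not the paper's implementation:

```python
import torch

# Illustrative shapes (not from the paper): batch B, history length L,
# prediction bundle K, spatial grid H x W.
B, L, K, H, W = 8, 1, 4, 64, 64

u_hist = torch.randn(B, L, H, W)   # U_hist(t): L past solution states

def g_delta_t(u_hist: torch.Tensor, K: int) -> torch.Tensor:
    """Stand-in for the learned operator G_{Δt}^{L→K}.

    Here we simply repeat the most recent state K times; a trained TNO
    would instead predict K genuinely new future states in one pass.
    """
    last = u_hist[:, -1:]                # most recent state u(t, ·)
    return last.expand(-1, K, -1, -1)    # bundle of K "future" states

u_fut = g_delta_t(u_hist, K)             # U_fut(t): shape (B, K, H, W)
print(u_fut.shape)                       # torch.Size([8, 4, 64, 64])
```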

The TNO architecture builds upon the DeepONet framework by introducing a dedicated temporal branch (t-branch) alongside the traditional branch and trunk networks. The architecture is structured as follows, with a minimal code sketch after the list:

  1. Input Lifting: Input functions or parameters $v$ are processed by a linear encoder $P_b$ in the branch, and the history of solution states $\mathbf{U}_{hist}(t)$ is processed by a separate linear encoder $P_{tb}$ in the temporal branch. Both map their inputs into a latent space of dimension $p$.
  2. Spatial Feature Processing: The latent representations from the branch ($h_b(v)$) and t-branch ($h_{tb}(\mathbf{U}_{hist}(t))$) are fed into separate U-Net architectures. To handle varying spatial resolutions, adaptive average pooling is applied before the U-Net, and bilinear upsampling after, restoring the original spatial resolution. This allows the model to generalize to different grid sizes without retraining.
  3. Coordinate Encoding (Trunk): The trunk network takes the spatio-temporal coordinates $(t, \mathbf{x})$ as input and processes them through a Feed-Forward Neural Network (FFN) to produce coordinate-dependent features in the same latent space $\mathbb{R}^p$.
  4. Combination and Projection: The outputs of the spatially processed branch features ($\tilde{U}_b$), t-branch features ($\tilde{U}_{tb}$), and trunk features ($t_i$) are combined using an element-wise (Hadamard) product along the latent dimension $p$. This combined representation is then passed through a shared MLP decoder $G$ applied pointwise over the spatial domain to produce the final output $\widehat{\mathbf{U}}_{fut}(t)$, the predicted sequence of $K$ future solution states over the spatial grid. The overall operation is:

    $\widehat{\mathbf{U}}_{fut}(t)(\mathbf{x}) = G\left( \tilde{U}_b(\mathbf{x}, t) \odot \tilde{U}_{tb}(\mathbf{x}, t) \odot t_i(\mathbf{x}, t) \right)$

    The U-Net architecture details, including convolutional and transposed convolutional layers, kernel sizes, strides, activations (Leaky ReLU or SiLU), and batch normalization, are provided in Appendix A of the paper. Skip connections are used in the U-Nets to aid feature propagation.
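The following is a minimal, hedged PyTorch sketch of how these components can fit together. Layer sizes, the small convolutional stand-ins for the paper's U-Nets, the fixed internal pooling resolution, and all names are assumptions for illustration; the authors' actual configuration is the one described in Appendix A of the paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyTNO(nn.Module):
    """Illustrative sketch of the TNO-style branch / t-branch / trunk combination."""

    def __init__(self, n_params: int, L: int, K: int, p: int = 32,
                 internal_res: int = 32):
        super().__init__()
        self.internal_res = internal_res
        # Lifting encoders P_b and P_tb (linear maps into the latent space R^p).
        self.P_b = nn.Conv2d(n_params, p, kernel_size=1)
        self.P_tb = nn.Conv2d(L, p, kernel_size=1)
        # Small conv nets standing in for the two U-Nets (spatial processing).
        self.unet_b = nn.Sequential(nn.Conv2d(p, p, 3, padding=1), nn.SiLU(),
                                    nn.Conv2d(p, p, 3, padding=1))
        self.unet_tb = nn.Sequential(nn.Conv2d(p, p, 3, padding=1), nn.SiLU(),
                                     nn.Conv2d(p, p, 3, padding=1))
        # Trunk: FFN on spatio-temporal coordinates (t, x, y) -> R^p.
        self.trunk = nn.Sequential(nn.Linear(3, 64), nn.SiLU(), nn.Linear(64, p))
        # Shared pointwise MLP decoder G producing K future states per point.
        self.G = nn.Sequential(nn.Linear(p, 64), nn.SiLU(), nn.Linear(64, K))

    def _spatial(self, z: torch.Tensor, net: nn.Module, out_hw) -> torch.Tensor:
        # Adaptive pooling to a fixed internal grid, conv net, then bilinear
        # upsampling back to the input resolution -> resolution invariance.
        z = F.adaptive_avg_pool2d(z, self.internal_res)
        z = net(z)
        return F.interpolate(z, size=out_hw, mode="bilinear", align_corners=False)

    def forward(self, v, u_hist, coords):
        # v:      (B, n_params, H, W)  input functions/parameters
        # u_hist: (B, L, H, W)         history of solution states
        # coords: (B, H, W, 3)         (t, x, y) per grid point
        H, W = u_hist.shape[-2:]
        U_b = self._spatial(self.P_b(v), self.unet_b, (H, W))          # (B, p, H, W)
        U_tb = self._spatial(self.P_tb(u_hist), self.unet_tb, (H, W))  # (B, p, H, W)
        t_i = self.trunk(coords).permute(0, 3, 1, 2)                   # (B, p, H, W)
        fused = (U_b * U_tb * t_i).permute(0, 2, 3, 1)                 # Hadamard product
        return self.G(fused).permute(0, 3, 1, 2)                       # (B, K, H, W)

model = TinyTNO(n_params=2, L=1, K=4)
out = model(torch.randn(8, 2, 64, 64), torch.randn(8, 1, 64, 64),
            torch.rand(8, 64, 64, 3))
print(out.shape)  # torch.Size([8, 4, 64, 64])
```

The adaptive-pooling / bilinear-upsampling pair is what lets the same weights operate on grids of different resolution, which is the mechanism behind the zero-shot super-resolution results discussed below.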

For practical implementation, the paper highlights training strategies such as temporal bundling ($K>1$ predictions per forward pass) to improve efficiency and stability, and teacher forcing during training to mitigate the accumulation of errors by using ground truth data to condition future predictions. During inference or "rollout," the model can use its own predictions autoregressively.
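The distinction between teacher forcing during training and autoregressive rollout at inference can be summarized in a short sketch. The `step` function below is a placeholder for one forward pass of a trained model, and the exact conditioning scheme is an assumption for illustration:

```python
from typing import Optional

import torch

def step(u_hist: torch.Tensor, K: int) -> torch.Tensor:
    """Placeholder for one forward pass predicting a bundle of K future
    states (here: persistence plus small noise)."""
    last = u_hist[:, -1:].expand(-1, K, -1, -1)
    return last + 0.01 * torch.randn_like(last)

def rollout(u0: torch.Tensor, n_steps: int, K: int, L: int = 1,
            ground_truth: Optional[torch.Tensor] = None) -> torch.Tensor:
    """Autoregressive rollout with temporal bundling.

    If `ground_truth` is provided (teacher forcing, a training-time strategy),
    each step is conditioned on true states; otherwise the model's own
    predictions are fed back in, as done at inference time.
    """
    history, bundles = u0, []
    for i in range(n_steps):
        bundle = step(history[:, -L:], K)       # K predictions per forward pass
        bundles.append(bundle)
        if ground_truth is not None:            # teacher forcing
            history = ground_truth[:, : u0.shape[1] + (i + 1) * K]
        else:                                   # feed predictions back in
            history = torch.cat([history, bundle], dim=1)
    return torch.cat(bundles, dim=1)

u0 = torch.randn(2, 1, 32, 32)
print(rollout(u0, n_steps=2, K=4).shape)  # torch.Size([2, 8, 32, 32])
```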

The paper demonstrates the TNO's capabilities across three real-world time-dependent physical phenomena:

  1. European Air Temperature Forecast (Weather Forecasting):
    • Problem: Forecasting daily air temperature over Europe using observational data from the E-OBS dataset [cornes2018ensemble], which includes noise, gaps, and different resolutions (0.25° and 0.1°).
    • Implementation: Trained with $L=1$, $K=4$ (predicting 4 days ahead based on the current day) using daily temperature and pressure as inputs. Training used a combination of teacher forcing and fine-tuning phases. Masking handled missing data.
    • Application/Results: TNO achieved low MAE and RMSE on a 2023 test set, showing minimal error accumulation during rollouts (predicting 8 days ahead in two steps). Crucially, it demonstrated zero-shot super-resolution by maintaining high accuracy on the 0.1° test grid despite being trained only on the 0.25° grid. This showcases the resolution invariance enabled by the U-Net and adaptive pooling design. Ablation studies confirmed the critical contribution of the t-branch and U-Net blocks to performance and resolution invariance. Compared to DeepONet and Fourier-DeepONet variants, TNO showed superior accuracy and robustness on both coarse and fine grids (Figure 4). Computational performance is efficient, with training time around 3.63 seconds per epoch and 12.2 GB GPU memory usage (Table 1).
  2. Global Air Temperature (Climate Modeling):
    • Problem: Modeling 3D global air temperature evolution across atmospheric pressure levels efficiently.
    • Implementation: Used the NCEP/NCAR Reanalysis 1 dataset [kalnay2018ncep]. Addressed the 3D problem by treating it as a series of 2D spatial slices conditioned on the pressure level. The temporal branch received the temperature history, the branch received the pressure level (as an input variable), and the trunk received the spatio-temporal coordinates. Trained with $L=K=365$ (predicting one year ahead) across a subset of pressure levels (12 out of 16). Levels were batched during training to improve efficiency and generalization.
    • Application/Results: Demonstrated long-term temporal extrapolation by forecasting global temperatures over a 5-year horizon (2019-2023) not seen in training (2010-2015), achieving a mean relative $L_2$ error of 0.016 (a sketch of this kind of metric appears after this list). Also showed generalization across vertical pressure levels, making accurate predictions on levels held out during training. The approach is computationally efficient, modeling 3D dynamics using 2D operations: training took about 39 seconds per epoch for this configuration, and although the paper does not explicitly report GPU memory for the global temperature model, it is likely comparable to or less than the 7.5 GB used by the GCS pressure buildup model (Table 1).
  3. Geologic Carbon Sequestration (GCS):
    • Problem: Predicting CO$_2$ plume migration (saturation) and pressure buildup in subsurface formations, involving coupled multiphase flow under heterogeneous geological conditions.
    • Implementation: Used a benchmark GCS dataset [Wen2022U-FNOAnFlow]. Trained two separate TNOs (one for saturation, one for pressure). The branch processed input reservoir properties (permeability fields, porosity, scalar parameters), the t-branch processed the initial condition (saturation or pressure field), and the trunk processed the temporal grid. The spatial coordinates were included as features in the branch input. Trained on the first 1.8 years of simulation data (16 time steps) with $L=1$, $K=3$. Tested on time snapshots up to 30 years (beyond the training horizon) on unseen geological realizations.
    • Application/Results: Showcased simultaneous generalization to new geological parameters and long-term temporal extrapolation (up to 30 years from a 1.8-year training horizon). Achieved low MAE for both saturation (within the plume area) and pressure buildup, with only slight degradation towards the end of the extrapolation period (Figure 5). This highlights TNO's ability to learn complex, coupled multiphysics dynamics and generalize under varying conditions and time scales. Memory usage was 4.3 GB for the saturation model and 7.5 GB for the pressure buildup model, with epoch times of 211s and 447s respectively (Table 1).
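As a reference for the error figures above, here is a hedged sketch of a masked mean relative $L_2$ metric. The exact normalization and masking conventions the paper uses are not reproduced here and should be treated as assumptions; masking is included because the weather experiment handles gaps in the observational data:

```python
import torch

def masked_relative_l2(pred: torch.Tensor, truth: torch.Tensor,
                       mask: torch.Tensor) -> torch.Tensor:
    """Mean relative L2 error over valid grid points.

    pred, truth: (B, T, H, W) predicted and reference fields.
    mask:        (H, W) boolean, True where observations exist (e.g. to skip
                 gaps in the E-OBS data). This is one common choice of metric,
                 not necessarily the paper's exact definition.
    """
    diff = (pred - truth) * mask
    ref = truth * mask
    num = torch.linalg.vector_norm(diff.flatten(2), dim=-1)   # ||error|| per (B, T)
    den = torch.linalg.vector_norm(ref.flatten(2), dim=-1)    # ||truth|| per (B, T)
    return (num / den.clamp_min(1e-12)).mean()

pred = torch.randn(4, 8, 32, 32)
truth = pred + 0.05 * torch.randn_like(pred)
mask = torch.ones(32, 32, dtype=torch.bool)
print(masked_relative_l2(pred, truth, mask))
```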

In summary, the TNO introduces a powerful and efficient neural operator architecture for time-dependent PDEs by incorporating a temporal branch, leveraging U-Net spatial processing with resolution invariance, and employing effective training strategies such as temporal bundling and teacher forcing. Its demonstrated performance across diverse problems in weather, climate, and subsurface flow indicates its potential as a practical tool for rapid, accurate, and generalizable simulations in scientific and engineering applications, particularly where long-term forecasting and handling varied data resolutions are critical. The architecture is designed to be lightweight and trainable on readily available GPU hardware, making it accessible to practitioners. Limitations include potential error accumulation over extremely long, unassisted rollouts and the need for careful problem-specific configuration of hyperparameters ($L$, $K$, network dimensions, etc.).