
Implicit Adaptive Fourier Neural Operator (IAFNO)

Updated 22 December 2025
  • The paper introduces IAFNO, which employs implicit iterations with adaptive spectral filtering to achieve stable, long-term predictions in turbulent 3D flows with minimal error growth.
  • IAFNO significantly reduces computational cost by cutting parameter count, memory usage, and wall-time compared to explicit methods, with speedups up to 4×.
  • IAFNO integrates patchwise self-attention and a residual iterative scheme to robustly capture spatiotemporal dynamics in high-dimensional PDE settings.

The Implicit Adaptive Fourier Neural Operator (IAFNO) is an advanced deep operator architecture for learning spatiotemporal mappings, especially for turbulent three-dimensional (3D) flows. Rooted in the adaptive Fourier neural operator (AFNO) framework, IAFNO employs repeated implicit iterations of a learned, adaptive spectral kernel, stabilized and regularized by a fixed-point structure and spectral sparsity constraints. When deployed as either a forecasting surrogate or the backbone of a diffusion-based denoiser network, IAFNO achieves high-fidelity long-term predictions, computational and memory efficiency, and robustness to error accumulation in high-dimensional 3D PDE settings (Jiang et al., 14 Dec 2025, Jiang et al., 22 Jan 2025).

1. Foundations and Architectural Principles

IAFNO extends the adaptive Fourier neural operator concept by combining patchwise self-attention ("token mixing") in Fourier space, adaptive spectral filtering, and an implicit residual iteration scheme. The operator maps an input field v(x,0), x ∈ D ⊂ ℝ³, through a sequence of L implicit time steps, each involving N explicit adaptive spectral layers. The core update reads

v(x,(l+1)\Delta t) = v(x, l\Delta t) + \Delta t \cdot \mathcal{K}_N \circ \cdots \circ \mathcal{K}_1[v(x, l\Delta t)],

where each \mathcal{K}_n acts as an MLP-enhanced adaptive Fourier layer:

v_{n+1}(x,t) = \mathrm{MLP}\left[ v_n(x,t) + \mathcal{F}^{-1}\big(R_{\mathrm{IAFNO}} \cdot \mathcal{F}(v_n(\cdot, t))\big)(x) \right].

The frequency-space adaptive kernel is dynamically parameterized as

R_{\mathrm{IAFNO}} \cdot \mathcal{F}(v_n) = S_\lambda \big[ W_2\, \sigma(W_1 \mathcal{F}(v_n) + b_1) + b_2 \big],

where W_1, W_2, b_1, b_2 are complex-valued learnable parameters, σ is a nonlinearity (e.g., ReLU), and S_λ is a soft-thresholding operator enforcing spectral sparsity (Jiang et al., 14 Dec 2025, Jiang et al., 22 Jan 2025).
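
For concreteness, here is a minimal sketch of one such layer in PyTorch. The framework choice, channel counts, threshold level, and the decision to share W_1, W_2 across Fourier modes are illustrative assumptions, not details from the papers:

```python
import torch
import torch.nn as nn

def crelu(z: torch.Tensor) -> torch.Tensor:
    # One common choice of sigma for complex values: ReLU applied
    # separately to real and imaginary parts.
    return torch.complex(torch.relu(z.real), torch.relu(z.imag))

class AdaptiveSpectralLayer(nn.Module):
    """One K_n block: v_{n+1} = MLP[v_n + F^{-1}(R_IAFNO . F(v_n))]."""

    def __init__(self, channels: int, hidden: int, lam: float = 1e-2):
        super().__init__()
        scale = 0.02
        # Complex-valued W1, b1, W2, b2 acting on the channel dimension.
        self.W1 = nn.Parameter(scale * torch.randn(channels, hidden, dtype=torch.cfloat))
        self.b1 = nn.Parameter(torch.zeros(hidden, dtype=torch.cfloat))
        self.W2 = nn.Parameter(scale * torch.randn(hidden, channels, dtype=torch.cfloat))
        self.b2 = nn.Parameter(torch.zeros(channels, dtype=torch.cfloat))
        self.lam = lam  # soft-threshold level lambda enforcing spectral sparsity
        self.mlp = nn.Sequential(
            nn.Linear(channels, channels), nn.GELU(), nn.Linear(channels, channels)
        )

    def soft_threshold(self, z: torch.Tensor) -> torch.Tensor:
        # S_lambda for complex z: shrink the magnitude, preserve the phase.
        mag = torch.abs(z)
        return z * torch.relu(mag - self.lam) / (mag + 1e-12)

    def forward(self, v: torch.Tensor) -> torch.Tensor:
        # v: (batch, x, y, z, channels), real-valued field.
        vh = torch.fft.fftn(v.to(torch.cfloat), dim=(1, 2, 3))  # F(v_n)
        h = crelu(vh @ self.W1 + self.b1)                       # sigma(W1 F(v_n) + b1)
        rh = self.soft_threshold(h @ self.W2 + self.b2)         # S_lam[W2 h + b2]
        spectral = torch.fft.ifftn(rh, dim=(1, 2, 3)).real      # back to physical space
        return self.mlp(v + spectral)                           # MLP[v_n + ...]
```

The magnitude-shrink/phase-preserve form of S_λ is one standard extension of soft-thresholding to complex coefficients.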

2. Implicit Iteration and Adaptive Spectral Filtering

The "implicit" aspect of IAFNO refers to the fixed-point iteration scheme, analogous to integrating an ODE in depth: dvdτ=K(v),withv(l+1)=v(l)+ΔtK(v(l)).\frac{dv}{d\tau} = \mathcal{K}(v), \quad \text{with} \quad v^{(l+1)} = v^{(l)} + \Delta t \mathcal{K}(v^{(l)}). This structure permits stable, arbitrarily deep operator networks without gradient vanishing/explosion, as each residual update only applies a bounded per-step correction. The "adaptive" spectral filtration is realized by the learned kernel RIAFNOR_{\mathrm{IAFNO}}: the model learns, for each frequency, which Fourier modes to suppress, amplify, or preserve, dynamically focusing capacity on the energetic or coherent scales present in the data (Jiang et al., 22 Jan 2025).

The use of soft-thresholding promotes sparsity, mitigating overfitting and biasing the operator towards dominant physical scales. Empirically, the residual implicit iteration in IAFNO yields error growth rates of ≈0.1% per physical turnover time, substantially outperforming explicit architectures such as IUFNO, which exhibit increased error and instability at extended prediction horizons (Jiang et al., 22 Jan 2025).

3. Patchwise Mixing and Self-Attention

To tame the quadratic attention cost in the number of grid points (N = xyz) that burdens 3D transformer or attention-based networks, IAFNO adopts a patch-embedding approach. The 3D input is partitioned into non-overlapping p × p × p patches, each flattened into a token. Self-attention is then computed in this reduced token space, lowering complexity from O(N²) to O((N/p³)²) and enabling tractable long-horizon learning on large domains. Attentional mixing uses standard query/key/value operations with attention weights

A = \mathrm{softmax}\left(\frac{Q K^\top}{\sqrt{d}}\right), \quad \mathrm{Att}(X) = A V,

where Q, K, V are learned projections of the tokens (Jiang et al., 22 Jan 2025).
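
A minimal sketch of the patch-embedding and token-mixing step, assuming a Conv3d-based embedding (a common choice; this summary does not specify IAFNO's exact embedding) and illustrative sizes:

```python
import torch
import torch.nn as nn

B, C, X = 1, 3, 64           # batch, channels, cubic grid edge length
p = 8                        # patch edge length
x = torch.randn(B, C, X, X, X)

# Conv3d with kernel_size = stride = p maps each p^3 patch to one token.
embed = nn.Conv3d(C, 128, kernel_size=p, stride=p)
tokens = embed(x).flatten(2).transpose(1, 2)   # (B, N/p^3, 128)
print(tokens.shape)                            # (1, 512, 128): (64/8)^3 = 512 tokens

# Standard scaled dot-product self-attention over the reduced token set:
# cost is O((N/p^3)^2) rather than O(N^2) over raw grid points.
attn = nn.MultiheadAttention(embed_dim=128, num_heads=4, batch_first=True)
mixed, _ = attn(tokens, tokens, tokens)
```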

4. Integration in Diffusion–Autoregressive Modeling (DiAFNO)

IAFNO has been deployed as the denoiser core within the DiAFNO diffusion–autoregressive model, enabling conditional generation and long-term autoregressive prediction for 3D turbulent flows. In this context, IAFNO parameterizes the conditional diffusion denoiser F_θ, trained under the noise-prediction variant of diffusion model training:

\mathcal{L} = \mathbb{E}_{\sigma, x, \epsilon} \left\| F_\theta\big( c_{\mathrm{in}}(\sigma)\, x ;\, c_{\mathrm{noise}}(\sigma) \big) - \big[ x_0 - c_{\mathrm{skip}}(\sigma)\, x \big] / c_{\mathrm{out}}(\sigma) \right\|^2.

Notably, the DiAFNO framework leverages the residual structure of IAFNO for stability during denoising, enabling consistent reconstructions in both the physical and frequency domains over extended prediction sequences. During training and inference, previous states are concatenated as conditioning channels, and the EDM sampler with Heun's method is used for stochastic sampling (Jiang et al., 14 Dec 2025).
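
A hedged sketch of this preconditioned loss, following the standard EDM parameterization of c_skip, c_out, c_in, and c_noise; the σ_data value and the call signature of F_θ are assumptions for illustration:

```python
import torch

def edm_denoiser_loss(F_theta, x0, sigma, sigma_data=0.5):
    # x0: clean field; sigma: noise-level tensor broadcastable to x0's shape.
    # Standard EDM preconditioning coefficients (Karras et al. style).
    c_skip  = sigma_data**2 / (sigma**2 + sigma_data**2)
    c_out   = sigma * sigma_data / torch.sqrt(sigma**2 + sigma_data**2)
    c_in    = 1.0 / torch.sqrt(sigma**2 + sigma_data**2)
    c_noise = torch.log(sigma) / 4.0

    x = x0 + sigma * torch.randn_like(x0)      # noised state
    target = (x0 - c_skip * x) / c_out         # regression target from the loss above
    pred = F_theta(c_in * x, c_noise)          # raw network (IAFNO core) output
    return torch.mean((pred - target) ** 2)
```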

5. Computational Characteristics and Efficiency

IAFNO achieves high computational and memory efficiency through parameter sharing and the implicit iteration scheme. Unlike explicit layer stacking, which multiplies parameters and memory usage by the number of layers, IAFNO reuses the same adaptive spectral block across all L steps. Representative metrics for forced homogeneous isotropic turbulence (HIT, L = 20):

  • IUFNO: 83 million parameters; 22.6 GB GPU memory; 4072 s per training epoch; 7.50 s for 10-step inference
  • IAFNO: 1.2 million parameters; 8.5 GB GPU memory; 1009 s per training epoch; 2.37 s for 10-step inference

Thus, IAFNO reduces the parameter count to roughly 1/70 of IUFNO's, memory to ≈0.38×, and wall-time to 0.25–0.32×, with comparable spectral accuracy and long-term stability (Jiang et al., 22 Jan 2025). Training speedups of ≈4× relative to IUFNO have been recorded on modern GPUs. This efficiency enables practical deployment as a surrogate for classical large-eddy simulation (LES) approaches.
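
A quick check that these ratios follow from the numbers quoted above:

```python
# Reduction factors computed directly from the table.
print(1.2 / 83)       # ≈ 0.014 → parameters cut to roughly 1/70
print(8.5 / 22.6)     # ≈ 0.38  → memory ratio
print(1009 / 4072)    # ≈ 0.25  → training wall-time ratio (≈4× speedup)
print(2.37 / 7.50)    # ≈ 0.32  → inference wall-time ratio
```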

6. Empirical Performance Across Turbulence Benchmarks

In forced homogeneous isotropic turbulence (HIT), decaying HIT, and turbulent channel flows, IAFNO shows superior stability and fidelity relative to explicit operator methods and traditional LES:

  • Spectral accuracy: IAFNO recovers energy spectra E(k) up to k ≈ 10 with <2% error; explicit diffusion models and the dynamic Smagorinsky model (DSM) systematically underpredict key scales.
  • RMS and turbulent statistics: Time-averaged velocities, vorticities, and Reynolds stresses remain within 1–5% of direct numerical simulation results over long rollouts (t/τ ≤ 50 for HIT).
  • Robustness: IAFNO demonstrates no catastrophic drift or "blow-up" for prediction horizons 2× the training window, in contrast to explicit analogues (Jiang et al., 14 Dec 2025, Jiang et al., 22 Jan 2025).

In all cases, significant speed advantages over LES/DSM are reported: DiAFNO (with an IAFNO core) runs an order of magnitude faster in channel flow and near parity for HIT, with improved physical fidelity at both large and small scales.

7. Implications, Limitations, and Extensions

The implicit and adaptive spectral architecture of IAFNO confers stability, long-term accuracy, and computational tractability, making it suitable for high-dimensional PDE surrogate modeling and dense-grid turbulence forecasting (Jiang et al., 14 Dec 2025, Jiang et al., 22 Jan 2025). Potential extensions include:

  • Adapting the spectral kernel structure to unstructured or irregular domains using deformable Fourier kernels.
  • Integration of physical constraints, such as divergence-free conditions, directly into the adaptive kernel for stricter physics enforcement in incompressible flows (a standard spectral projection is sketched after this list).
  • Joint operator learning for multi-field and reacting flows by expanding the adaptive block to multi-field couplings.
  • Physics-informed diffusion losses that penalize Navier–Stokes residuals during denoising.
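
As one illustration of the divergence-free item above, a textbook Leray projection in Fourier space removes the divergent component of a velocity field on a periodic grid; this is a standard construction, not the papers' proposed method:

```python
import torch

def project_divergence_free(v: torch.Tensor) -> torch.Tensor:
    # v: (3, X, Y, Z) velocity field on a periodic grid.
    vh = torch.fft.fftn(v, dim=(1, 2, 3))
    # Integer wavenumber grids for each axis.
    k = [torch.fft.fftfreq(n) * n for n in v.shape[1:]]
    kx, ky, kz = torch.meshgrid(*k, indexing="ij")
    kvec = torch.stack([kx, ky, kz])            # (3, X, Y, Z) wavenumber vectors
    k2 = (kvec ** 2).sum(0)
    k2[0, 0, 0] = 1.0                           # avoid divide-by-zero at k = 0
    div = (kvec * vh).sum(0)                    # k · v_hat (the i factors cancel)
    vh = vh - kvec * div / k2                   # subtract gradient component
    return torch.fft.ifftn(vh, dim=(1, 2, 3)).real
```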

Primary limitations are the data requirements (owing to the diffusion framework) and reliance on periodic or block-periodic domains for FFT efficiency. Future work targets geometry generalization, constraint-integrated layers, and reduced-data training via physics-consistent preconditioning.


Key References

  • Y. Jiang et al., "Integrating Fourier Neural Operator with Diffusion Model for Autoregressive Predictions of Three-dimensional Turbulence" (Jiang et al., 14 Dec 2025)
  • Y. Jiang et al., "An Implicit Adaptive Fourier Neural Operator for Long-term Predictions of Three-dimensional Turbulence" (Jiang et al., 22 Jan 2025)
