Branched Fourier Neural Operators
- BFNO is a neural operator that uses branched Fourier-domain convolutions to approximate mappings between function spaces with improved modeling of nonlocal dependencies.
- Its architecture integrates spectral convolution with pointwise linear operations via dynamic multi-kernel filtering, offering flexibility for time-dependent and multibody simulation tasks.
- Empirical results demonstrate significant gains in convergence efficiency and predictive accuracy in tasks ranging from image classification to complex dynamical system simulations.
Branched Fourier Neural Operators (BFNO) are operator learning architectures designed to approximate mappings between function spaces via learned Fourier-domain convolutions, with applications ranging from neural ordinary differential equations (NODEs) to fast simulation of multibody dynamical systems. BFNO generalizes the Fourier Neural Operator by introducing architectural branching and dynamic multi-kernel filtering, yielding greater flexibility and improved inductive bias for complex, nonlocal dependencies—especially in settings where traditional neural network modules prove inadequate.
1. Foundations and Motivation
The right-hand side of a neural ordinary differential equation (NODE),
$$\frac{d\mathbf{z}(t)}{dt} = f(\mathbf{z}(t), t; \theta),$$
naturally represents a differential operator acting on the state trajectory $\mathbf{z}(t)$, suggesting that methodologies from operator learning can enhance expressive power and global modeling capacity. Traditional approaches parameterize $f$ as a small multilayer perceptron (MLP) or convolutional network (CNN), but such architectures are typically limited to local, rigid interactions.
Neural operators, including the Fourier Neural Operator (FNO), are designed to learn mappings between infinite-dimensional function spaces, such as those arising in the solution operators of partial differential equations (PDEs). FNO realizes such mappings via global convolution kernels represented in the spectral (Fourier) domain. By interpreting $f$ in NODEs as an operator between function spaces, one can leverage the generalized, nonlocal inductive bias of neural operators. BFNO is engineered specifically to serve as a replacement for $f$ within NODEs and other time-dependent operator learning contexts (Cho et al., 2023, Wang et al., 2024).
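The core mechanism behind Fourier-domain operator layers is the convolution theorem: a global (circular) convolution over the whole domain reduces to a pointwise multiplication of Fourier coefficients. A minimal numpy sketch of this equivalence (illustrative only, not the authors' implementation):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 64
x = rng.standard_normal(n)  # discretized input function
k = rng.standard_normal(n)  # global convolution kernel

# Direct circular convolution: O(n^2) pairwise products
direct = np.array(
    [sum(x[j] * k[(i - j) % n] for j in range(n)) for i in range(n)]
)

# Spectral route: pointwise multiply in the Fourier domain, O(n log n)
spectral = np.real(np.fft.ifft(np.fft.fft(x) * np.fft.fft(k)))

assert np.allclose(direct, spectral)
```

An FNO-style layer learns the kernel's Fourier coefficients directly, so the expensive direct convolution is never formed.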
2. Core Architectural Elements
BFNO layers couple a spectral (Fourier-domain) convolution branch with a pointwise linear branch, aggregating the outputs with a learned mixing module and nonlinearity. This branching structure is parameterized as follows.
At the $\ell$-th layer, define an intermediate feature $v_\ell$. The BFNO update is:
$$v_{\ell+1} = \sigma\big(\mathcal{F}^{-1}\big(G(\mathcal{F}(v_\ell))\big) + W v_\ell\big),$$
where:
- $\mathcal{F}$, $\mathcal{F}^{-1}$: (discrete) Fourier/inverse Fourier transform,
- $G$: dynamic global convolution via mixing multiple parallel Fourier filters,
- $W$: learnable pointwise linear map,
- $\sigma$: activation function (e.g., ReLU or ELU).
The global convolution operates as follows:
$$G(\hat{v}) = \psi\big(R_1 \hat{v}, \ldots, R_M \hat{v}\big),$$
where $R_1, \ldots, R_M$ are learned Fourier-domain filters, $\psi$ is a fully connected mixing network, and $M$ is the number of branches. In some applications, explicit separate branches target physically distinct subsystems (e.g., car body/bogie in railway vehicle modeling (Wang et al., 2024)), with each branch employing different spectral truncation or filtering strategies.
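The branched layer structure can be sketched in a few lines of numpy. This is a simplified stand-in for the published architecture: the mixing network is reduced to a learned weight per branch, features are one-dimensional single-channel signals, and all parameter values are placeholders.

```python
import numpy as np

def bfno_layer(v, filters, mix_w, W, k_max):
    """One BFNO-style layer (sketch under assumed notation).

    v       : (n,) real-valued feature over the domain
    filters : (M, k_max) complex Fourier-domain filters, one per branch
    mix_w   : (M,) mixing weights (stand-in for the fully connected mixer)
    W       : scalar pointwise linear map
    k_max   : number of retained low-frequency modes
    """
    n = v.shape[0]
    v_hat = np.fft.rfft(v)
    # Apply each branch filter on the lowest k_max modes, then mix branches.
    mixed = np.zeros_like(v_hat)
    for m in range(filters.shape[0]):
        branch = np.zeros_like(v_hat)
        branch[:k_max] = filters[m] * v_hat[:k_max]
        mixed += mix_w[m] * branch
    spectral = np.fft.irfft(mixed, n=n)
    # ReLU over (spectral branch + pointwise linear branch)
    return np.maximum(spectral + W * v, 0.0)

rng = np.random.default_rng(0)
n, M, k_max = 64, 3, 8
v = rng.standard_normal(n)
filters = rng.standard_normal((M, k_max)) + 1j * rng.standard_normal((M, k_max))
mix_w = rng.standard_normal(M)
out = bfno_layer(v, filters, mix_w, 0.5, k_max)
```

With all filters set to zero and $W = 1$, the layer degenerates to a pointwise ReLU, which is a convenient sanity check when implementing the spectral branch.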
The output of stacked BFNO layers, possibly following an encoder, is mapped by a decoder to produce the final approximation to the desired operator's output.
3. Operator Learning Formulation
BFNO is structurally tailored for both time-dependent and multibody dynamical systems. In multibody simulation, inputs $a \in \mathcal{A}$ (physical parameters and excitations) are mapped to outputs $u \in \mathcal{U}$ (time-dependent responses) via a nonlinear operator $\mathcal{G}: \mathcal{A} \to \mathcal{U}$, where $\mathcal{U}$ is a function space over time. The Fourier integral layers in BFNO perform global spectral convolutions:
$$(\mathcal{K} v)(t) = \mathcal{F}^{-1}\big(R \cdot \mathcal{F}(v)\big)(t),$$
with $R$ a learned spectral filter, truncation to the $k_{\max}$ lowest frequencies, and $t \in [0, T]$ the time domain (Wang et al., 2024).
Distinct branches (with respective truncations, e.g., $k_{\max}^{(1)}$, $k_{\max}^{(2)}$) allow subsystems characterized by different spectral content to be modeled without over- or underfitting. This design is realized by applying Fourier layers per branch, followed by individual multi-layer perceptrons (MLPs) and final aggregation.
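The effect of branch-specific truncation can be seen on a synthetic two-component signal. The cutoff values below are illustrative, not the paper's settings; the point is that a small $k_{\max}$ isolates the slow subsystem while a larger one retains both components.

```python
import numpy as np

def truncated_spectral_pass(v, k_max):
    """Keep only the k_max lowest-frequency modes of a real signal."""
    v_hat = np.fft.rfft(v)
    v_hat[k_max:] = 0.0
    return np.fft.irfft(v_hat, n=v.shape[0])

t = np.linspace(0, 1, 256, endpoint=False)
low = np.sin(2 * np.pi * 3 * t)    # slow component ("car body"-like)
high = np.sin(2 * np.pi * 40 * t)  # fast component ("bogie"-like)
signal = low + high

branch1 = truncated_spectral_pass(signal, k_max=8)   # isolates the slow part
branch2 = truncated_spectral_pass(signal, k_max=64)  # retains both components
```

Since the slow component lives entirely below the first cutoff, `branch1` recovers it exactly, while `branch2`'s wider passband reproduces the full signal.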
4. Training, Implementation, and Computational Aspects
BFNO-based models are compatible with standard ODE solvers and backpropagation via the adjoint sensitivity method, whose memory cost is $\mathcal{O}(1)$ in the number of solver steps. The standard Dormand–Prince (DOPRI) adaptive Runge–Kutta solver is used for integration, and BFNO parameters are trained by minimizing application-appropriate losses (cross-entropy for classification, continuous normalizing flow (CNF) log-likelihood for generation, relative error for regression).
Each BFNO layer incurs the cost of two FFTs per feature channel ($\mathcal{O}(N \log N)$ each), pointwise multiplications, and fully connected mixing. Practical network configurations typically use up to 3 layers and up to 3 branches, maintaining parameter and runtime efficiency comparable to small CNN architectures (Cho et al., 2023). Empirical results indicate BFNO-based models often require fewer ODE function evaluations (NFEs) to reach target accuracy, thus achieving improved convergence and computational efficiency.
In the railway application, BFNO exhibited orders-of-magnitude speedup versus traditional simulation (e.g., 3.7 s vs. 1200 s for MATLAB MBS) and demonstrated grid-invariance with respect to time discretization, allowing predictions at novel time-steps without retraining (Wang et al., 2024).
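Grid-invariance follows from the fact that a spectral filter is defined per frequency mode rather than per sample, so a filter fit on one time grid can be evaluated on a finer one. A sketch (filter values are random stand-ins for learned parameters; the normalization keeps mode amplitudes comparable across grids):

```python
import numpy as np

rng = np.random.default_rng(1)
k_max = 16
# "Learned" complex spectral filter over the k_max lowest modes
R = rng.standard_normal(k_max) + 1j * rng.standard_normal(k_max)

def apply_filter(v, R):
    n = v.shape[0]
    v_hat = np.fft.rfft(v) / n      # normalize so modes match across grids
    out_hat = np.zeros_like(v_hat)
    out_hat[:len(R)] = R * v_hat[:len(R)]
    return np.fft.irfft(out_hat * n, n=n)

u = lambda t: np.sin(2 * np.pi * 5 * t)
t_coarse = np.linspace(0, 1, 128, endpoint=False)
t_fine = np.linspace(0, 1, 512, endpoint=False)

y_coarse = apply_filter(u(t_coarse), R)
y_fine = apply_filter(u(t_fine), R)  # same operator, new discretization
```

Subsampling the fine-grid output at the coarse-grid times reproduces the coarse-grid output, which is the discretization-invariance property exploited for predictions at novel time-steps.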
5. Experimental Outcomes and Benchmarks
General Machine Learning Tasks
In image classification (MNIST, CIFAR-10, CIFAR-100, STL-10), BFNO-NODE achieves accuracy gains over NODE baselines (AdamNODE, ANODE, HBNODE), e.g., improving CIFAR-100 accuracy from 0.24 to 0.29 and STL-10 from 0.37 to 0.45. Selected results:
| Dataset | Best Baseline | BFNO-NODE |
|---|---|---|
| CIFAR-10 | 0.6264±0.0015 | 0.6289±0.0054 |
| CIFAR-100 | 0.2405±0.0049 | 0.2890±0.0094 |
| STL-10 | — | 0.4455±0.0029 |
For time-series benchmarks such as HumanActivity and PhysioNet, BFNO models achieve comparable or improved accuracy and AUROC (0.874 vs. 0.859 on HumanActivity and 0.852 vs. 0.853 on PhysioNet), with reduced variance and faster convergence. In generative modeling, replacing NODE right-hand sides in continuous normalizing flows with BFNO improves bits/dim (lower is better), e.g., MNIST from 0.97 to 0.88 and CIFAR-10 from 3.38 to 3.33.
Ablation studies confirm the necessity of BFNO’s branching and multi-kernel design: performance peaks at an intermediate number of branches and drops when fewer or more are used; substituting vanilla FNO or AFNO substantially degrades general learning performance (Cho et al., 2023).
Dynamical System Simulation
In railway vehicle-track coupled systems, BFNO realized a 64% reduction in relative loss (rLSE) on lateral car-body acceleration versus CNN-GRU baselines, and the predicted frequency spectra matched ground truth up to the primary frequency range (0–50 Hz). Pearson correlation coefficients were high (≈0.98 for car body, ≈0.94 for bogie), and inference is near-instantaneous once trained (Wang et al., 2024).
6. Theoretical Properties and Generalization
BFNO inherits from neural operators an “infinite-dimensional” inductive bias, modeling mappings not merely between finite vectors but between functions. Compared to vanilla FNO, BFNO avoids rigid low-pass filtering through its branched, dynamic, multi-kernel construction and learned mixing. It can be interpreted as a data-driven approximation to the Green’s function of the underlying differential operator (Cho et al., 2023).
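The Green’s-function interpretation can be stated concretely. For a linear operator whose Green’s function is translation-invariant, the solution operator is a global convolution, which is exactly what a spectral layer parameterizes (standard operator-learning background rather than a result specific to the cited papers):

```latex
% Solution of \mathcal{L}u = f via the Green's function G:
u(x) = \int G(x, y)\, f(y)\, dy
% If G is translation-invariant, G(x, y) = G(x - y), this is a convolution
% and diagonalizes in the Fourier domain:
u = G * f = \mathcal{F}^{-1}\big( \widehat{G} \cdot \mathcal{F}(f) \big)
% A Fourier layer learns \widehat{G} directly as the spectral filter R;
% BFNO's branches learn several such filters and mix them.
```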
The ability to configure branch-specific spectral truncations allows accurate modeling of subsystems with distinct dynamic characteristics, enabling application to broad multiphysics and multibody domains with component-wise spectral diversity (e.g., fluid–structure interaction, composite material dynamics) (Wang et al., 2024).
7. Limitations and Prospective Directions
Direct insertion of off-the-shelf FNO or AFNO modules into NODEs has been found to hurt performance; tailored operator layer designs such as BFNO are necessary to avoid deleterious spectral or nonlocal biases in general ML or non-PDE tasks. While FFT operations impart overhead, reductions in NFE balance runtime in practice.
Potential extensions include:
- Applying BFNO within controlled or stochastic differential equation solvers and higher-order ODE architectures,
- Utilizing BFNO as a surrogate model in physics-informed learning or PDE solvers,
- Adapting the multi-kernel spectral convolution paradigm to Vision Transformers or Graph Neural Operators, and
- Exploiting the grid-invariance property for cross-resolution dynamical characterization (Cho et al., 2023, Wang et al., 2024).
BFNO thus contributes a versatile and theoretically principled approach for operator learning in both continuous-depth machine learning systems and physical simulation domains.