FD-Bench: Modular Benchmarking for Fluid Simulation
- FD-Bench is a modular benchmarking framework for data-driven fluid simulation that decomposes neural PDE solvers into spatial, temporal, and loss modules for clear, fair evaluation.
- The framework provides a unified, open-source codebase with 10 flow datasets and 85 baseline models, enabling direct comparisons between machine learning and classical CFD solvers.
- FD-Bench supports detailed performance metrics and ablation studies, offering actionable insights into the trade-offs among self-attention, Fourier, and graph-based architectures.
FD-Bench refers to three distinct, state-of-the-art benchmarking frameworks in different computational research domains: (1) a modular and fair benchmark for data-driven fluid simulation (Wang et al., 25 May 2025), (2) an automated benchmarking framework for digital forensic tool validation ("AutoDFBench 1.0") (Wickramasekara et al., 18 Dec 2025), and (3) a full-duplex benchmarking pipeline for spoken dialogue systems (Peng et al., 25 Jul 2025). Each framework provides a domain-specific methodology for reproducible, granular, and extensible evaluation, addressing core limitations in its respective subfield. The subsequent sections focus on the comprehensive and rigorous FD-Bench for data-driven fluid simulation (Wang et al., 25 May 2025), with brief notes on the other variants at the end.
1. Motivation and Design Principles
FD-Bench was introduced to unify and standardize the assessment of neural PDE solvers for fluid dynamics, motivated by three persistent limitations: fragmented PDE datasets, entangled architecture innovations (spatial, temporal, loss), and the absence of systematic evaluation protocols—especially in comparison to classical CFD solvers. Its design is characterized as fair, modular, comprehensive, and reproducible. The framework addresses these gaps with four primary contributions:
- Modular decomposition of solver architectures into spatial, temporal, and loss axes, enabling direct ablation and apples-to-apples evaluation.
- A framework for direct comparison to traditional numerical solvers at matched error regimes.
- Fine-grained generalization analysis over spatial resolution, initial/boundary conditions, and rollout horizons.
- An open-source, extensible codebase encompassing 10 flow datasets and 85 baseline re-implementations under a unified API (Wang et al., 25 May 2025).
2. Modular Architecture: Spatial, Temporal, and Loss Decomposition
A salient innovation is the modularization of neural PDE solvers into three orthogonal components:
Spatial Module ($\mathcal{S}_\theta$): encodes the flow field $u_t$ at a fixed time $t$ into a latent representation $z_t = \mathcal{S}_\theta(u_t)$. FD-Bench enumerates:
- Fourier/spectral mixing: $z = \mathcal{F}^{-1}\big(R_\theta \cdot \mathcal{F}(u)\big)$ with a learned spectral filter $R_\theta$ (see the sketch after this list)
- Self-attention: $z = \operatorname{softmax}\big(QK^{\top}/\sqrt{d_k}\big)\,V$
- Spatial convolutions, graph convolutions, reduced-order models (POD), and implicit neural representations
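As a concrete illustration of the Fourier/spectral mixing option, the following is a minimal PyTorch sketch of $z = \mathcal{F}^{-1}(R_\theta \cdot \mathcal{F}(u))$ with low-mode truncation; the class name and truncation details are illustrative assumptions, not FD-Bench's actual implementation.

```python
import torch
import torch.nn as nn

class SpectralMixing2d(nn.Module):
    """Applies z = F^{-1}(R . F(u)) with a learned low-frequency filter R."""
    def __init__(self, channels: int, modes: int):
        super().__init__()
        self.modes = modes  # number of retained Fourier modes per axis
        scale = 1.0 / (channels * channels)
        # complex weights mixing channels on the retained low modes
        self.weight = nn.Parameter(
            scale * torch.randn(channels, channels, modes, modes, dtype=torch.cfloat)
        )

    def forward(self, u: torch.Tensor) -> torch.Tensor:
        # u: (batch, channels, H, W) real-valued flow field
        u_hat = torch.fft.rfft2(u)                      # F(u)
        z_hat = torch.zeros_like(u_hat)
        m = self.modes
        # multiply the lowest m x m modes by the learned filter R
        z_hat[:, :, :m, :m] = torch.einsum(
            "bixy,ioxy->boxy", u_hat[:, :, :m, :m], self.weight
        )
        return torch.fft.irfft2(z_hat, s=u.shape[-2:])  # F^{-1}(R . F(u))
```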
Temporal Module ($\mathcal{T}_\phi$): aggregates latent states across time into a dynamics or trajectory representation:
- Autoregression (AR), next-step rollout, temporal bundling (joint prediction of $k$ future steps), temporal self-attention, and neural ODEs (the sketch after this list contrasts AR rollout with bundling)
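To make the temporal options concrete, the following sketch contrasts plain autoregressive rollout with temporal bundling; the `model` call signatures and tensor shapes are assumptions for illustration, not the benchmark's actual interface.

```python
import torch

def rollout_autoregressive(model, u0, n_steps):
    """One network call per step: u_{t+1} = model(u_t)."""
    states, u = [], u0
    for _ in range(n_steps):
        u = model(u)                 # (batch, channels, H, W)
        states.append(u)
    return torch.stack(states, dim=1)

def rollout_bundled(model, u0, n_steps, k):
    """One network call per k steps: (u_{t+1}, ..., u_{t+k}) = model(u_t)."""
    states, u = [], u0
    for _ in range(n_steps // k):
        bundle = model(u)            # (batch, k, channels, H, W)
        states.append(bundle)
        u = bundle[:, -1]            # condition the next call on the last predicted step
    return torch.cat(states, dim=1)
```

Bundling reduces the number of network calls per rollout by a factor of $k$, which is one reason it fares well at fixed compute in the leaderboard findings below.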
Loss Module: Defines the training and evaluation objective, including:
- Physical MSE: $\mathcal{L}_{\mathrm{MSE}} = \frac{1}{N}\sum_{i=1}^{N}\|\hat{u}_i - u_i\|_2^2$
- Diffusion denoising, flow matching losses, and PINN residuals
By systematically crossing the 5 spatial, 5 temporal, and 4 loss options, and instantiating 85 of the resulting combinations as baseline models, FD-Bench ensures methodological consistency and allows module-level ablation (Wang et al., 25 May 2025). A schematic enumeration of this grid follows.
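The grid can be pictured as below; the option names mirror the text above, but the registry and the validity filter are assumptions (the paper reports 85 instantiated baselines, fewer than the 100 raw combinations).

```python
from itertools import product

# Hypothetical option registries following the module taxonomy above.
SPATIAL = ["fourier", "attention", "conv", "graph", "inr"]
TEMPORAL = ["autoregressive", "next_step", "bundling", "temporal_attention", "neural_ode"]
LOSS = ["mse", "diffusion", "flow_matching", "pinn_residual"]

def enumerate_baselines(is_valid=lambda s, t, l: True):
    """Yield every (spatial, temporal, loss) triple that passes a validity filter."""
    for s, t, l in product(SPATIAL, TEMPORAL, LOSS):
        if is_valid(s, t, l):
            yield s, t, l

print(sum(1 for _ in enumerate_baselines()))  # 100 raw combinations; 85 are instantiated
```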
3. Scope: Datasets and Problem Coverage
FD-Bench encompasses 10 canonical 2D flow scenarios spanning a range of PDE structures and complexity:
| Scenario | Data Type | Spatial Resolutions |
|---|---|---|
| Incompressible Navier–Stokes (NS) | Grid (vorticity form) | – |
| Compressible NS | Grid | – |
| Stochastic NS (white noise) | Grid | – |
| Kolmogorov flow (forced) | Grid | – |
| Diffusion–Reaction (FitzHugh–Nagumo) | Grid | – |
| Taylor–Green vortex | SPH particle | Varies |
| Reverse Poiseuille flow | SPH particle | Varies |
| Advection (linear PDE) | Grid | – |
| Lid-driven cavity | SPH particle | Varies |
| Burgers’ equation | Grid | – |
Each dataset provides high-resolution, multi-condition simulations (varying Reynolds number, viscosity, or Mach number) with 100–1000 time steps per trajectory. Train/val/test splits are fixed for comparability (Wang et al., 25 May 2025).
4. Baseline Models, Comparison Methodology, and Classical Solver Integration
FD-Bench standardizes the re-implementation of 85 baselines across all major neural PDE solver families and modules:
- Spatial: Fourier-based (FNO, AFNO, Geo-FNO), graph-based (MeshGraphNets, GNS), convolutional (U-Net, CNN), self-attention (Transolver, HAMLET), implicit neural solvers (DINo), ROM, MLP-based architectures (DeepONet).
- Temporal: Autoregressive, next-step, temporal bundling, ODEs, attention across time.
- Loss: MSE, residual, diffusion/noise, flow matching losses.
Traditional CFD solvers are included via pseudo-spectral or finite-volume solvers (e.g., semi-implicit Heun for incompressible NS), calibrated to match the one-step error regime of neural baselines, enabling normalized, direct accuracy/performance comparisons.
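The matched-error calibration can be pictured as follows: coarsen the classical solver until its one-step error matches the neural baseline's, so that runtime comparisons happen at equal accuracy. Here `solve_one_step` is a hypothetical stand-in for a pseudo-spectral or finite-volume step, and the coarse-to-fine search is an assumed strategy, not the paper's exact procedure.

```python
import numpy as np

def rmse(pred: np.ndarray, true: np.ndarray) -> float:
    return float(np.sqrt(np.mean((pred - true) ** 2)))

def calibrate_resolution(solve_one_step, u0, u1_true, neural_rmse, resolutions):
    """Return the coarsest grid whose one-step RMSE is at or below the neural model's."""
    for res in sorted(resolutions):                 # search coarse -> fine
        pred = solve_one_step(u0, resolution=res)
        if rmse(pred, u1_true) <= neural_rmse:
            return res
    return max(resolutions)                         # fall back to the finest grid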
5. Evaluation Protocols and Metrics
Training is performed on 8×A6000 GPUs with the Adam optimizer, a cosine annealing schedule, and grid-searched hyperparameters. Evaluation splits are fixed per dataset. The following metrics are standardized:
- RMSE: $\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\|\hat{u}_i - u_i\|_2^2}$
- Normalized RMSE (nRMSE): RMSE divided by the $L_2$ norm of the true field
- fRMSE: RMSE restricted to low-, mid-, and high-frequency Fourier bands
- Efficiency: inference time, compute (GFLOPs), and memory footprint (GB)
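A minimal NumPy sketch of the error metrics, assuming 2D fields of shape (H, W); the fRMSE band edges are illustrative choices, not the benchmark's fixed values.

```python
import numpy as np

def rmse(pred, true):
    return float(np.sqrt(np.mean((pred - true) ** 2)))

def nrmse(pred, true):
    # RMSE normalized by the RMS magnitude (L2 norm) of the true field
    return rmse(pred, true) / float(np.sqrt(np.mean(true ** 2)))

def frmse(pred, true, bands=((0, 4), (4, 16), (16, None))):
    """RMSE of Fourier coefficients restricted to low/mid/high radial wavenumber bands."""
    err = np.fft.fft2(pred - true)
    kx = np.fft.fftfreq(pred.shape[0]) * pred.shape[0]   # integer wavenumbers
    ky = np.fft.fftfreq(pred.shape[1]) * pred.shape[1]
    radius = np.sqrt(kx[:, None] ** 2 + ky[None, :] ** 2)
    out = []
    for lo, hi in bands:
        mask = (radius >= lo) & (radius < (hi if hi is not None else np.inf))
        out.append(float(np.sqrt(np.mean(np.abs(err[mask]) ** 2))))
    return out  # [low, mid, high]
```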
Generalization analysis is performed along three axes: (a) initial-condition shifts (zero-shot turbulence/forcing), (b) resolution transfer (training at one grid resolution and evaluating at another), and (c) long-horizon rollouts. Quantitatively, self-attention + temporal bundling + MSE achieves the lowest RMSE for compressible NS (0.057) (Wang et al., 25 May 2025).
6. Empirical Results and Leaderboard Findings
FD-Bench ranks baseline and hybrid configurations for each scenario by RMSE, fRMSE, and runtime. Key results:
- Compressible NS: Self-attention + bundling + MSE is optimal
- Diffusion–Reaction: Fourier + next-step + MSE ranks highest
- Kolmogorov flow: Self-attention + bundling excels
Empirical findings:
- Self-attention modules provide the highest accuracy at higher computational cost
- Fourier modules yield strong accuracy vs. efficiency trade-offs
- Temporal bundling outperforms AR/ODE for fixed compute
- Neural ODEs, although suboptimal at fixed compute, support irregular sampling
- Classical solvers are consistently outperformed by Fourier-based neural operators in both speed and accuracy at matched error regimes, with 10–50× speedups
- Eulerian (grid/mesh) discretizations exhibit superior long-horizon performance compared to Lagrangian (particle-based) methods
The public leaderboard and modular codebase support reproducibility, extension, and robust evaluation standards (Wang et al., 25 May 2025).
7. Future Directions and Impact
FD-Bench sets a foundational reproducibility and comparability standard for data-driven fluid dynamics. Recommendations include:
- Hybridization of Fourier priors with self-attention modules
- Efficient temporal models (sparse/linear attention)
- Hierarchical Eulerian–Lagrangian couplings to mitigate roll-out error
- Broader integration of stochastic and flow-matching losses for uncertainty quantification
The framework enables principled selection, ablation, and improvement of machine learning-based PDE solvers. Its public codebase and extensible API facilitate experimental rigor and extension by the broader community.
8. Notes on Alternate Frameworks Named "FD-Bench"
- AutoDFBench 1.0 (Wickramasekara et al., 18 Dec 2025): A modular benchmarking suite for digital forensic tool validation, integrating five CFTT (Computer Forensics Tool Testing) areas. Uses deterministic, per-test-case F1-score-based evaluation, a RESTful API, and comprehensive ground truth for reproducible, cross-tool assessment.
- FD-Bench (Full-Duplex Dialogue) (Peng et al., 25 Jul 2025): A benchmarking pipeline for evaluating spoken dialogue systems in full-duplex scenarios. Features simulation pipelines including GPT-4o dialogue synthesis, TTS with noise control, and interruption/timing-aware metrics such as SRR, SIR, SRIR, EIR, and various latency/quality measures.
The three frameworks are independently developed and unrelated; citations should establish context to make clear which is meant.