FluidFormer: Neural Fluid Simulation Architecture
- FluidFormer is a neural architecture for particle-based fluid simulation that combines SPH-inspired continuous convolution with transformer-style self-attention to capture both local details and global interactions.
- It employs a dual mechanism via the Fluid Attention Block to fuse cascaded continuous convolutions with multi-head self-attention, improving stability and reducing error accumulation over time.
- FluidFormer demonstrates state-of-the-art performance on benchmarks like Liquid3D and Fueltank, showing significant reductions in Chamfer Distance and maximum density errors compared to previous methods.
FluidFormer is a neural architecture for particle-based fluid simulation that introduces a dual mechanism of continuous convolution and global self-attention, enabling accurate local feature extraction and stabilization of complex fluid phenomena via long-range dependency modeling. Developed specifically for the requirements of continuous fluid simulation, FluidFormer achieves state-of-the-art accuracy and temporal stability by unifying Smoothed Particle Hydrodynamics (SPH)–inspired local computation with transformer-style attention across all fluid particles (Wang et al., 3 Aug 2025).
1. Background and Motivation
Traditional particle-based fluid simulation frameworks, such as SPH, approximate field quantities through localized, kernel-weighted sums over neighbors. While recent learning-based approaches adopt similar paradigms by leveraging local graphs or continuous convolutions, they exhibit two primary limitations: (1) persistent error accumulation over time due to the purely local nature of their interactions (manifesting as density drift, particle clustering, or spurious splashing), and (2) an inability to model long-range effects—such as overall recirculation or global wave propagation. FluidFormer addresses these issues by introducing a local-global processing hierarchy within each neural block, allowing the system to both preserve the benefits of SPH’s physical continuity and model collective phenomena that require a global context (Wang et al., 3 Aug 2025).
2. Governing Equations and SPH Context
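FluidFormer operates in the context of particle-based approximations to the incompressible Navier–Stokes equations in their Lagrangian form. For particles indexed by $i$ with positions $\mathbf{x}_i$, velocities $\mathbf{v}_i$, and masses $m_i$, the motion is governed by:

$$\frac{d\mathbf{x}_i}{dt} = \mathbf{v}_i, \qquad m_i \frac{d\mathbf{v}_i}{dt} = \mathbf{F}_i^{\text{int}} + \mathbf{F}_i^{\text{ext}},$$

where $\mathbf{F}_i^{\text{int}}$ comprises pressure and viscous terms, while $\mathbf{F}_i^{\text{ext}}$ denotes external and boundary forces.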
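In SPH, key quantities are estimated by kernel summation over local neighborhoods:

$$A(\mathbf{x}_i) = \sum_{j} \frac{m_j}{\rho_j}\, A_j\, W(\lVert \mathbf{x}_i - \mathbf{x}_j \rVert, h),$$

with $W$ a compact-support kernel of radius $h$, $\mathbf{x}_i$ the position of particle $i$, and $m_j$, $\rho_j$, and $A_j$ the mass, density, and field value of neighbor $j$. FluidFormer generalizes these local operations to a learnable, differentiable context using continuous convolution modules (Wang et al., 3 Aug 2025).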
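The kernel summation can be illustrated with a density estimate, the special case in which the summed quantity is the density itself. This is a minimal sketch: the poly6 kernel and the function names are assumptions chosen for illustration, not details taken from the FluidFormer paper.

```python
import math

def poly6_kernel(r, h):
    """Poly6 smoothing kernel in 3D (an assumed choice); zero for r >= h."""
    if r >= h:
        return 0.0
    return 315.0 / (64.0 * math.pi * h**9) * (h * h - r * r) ** 3

def sph_density(positions, masses, h):
    """Estimate rho_i = sum_j m_j * W(|x_i - x_j|, h) for every particle."""
    densities = []
    for xi in positions:
        rho = 0.0
        for xj, mj in zip(positions, masses):
            # Particles farther than h contribute nothing (compact support).
            rho += mj * poly6_kernel(math.dist(xi, xj), h)
        densities.append(rho)
    return densities

# Two particles farther apart than h only "see" themselves.
example = sph_density([(0.0, 0.0, 0.0), (10.0, 0.0, 0.0)], [1.0, 1.0], h=1.0)
```

The double loop is quadratic in the particle count; practical solvers restrict the inner loop to neighbors found with a spatial grid or tree.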
3. Fluid Attention Block (FAB) and Feature Fusion
At the core of FluidFormer is the Fluid Attention Block (FAB), which bifurcates feature processing into two streams—a local branch and a global branch:
- Local branch: Employs cascaded continuous convolution (CConv) layers. Each layer aggregates the features of the particles inside a spherical neighborhood, weighting them with an MLP-based kernel and a learnable gating function.
- Global branch: Implements multi-head self-attention, applying 3D rotary positional encoding (3D-RoPE) to the query/key projections through a rotation matrix that encodes 3D spatial relationships.
- Fusion: Local and global outputs are adaptively merged by a soft-attention gate built from the sigmoid function and a learnable scalar.
This architectural choice allows joint modeling of small-scale and holistic phenomena at every layer (Wang et al., 3 Aug 2025).
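Two ingredients of the block, rotary encoding of positions and gated fusion, can be sketched in a few lines. Everything here is an assumption made for illustration: the layout of one rotated feature pair per spatial axis, the single frequency `freq`, and the convex-combination form of the gate are not confirmed by the source, which only states that a rotation matrix, a sigmoid, and a learnable scalar are involved.

```python
import math

def rotate_pair(a, b, angle):
    """Rotate the 2D vector (a, b) by the given angle."""
    c, s = math.cos(angle), math.sin(angle)
    return a * c - b * s, a * s + b * c

def rope_3d(vec, position, freq=1.0):
    """Assumed 3D-RoPE layout: vec has 6 entries, one 2D pair per axis,
    and each pair is rotated by freq * (coordinate along that axis)."""
    out = []
    for axis in range(3):
        out.extend(rotate_pair(vec[2 * axis], vec[2 * axis + 1],
                               freq * position[axis]))
    return out

def dot(u, v):
    return sum(x * y for x, y in zip(u, v))

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def gated_fusion(local_feat, global_feat, alpha):
    """Assumed gate: sigmoid(alpha) weights the local branch and
    1 - sigmoid(alpha) weights the global branch."""
    g = sigmoid(alpha)
    return [g * l + (1.0 - g) * gl for l, gl in zip(local_feat, global_feat)]

# Attention score between a rotated query and a rotated key.
q, k = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0], [0.5, -1.0, 2.0, 0.0, 1.0, 1.0]
score = dot(rope_3d(q, (0.1, 0.2, 0.3)), rope_3d(k, (1.0, -0.5, 0.7)))
```

Because both query and key are rotated by angles linear in position, the score depends only on the offset between the two particles, which is the property that makes rotary encodings attractive for translation-invariant physics.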
4. Dual-Pipeline Transformer Architecture
FluidFormer features a dual-pipeline structure for dynamics refinement:
- CConv path: Applies standard continuous convolutions to update particle features.
- ASCC path: Uses Antisymmetric Continuous Convolution (ASCC) to encode momentum conservation by leveraging a sign-inverted kernel.
Each refinement layer fuses the outputs of these two pipelines through a FAB and a residual connection. Type-aware embedding, achieved via iterative FABs, prepares the input by merging representations of fluid and boundary particles.
Position corrections are then predicted and rescaled by a scaling factor (Wang et al., 3 Aug 2025).
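The link between an antisymmetric kernel and momentum conservation can be checked numerically. The sketch below assumes a pairwise form in which a symmetric feature term `f_i + f_j` multiplies a kernel satisfying g(-r) = -g(r); the toy kernel and all names are hypothetical stand-ins for the learned components, not the paper's definitions.

```python
import math

def antisymmetric_kernel(r_vec, radius):
    """Toy compact-support kernel with g(-r) = -g(r) (hypothetical)."""
    dist = math.sqrt(sum(c * c for c in r_vec))
    if dist == 0.0 or dist >= radius:
        return [0.0, 0.0, 0.0]
    weight = (1.0 - dist / radius) ** 2
    return [weight * c for c in r_vec]

def ascc(positions, features, radius):
    """out_i = sum_j (f_i + f_j) * g(x_j - x_i) with scalar features."""
    outputs = []
    for i, xi in enumerate(positions):
        acc = [0.0, 0.0, 0.0]
        for j, xj in enumerate(positions):
            if i == j:
                continue
            offset = [b - a for a, b in zip(xi, xj)]
            g = antisymmetric_kernel(offset, radius)
            pair = features[i] + features[j]  # symmetric in i and j
            acc = [a + pair * gk for a, gk in zip(acc, g)]
        outputs.append(acc)
    return outputs

pts = [(0.0, 0.0, 0.0), (0.3, 0.1, 0.0), (0.2, -0.4, 0.5), (5.0, 5.0, 5.0)]
out = ascc(pts, [1.0, 2.0, -0.5, 3.0], radius=1.0)
# Each pair contributes equal and opposite terms, so the total cancels.
total = [sum(o[axis] for o in out) for axis in range(3)]
```

If the outputs are read as per-particle forces, the vanishing total is the discrete statement that interactions exchange momentum between particles without creating any.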
5. Training Objectives and Practical Implementation
FluidFormer is optimized to encourage temporal stability by predicting two subsequent time steps and minimizing a loss that combines the errors of both steps. Each per-step loss uses a neighbor-aware weighting that depends on the actual neighbor count of each particle and on the mean neighbor count (fixed at 40).
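A two-step, neighbor-weighted loss of this kind might look as follows. Only the mean neighbor count of 40 comes from the text above; the exponential weighting, the exponent `gamma=0.5`, and the plain sum over the two steps are assumptions borrowed from earlier continuous-convolution fluid models and may differ from what FluidFormer uses.

```python
import math

def neighbor_weight(neighbor_count, mean_count=40.0):
    """Assumed weighting: particles with few neighbors (e.g. near the free
    surface) get a larger weight than densely surrounded ones."""
    return math.exp(-neighbor_count / mean_count)

def step_loss(pred, target, neighbor_counts, gamma=0.5, mean_count=40.0):
    """Weighted sum of per-particle position errors raised to gamma."""
    total = 0.0
    for p, t, n in zip(pred, target, neighbor_counts):
        error = math.dist(p, t)
        total += neighbor_weight(n, mean_count) * error ** gamma
    return total

def two_step_loss(pred1, target1, counts1, pred2, target2, counts2):
    """Sum of the losses for the two consecutive predicted time steps."""
    return (step_loss(pred1, target1, counts1)
            + step_loss(pred2, target2, counts2))
```

Supervising the second step forces the network to consume its own first-step prediction during training, which is what ties this objective to reduced error accumulation at rollout time.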
Key implementation details include:
- Up to 20,000 particles,
- A fixed neighborhood radius for the continuous convolutions,
- 4 refinement layers,
- 4 attention heads per FAB,
- Embedding dimension 128,
- Adam optimizer with weight decay,
- Learning rate schedule halving at specified iterations (total 60k),
- Utilization of FlashAttention to reduce quadratic attention memory requirements (Wang et al., 3 Aug 2025).
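The listed hyperparameters can be gathered into a small configuration object, together with a step-wise halving schedule. The field names, the `base_lr` argument, and the milestone values in the usage line are hypothetical; the source specifies only the numeric settings and that the rate is halved at chosen iterations.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class FluidFormerConfig:
    """Values reported in the text; field names are illustrative."""
    max_particles: int = 20_000
    num_refinement_layers: int = 4
    num_attention_heads: int = 4
    embedding_dim: int = 128
    total_iterations: int = 60_000

def learning_rate(step, base_lr, milestones):
    """Halve base_lr once for every milestone the step has reached."""
    halvings = sum(1 for m in milestones if step >= m)
    return base_lr * 0.5 ** halvings

cfg = FluidFormerConfig()
head_dim = cfg.embedding_dim // cfg.num_attention_heads  # channels per head
# Hypothetical base rate and milestones, chosen only to exercise the function.
lr_late = learning_rate(45_000, base_lr=1e-3, milestones=(20_000, 40_000))
```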
6. Experimental Evaluation and Ablation Studies
FluidFormer is assessed on the Liquid3D (complex water-drop in a circular groove) and Fueltank (violent sloshing in tanks of increasing complexity) benchmarks, with metrics including Chamfer Distance (CD), Earth Mover’s Distance (EMD), $n$-frame sequence error ($n$-SE), maximum density error (MDE), and inference time.
| Benchmark / Metric | Prior Best (CD) | FluidFormer (CD) | Prior (MDE) | FluidFormer (MDE) |
|---|---|---|---|---|
| Liquid3D, 1-step | 0.520 mm | 0.418 mm | — | — |
| Liquid3D, 2-step | 1.454 mm | 1.152 mm | — | — |
| Fueltank I | 1.322 mm | 1.012 mm | 0.014 | 0.008 |
FluidFormer reduces one-step and two-step Chamfer Distance on Liquid3D by roughly 20% (0.520 to 0.418 mm and 1.454 to 1.152 mm) versus PioneerNet and other prior art. Qualitative results indicate that artifacts associated with local-only models, such as unphysical clustering and splashing, are eliminated.
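For readers unfamiliar with the headline metric, the following sketch computes one common variant of Chamfer Distance between a predicted and a ground-truth particle set. The exact variant used in the benchmarks (squared or unsquared distances, summed or averaged directions) is not stated in the source, so this form is an assumption.

```python
import math

def chamfer_distance(pred, target):
    """Sum of the two one-sided mean nearest-neighbor distances."""
    def one_sided(src, dst):
        # Mean distance from each source point to its closest destination.
        return sum(min(math.dist(p, q) for q in dst) for p in src) / len(src)
    return one_sided(pred, target) + one_sided(target, pred)

pred = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
truth = [(0.0, 0.0, 0.0), (1.0, 0.1, 0.0)]
cd = chamfer_distance(pred, truth)
```

Unlike a per-index position error, this metric does not require a one-to-one correspondence between predicted and reference particles, which suits long rollouts where individual trajectories diverge but the fluid shape remains comparable.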
Ablation studies validate the essential contributions of each architectural component:
- Removing the Global Feature Extractor increases $n$-SE from 24.442 mm to 41.024 mm.
- Removing the Local (CConv) branch causes catastrophic simulation failure ($n$-SE = 75.073 mm).
- Omitting Type-aware Embedding degrades boundary handling ($n$-SE = 84.462 mm).
- Removing ASCC increases $n$-SE to 93.524 mm.
- Disabling 3D-RoPE positional encoding weakens performance ($n$-SE = 29.131 mm).
These results underscore the necessity of both local and global information flows, physics-based constraints, and robust positional encoding (Wang et al., 3 Aug 2025).
7. Limitations and Outlook
The principal limitation of FluidFormer is the $O(N^2)$ computational complexity in the number of particles $N$ incurred by all-to-all particle attention. FlashAttention partially alleviates the memory cost, but the quadratic scaling remains a constraint. The architecture currently targets single-phase fluids. Potential extensions include sparse or hierarchical global attention, multi-phase fluid dynamics, and coupling with rigid or deformable solids. It is anticipated that the local-global paradigm established by FluidFormer will inform the design of future learned fluid solvers in both research and application domains (Wang et al., 3 Aug 2025).
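A back-of-the-envelope calculation shows why the quadratic term matters at the stated scale. The sketch assumes 32-bit floats and that a naive implementation stores one full score matrix per head; both are illustrative assumptions rather than details from the source.

```python
def attention_matrix_bytes(num_particles, num_heads=4, bytes_per_entry=4):
    """Memory needed to materialize every N x N attention score matrix."""
    return num_heads * num_particles * num_particles * bytes_per_entry

naive_bytes = attention_matrix_bytes(20_000)  # 6.4e9 bytes, about 6 GiB
naive_gib = naive_bytes / 2**30
```

Tiled kernels such as FlashAttention avoid storing these matrices, so memory grows roughly linearly with the particle count, but the number of query-key products still grows quadratically: doubling the particles quadruples the attention compute.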