FishGym Simulator: FSI & RL for Aquatic Robots

Updated 16 November 2025

FishGym Simulator is a high-performance simulation framework that integrates realistic fluid–structure interaction and reinforcement learning for aquatic biomechanics and underwater robotics.
It employs GPU-accelerated lattice–Boltzmann methods alongside immersed boundary models to accurately simulate the dynamics between articulated fish skeletons and viscous fluids.
The modular design supports diverse tasks such as cruising, path following, and schooling, enabling rapid policy optimization and reproducible experimental studies.

FishGym Simulator is a high-performance, physics-based simulation framework for the training and evaluation of fish-like robots and model organisms under two-way coupled fluid–structure interaction (FSI), with full reinforcement learning (RL) integration. Designed for the study and optimization of underwater robotic controllers, biomechanics, and aquatic animal behavior, FishGym enables reproducible experiments involving articulated skeletons, realistic fluid forces, and sophisticated control policies, using GPU-accelerated lattice–Boltzmann methods and immersed boundary models (Liu et al., 2022).

1. Numerical Fluid–Structure Interaction Modeling

FishGym’s core module simulates the dynamical interaction between an articulated fish skeleton and a viscous, incompressible fluid, governed by the full Navier–Stokes equations: $\rho\Bigl(\partial_t \mathbf{u} + (\mathbf{u}\!\cdot\!\nabla)\mathbf{u}\Bigr) = -\nabla p + \mu \nabla^2\mathbf{u} + \mathbf{f}_{\rm ext}, \qquad \nabla\!\cdot\!\mathbf{u} = 0$ where $\mathbf{u}$ is fluid velocity, $p$ the pressure, $\rho$ the density, $\mu$ the viscosity, and $\mathbf{f}_{\rm ext}$ external forces.

For the coupled articulated skeleton, the equations of motion in joint space are: $\mathbf{M}(\mathbf{q})\,\ddot{\mathbf{q}} + \mathbf{C}(\mathbf{q},\dot{\mathbf{q}}) = \boldsymbol{\tau}_{\rm int} + \boldsymbol{\tau}_{\rm fluid}$ where $\mathbf{q}$ are joint coordinates, $\mathbf{M}$ the mass matrix, $\mathbf{C}$ the Coriolis/centrifugal terms, $\boldsymbol{\tau}_{\rm int}$ internal torques (including actuation), and $\boldsymbol{\tau}_{\rm fluid}$ torques induced by hydrodynamic interaction.
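As a rough illustration of how the joint-space equation above can be advanced in time, the sketch below solves for $\ddot{\mathbf{q}}$ and applies one semi-implicit Euler step. The integrator choice, the function name `skeleton_step`, and the toy constant mass matrix with zero Coriolis term are assumptions made for this example; they are not taken from FishGym.

```python
import numpy as np

def skeleton_step(q, qd, mass_matrix, coriolis, tau_int, tau_fluid, dt):
    """One semi-implicit Euler step of M(q) qdd + C(q, qd) = tau_int + tau_fluid."""
    M = mass_matrix(q)                      # (n, n) joint-space mass matrix
    C = coriolis(q, qd)                     # (n,) Coriolis/centrifugal vector
    qdd = np.linalg.solve(M, tau_int + tau_fluid - C)
    qd_new = qd + dt * qdd                  # update velocity first
    q_new = q + dt * qd_new                 # then position, using the new velocity
    return q_new, qd_new

# Toy two-joint chain (hypothetical values): constant diagonal M, no Coriolis term.
M_fn = lambda q: np.diag([2.0, 1.0])
C_fn = lambda q, qd: np.zeros_like(qd)
q1, qd1 = skeleton_step(
    q=np.zeros(2), qd=np.zeros(2),
    mass_matrix=M_fn, coriolis=C_fn,
    tau_int=np.array([1.0, 0.0]),           # actuation torque on joint 0
    tau_fluid=np.array([0.0, -0.5]),        # hydrodynamic torque on joint 1
    dt=0.01,
)
```

In a coupled simulation, `tau_fluid` would come from the fluid solver at each step rather than being a fixed array.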

The fluid is discretized via a GPU-accelerated lattice–Boltzmann scheme (LBM, typically D3Q19), while the fish surface is represented by Lagrangian immersed boundary (IB) markers. Two-way coupling is established by iterative enforcement of no-slip conditions, interpolating velocities from the Eulerian fluid grid to IB nodes and spreading corrective forces back to the fluid domain.
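To make the collide–stream idea concrete, here is a minimal lattice–Boltzmann sketch. It is deliberately simplified: it uses the 2D D2Q9 lattice instead of D3Q19, a single-relaxation-time (BGK) collision, periodic boundaries, CPU NumPy instead of a GPU kernel, and no immersed boundary forcing. All names are illustrative and do not reflect FishGym's solver.

```python
import numpy as np

# D2Q9 lattice velocities and weights
C = np.array([[0, 0], [1, 0], [0, 1], [-1, 0], [0, -1],
              [1, 1], [-1, 1], [-1, -1], [1, -1]])
CF = C.astype(float)
W = np.array([4 / 9] + [1 / 9] * 4 + [1 / 36] * 4)

def equilibrium(rho, u):
    """Second-order equilibrium; rho: (nx, ny), u: (nx, ny, 2) -> (9, nx, ny)."""
    cu = np.einsum("id,xyd->ixy", CF, u)    # c_i . u for every direction
    uu = np.sum(u * u, axis=-1)
    return W[:, None, None] * rho * (1 + 3 * cu + 4.5 * cu**2 - 1.5 * uu)

def lbm_step(f, tau):
    """One BGK collision followed by periodic streaming."""
    rho = f.sum(axis=0)
    u = np.einsum("ixy,id->xyd", f, CF) / rho[..., None]
    f = f - (f - equilibrium(rho, u)) / tau  # relax toward equilibrium
    for i, (cx, cy) in enumerate(C):
        f[i] = np.roll(f[i], shift=(int(cx), int(cy)), axis=(0, 1))
    return f

# Small periodic box with a perturbed density field, initially at rest.
nx, ny, tau = 16, 16, 0.8
rng = np.random.default_rng(0)
rho0 = 1.0 + 0.01 * rng.standard_normal((nx, ny))
f = equilibrium(rho0, np.zeros((nx, ny, 2)))
mass0 = f.sum()
for _ in range(10):
    f = lbm_step(f, tau)
```

Both collision and periodic streaming conserve total mass, so `f.sum()` should stay equal to `mass0` up to floating-point error, which is a convenient sanity check for any such kernel.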

Boundary conditions are handled by maintaining a finite, moving computational domain (cube) that travels with the fish, using a non-equilibrium extrapolation at the cube faces. This design achieves unbounded-domain behavior with fixed compute resources (Liu et al., 2022). The core FSI cycle per time step is:

Advect the domain origin and accumulate virtual inertial forces
Collide–stream LBM update with IB- and frame-induced forces
Interpolate fluid velocity at IB markers
Compute and spread IB penalty forces
Integrate skeleton dynamics subject to $\boldsymbol{\tau}_{\rm fluid}$
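The interpolate-and-spread steps of this cycle can be sketched in a few lines. The example below is a one-dimensional, periodic toy version using a linear (hat) kernel and a simple penalty force proportional to the velocity mismatch; the kernel, the `stiffness` parameter, and the function names are assumptions for illustration, not the scheme documented for FishGym.

```python
import numpy as np

def hat_weights(x_marker, n_cells, dx=1.0):
    """Indices and weights of the two grid nodes bracketing a marker (periodic grid)."""
    s = x_marker / dx
    i0 = int(np.floor(s))
    frac = s - i0
    idx = np.array([i0 % n_cells, (i0 + 1) % n_cells])
    w = np.array([1.0 - frac, frac])        # weights sum to 1
    return idx, w

def ib_penalty_step(u_grid, x_markers, v_markers, stiffness):
    """Interpolate fluid velocity to markers, then spread penalty forces to the grid."""
    f_grid = np.zeros_like(u_grid)
    f_markers = np.zeros(len(x_markers))
    for k, (x, v) in enumerate(zip(x_markers, v_markers)):
        idx, w = hat_weights(x, len(u_grid))
        u_interp = np.dot(w, u_grid[idx])           # Eulerian -> Lagrangian
        f_markers[k] = stiffness * (v - u_interp)   # drives fluid toward no-slip
        np.add.at(f_grid, idx, w * f_markers[k])    # Lagrangian -> Eulerian
    return f_grid, f_markers

# Fluid at rest, two markers moving at unit speed (hypothetical values).
u_grid = np.zeros(8)
f_grid, f_markers = ib_penalty_step(
    u_grid, x_markers=[2.25, 5.5], v_markers=[1.0, 1.0], stiffness=2.0
)
```

Because the same weights are used for interpolation and spreading, the total force applied to the grid equals the total marker force. The equal and opposite reaction on the body is what would feed into $\boldsymbol{\tau}_{\rm fluid}$.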

An alternative minimal-physics mode, as employed in some MuJoCo-based environments, represents hydrodynamics with per-link lumped models (added mass, quadratic drag, optional lift/vortex shedding), trading first-principles realism for computational speed (Singh et al., 2025).
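A per-link lumped model of this kind might look like the sketch below, which combines the standard quadratic drag law with an added-mass term. The function name, the scalar coefficients, and the example values are hypothetical; lift and vortex-shedding terms are omitted.

```python
import numpy as np

def lumped_link_force(v_rel, a_rel, rho, cd, area, added_mass):
    """Quadratic drag plus added-mass force on one link.

    v_rel, a_rel: link velocity and acceleration relative to the fluid (3-vectors).
    """
    v_rel = np.asarray(v_rel, dtype=float)
    a_rel = np.asarray(a_rel, dtype=float)
    drag = -0.5 * rho * cd * area * np.linalg.norm(v_rel) * v_rel
    inertia = -added_mass * a_rel           # resists acceleration through the fluid
    return drag + inertia

# Link moving at 1 m/s along x in water, no acceleration (hypothetical values).
force = lumped_link_force(
    v_rel=[1.0, 0.0, 0.0], a_rel=[0.0, 0.0, 0.0],
    rho=1000.0, cd=1.0, area=0.01, added_mass=0.2,
)
```

Such a model costs a handful of floating-point operations per link per step, which is the source of the speed advantage, but it cannot reproduce wake interactions between bodies.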

2. Articulated Fish Modeling and Kinematics

FishGym supports a variety of articulated morphologies, such as Koi-type skeletons with chain or tree topologies. The fish model consists of:

A set of rigid links (bones), each with assigned mass and inertia (URDF-style description)
Rotational joints (degrees of freedom), with position, velocity, and actuation state
Surface skin mesh (linear blend skinning) for geometrical fidelity in fluid coupling

Controllers actuate the fish through torque signals at the joints ($\boldsymbol{\sigma}$), with optional additional control channels such as buoyancy adjustment ($\Delta v$). Passive joints may include PD spring-damper torques (Liu et al., 2022; Singh et al., 2025).

Kinematics for specific tasks (e.g., hand-coded tail-beat patterns) are expressed as periodic functions: $\theta_{\rm tail}(t) = \theta_0 \sin(\omega t + \phi)$ where $\theta_0$ is amplitude, $\omega$ beat frequency, and $\phi$ phase.
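The periodic pattern above translates directly into code. In this small sketch the function name and the example amplitude and frequency are illustrative choices, not values from the source.

```python
import math

def tail_angle(t, theta0, omega, phi=0.0):
    """Sinusoidal tail-beat angle theta0 * sin(omega * t + phi), in radians."""
    return theta0 * math.sin(omega * t + phi)

# Example: 0.5 rad amplitude at a 2 Hz beat frequency (omega = 2 * pi * f).
omega = 2.0 * math.pi * 2.0
peak = tail_angle(t=0.125, theta0=0.5, omega=omega)   # a quarter period after t = 0
```

With zero phase the angle starts at zero and reaches its amplitude a quarter period later, here at $t = 0.125$ s.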

3. Reinforcement Learning Environment Design

Environments are structured according to the RL paradigm, exposing observation and action spaces, episodic reward functions, and reset mechanisms suitable for both low-level continuous control and hierarchical macro-action schemes. Standard FishGym observation vectors include:

Joint angles $\mathbf{q}$ and velocities $\dot{\mathbf{q}}$ for all actuated and passive joints
Global position/orientation (e.g., $x, y, z$ , quaternion or Euler representation)
Body-frame linear and angular velocities
Task-dependent information (e.g., relative target position $\Delta x, \Delta y, \Delta z$ )
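One plausible way to assemble such an observation is to concatenate the listed quantities into a flat vector. The ordering, the quaternion orientation, the `float32` dtype, and the function name below are assumptions for illustration; the source does not specify the layout.

```python
import numpy as np

def build_observation(q, qd, pos, quat, v_body, w_body, target_pos):
    """Flatten joint state, pose, body-frame velocities, and relative target."""
    pos = np.asarray(pos, dtype=np.float32)
    rel_target = np.asarray(target_pos, dtype=np.float32) - pos
    parts = [q, qd, pos, quat, v_body, w_body, rel_target]
    return np.concatenate([np.asarray(p, dtype=np.float32).ravel() for p in parts])

# Hypothetical fish with 6 joints, at the origin, target 1 m ahead along x.
obs = build_observation(
    q=np.zeros(6), qd=np.zeros(6),
    pos=[0.0, 0.0, 0.0], quat=[1.0, 0.0, 0.0, 0.0],
    v_body=[0.0, 0.0, 0.0], w_body=[0.0, 0.0, 0.0],
    target_pos=[1.0, 0.0, 0.0],
)
```

For 6 joints this gives a vector of length 6 + 6 + 3 + 4 + 3 + 3 + 3 = 28.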

Actions may take the form of:

Continuous torque signals per actuated joint (e.g., for all or a subset of spine joints)
Discrete commands in hierarchical or macro-action tasks (e.g., “accel,” “decel,” or lateral line-triggered maneuvers) (Li et al., 2023)

Reward functions combine position, orientation, velocity, and energy-shaping terms. A common form is: $r_t = w_{\rm goal} \cdot \exp(-\alpha\|\mathbf{p}_t-\mathbf{p}^*\|^2) + w_{\rm ori} \cdot \cos(\psi_t-\psi^*) - w_{\rm ctrl} \cdot \|\mathbf{u}_t\|^2$ where $\mathbf{p}_t$ and $\psi_t$ are the current position and heading, starred quantities are their targets, and $\mathbf{u}_t$ here denotes the control action rather than the fluid velocity. The objective is to maximize the expected return $\mathbb{E}[\sum_t \gamma^t r_t]$, subject to torque limits, joint limits, and episode-length caps (Singh et al., 2025).
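The reward and return expressions can be written out as follows. The weight values and `alpha` used as defaults are placeholders chosen for the example, not values reported in the cited work.

```python
import numpy as np

def reward(p, p_goal, psi, psi_goal, u,
           w_goal=1.0, w_ori=0.5, w_ctrl=0.01, alpha=1.0):
    """Goal-proximity + heading-alignment - control-effort reward."""
    p, p_goal, u = (np.asarray(a, dtype=float) for a in (p, p_goal, u))
    goal = w_goal * np.exp(-alpha * np.sum((p - p_goal) ** 2))
    ori = w_ori * np.cos(psi - psi_goal)
    ctrl = w_ctrl * np.sum(u ** 2)
    return float(goal + ori - ctrl)

def discounted_return(rewards, gamma=0.99):
    """Sum of gamma^t * r_t, accumulated backward for numerical convenience."""
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
    return g

# At the goal, aligned with the target heading, with zero control effort.
r_best = reward(p=[1.0, 0.0, 0.0], p_goal=[1.0, 0.0, 0.0],
                psi=0.3, psi_goal=0.3, u=np.zeros(6))
g = discounted_return([1.0, 1.0, 1.0], gamma=0.5)
```

With these placeholder weights the maximum per-step reward is $w_{\rm goal} + w_{\rm ori} = 1.5$, reached only when the fish sits on the target with the desired heading and applies no torque.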

Specialized environments implement lateral-line-inspired sensing and macro-action abstractions, using additional neural networks for flow-field classification and discrete action selection (Li et al., 2023).

4. RL Algorithms, Network Architectures, and Training Protocols

FishGym natively integrates widely used deep RL algorithms, notably Soft Actor-Critic (SAC) and Proximal Policy Optimization (PPO), with GPU acceleration for both environment stepping and policy optimization (Liu et al., 2022).

Canonical network architectures are:

Actor and critic as multilayer perceptrons (MLPs) with 2 hidden layers, 256 units each, ReLU activations (common in physics-based RL)
Policy outputs continuous distributions (e.g., Gaussian for torques), or softmax over macro-actions
In some setups, input stacking is used: four past steps for predation (batch=64), two periods for Kármán-gait tasks (batch=128)
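The MLP shape described above (two hidden layers of 256 ReLU units) can be sketched without any deep learning framework. The example below is a plain NumPy forward pass for a Gaussian policy head that outputs a mean and log standard deviation per action dimension; the observation and action sizes, the He-style initialization, and the function names are assumptions for illustration.

```python
import numpy as np

def init_mlp(sizes, rng):
    """List of (W, b) pairs for consecutive layer sizes, He-initialized weights."""
    return [(rng.normal(0.0, np.sqrt(2.0 / m), size=(m, n)), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

def mlp_forward(params, x):
    """ReLU on hidden layers, linear output layer."""
    for W, b in params[:-1]:
        x = np.maximum(x @ W + b, 0.0)
    W, b = params[-1]
    return x @ W + b

rng = np.random.default_rng(0)
obs_dim, act_dim = 28, 6                    # hypothetical sizes
actor = init_mlp([obs_dim, 256, 256, 2 * act_dim], rng)

out = mlp_forward(actor, rng.standard_normal(obs_dim))
mean, log_std = out[:act_dim], out[act_dim:]
action = mean + np.exp(log_std) * rng.standard_normal(act_dim)  # sample torques
n_params = sum(W.size + b.size for W, b in actor)
```

For these sizes the actor has 76,300 parameters, small enough that the fluid solver rather than the network typically dominates the cost of an environment step.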

Hyperparameters documented for SAC include: learning rates of $5\times10^{-4}$, discount $\gamma=0.99$, soft target update $\tau=0.01$, a replay buffer of $5\times10^5$ samples, and an auto-tuned entropy temperature $\alpha$. PPO configurations use a learning rate of $3\times10^{-4}$, batch size 2048, and clip $\epsilon=0.2$, among other settings (Liu et al., 2022; Singh et al., 2025; Li et al., 2023).
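Two of these hyperparameters have a simple operational meaning that a short sketch can show: $\tau$ controls the Polyak (soft) update of SAC's target network, and $\epsilon$ bounds the probability ratio in PPO's clipped surrogate objective. The functions below are generic textbook forms written for illustration, not code from FishGym.

```python
import numpy as np

def soft_update(target_params, online_params, tau=0.01):
    """Polyak averaging: target <- (1 - tau) * target + tau * online."""
    return [(1.0 - tau) * t + tau * o for t, o in zip(target_params, online_params)]

def ppo_clip_objective(ratio, advantage, eps=0.2):
    """Per-sample clipped surrogate: min(r * A, clip(r, 1 - eps, 1 + eps) * A)."""
    ratio = np.asarray(ratio, dtype=float)
    advantage = np.asarray(advantage, dtype=float)
    clipped = np.clip(ratio, 1.0 - eps, 1.0 + eps) * advantage
    return np.minimum(ratio * advantage, clipped)

# Target network moves 1% of the way toward the online network per update.
new_target = soft_update([np.zeros(3)], [np.ones(3)], tau=0.01)

# A large ratio with positive advantage is capped; a small ratio with
# negative advantage is also bounded by the clipped term.
obj = ppo_clip_objective(ratio=[1.5, 0.5], advantage=[1.0, -1.0], eps=0.2)
```

A small $\tau$ such as 0.01 makes the target network a slowly moving average, which stabilizes the critic's bootstrapped targets.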

Training typically runs for $1$–$2\times 10^7$ timesteps (e.g., $1000$–$2000$ episodes × $50$–$1000$ steps per episode), with the performance plateau reached after about $1000$ episodes on the FishGym benchmark, or in as few as $35$ episodes for predation and $70$ for Kármán-gait tasks in lateral-line setups (Li et al., 2023). Evaluation is interleaved every $10^5$–$10^6$ steps.

5. Specialized Modules: Lateral-Line Sensing and Macro-Actions

Advances in FishGym-style simulation include explicit modeling of aquatic sensory systems. The lateral-line sensing module is implemented as follows:

Five discrete sensor probes are placed longitudinally on the fish flank, each sampling local velocity $(u_x, u_y)$ , pressure $p$ , and total speed $T$ at every IB step
Sensor readings are concatenated to form a 20-dimensional vector
An offline-trained DNN classifier (architecture: 4 → 64 → 32 → 3 neurons) maps this vector to cavity flow field classes (e.g., background, vortex street regimes). Accuracy exceeds $95\%$ when trained with Adam, batch size $64$, $1000$ epochs
At runtime, the macro-action RL agent selects among swim frequency increments (“accel”), maintenance (“cruise”), or decrements (“decel”), gated by the classified flow state; frequency-range limits are enforced (Li et al., 2023)
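The feature assembly and the frequency-limited macro-action update can be sketched as below. Several details are assumptions made for the example: total speed $T$ is taken to be the magnitude of $(u_x, u_y)$, the frequency increment is 0.1 Hz, and the frequency range is 0.5 to 3.0 Hz. The flow classifier and the gating logic are omitted.

```python
import numpy as np

MACRO_DELTA = {"accel": 0.1, "cruise": 0.0, "decel": -0.1}  # Hz, hypothetical

def lateral_line_features(velocities, pressures):
    """Stack (u_x, u_y, p, T) for each probe into one flat vector.

    velocities: (n_probes, 2); pressures: (n_probes,). T is assumed to be |u|.
    """
    vel = np.asarray(velocities, dtype=float)
    p = np.asarray(pressures, dtype=float)
    speed = np.linalg.norm(vel, axis=1)
    return np.column_stack([vel, p, speed]).reshape(-1)

def apply_macro_action(freq, action, f_min=0.5, f_max=3.0):
    """Shift the swim frequency by the macro-action step, clipped to the allowed range."""
    return float(np.clip(freq + MACRO_DELTA[action], f_min, f_max))

# Five probes, each seeing the same local flow (hypothetical readings).
features = lateral_line_features(velocities=[[3.0, 4.0]] * 5, pressures=np.zeros(5))
f_capped = apply_macro_action(2.95, "accel")   # would exceed f_max, so it is clipped
f_slower = apply_macro_action(1.0, "decel")
```

Five probes with four quantities each give the 20-dimensional vector described above.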

This hybrid RL structure enables transfer learning and rapid adaptation in nonstationary flow fields and yields measurable improvements in generalization and sample efficiency.

6. Benchmark Tasks, Metrics, and Empirical Validation

FishGym supports multiple classes of tasks:

Cruising: reaching world targets in bounded/unbounded fluid domains
Path following: minimizing deviation $\lVert \mathbf{p}(t) - \mathbf{p}_{\text{path}}(t)\rVert$ over pre-defined or curriculum-generated trajectories
Pose control: U-turn or reorientation maneuvers
Collective behaviors: schooling, with secondary fish exploiting leader’s wake (requiring full FSI for vortex formation)
Predation: point-to-point pursuit of a mobile target, requiring anticipatory behavior

Performance is quantified using:

Path precision: mean and standard deviation over 50 rollouts, with physics-based simulation achieving $\sim 0.03$ m in mean error (compared to $0.5$ m for empirical drag models)
Cruising efficiency: measured as time or mechanical work to reach a waypoint
Pose control: error metrics during rotational maneuvers
Energy consumption: sensitivity studies by varying reward coefficients; in Kármán-gait, empirical work reduction of $396\%$ – $1685\%$ is reported relative to free-swim (Li et al., 2023)
Generalization: survival time and stability when facing unfamiliar flow conditions, with survival time increasing from $<40T$ to $100T$ (max) with lateral-line/macro integration
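A path-precision statistic of the kind listed above can be computed as the mean Euclidean deviation along each rollout, then summarized across rollouts. The exact definition used in the cited work is not given here, so the sketch below is one reasonable reading, with made-up trajectories.

```python
import numpy as np

def path_error(trajectory, reference):
    """Mean Euclidean distance between visited and reference positions, shape (T, 3)."""
    diff = np.asarray(trajectory, dtype=float) - np.asarray(reference, dtype=float)
    return float(np.mean(np.linalg.norm(diff, axis=1)))

def summarize(errors):
    """Mean and standard deviation of per-rollout errors."""
    errors = np.asarray(errors, dtype=float)
    return float(errors.mean()), float(errors.std())

# Three synthetic rollouts offset from a straight reference path by 2, 3, and 4 cm.
reference = np.zeros((100, 3))
reference[:, 0] = np.linspace(0.0, 1.0, 100)
offsets = [0.02, 0.03, 0.04]
errors = [path_error(reference + np.array([0.0, d, 0.0]), reference) for d in offsets]
mean_err, std_err = summarize(errors)
```

In an actual evaluation the list of errors would hold one entry per rollout, for example 50 entries for the protocol described above.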

Notably, simulations produce fish trajectories and angle-of-attack distributions consistent with biological studies (mean $\approx3.13^\circ$ , max up to $15^\circ$ ), and schooling scenarios exhibit hydrodynamic behaviors unavailable to single-fish or drag-based models (Liu et al., 2022, Li et al., 2023).

7. Implementation and Computational Performance

The FishGym platform is implemented with CUDA-accelerated fluid solvers and Python RL interfaces. Key computational characteristics include:

Typical LBM cube: $100^3$ – $1000\times500$ grid ( $\sim 5\times10^5$ – $10^6$ cells), $\sim300$ IB nodes
Environment throughput: $10^3$ – $10^4$ timesteps/sec on NVIDIA RTX A4000-class GPUs; single-episode time $10$–$100$ sec, depending on task settings (Li et al., 2023)
GPU RAM: $\sim12$ GB for medium-scale experiments (Titan Xp or above)
RL training: sample efficiency is high due to two-way GPU integration, with convergence in $\sim 6$–$10$ hours and millions of steps per session (Liu et al., 2022)
Environment instantiation, agent interaction, and training management are programmable via standard Python API

The modular design enables real-time visualization and accelerates both policy optimization and fundamental biomechanical studies. The combination of physics accuracy, RL benchmarks, and open-source design positions FishGym as a reference standard for research in aquatic robotics and computational ethology.


References

Liu et al. (2022). FishGym: A High-Performance Physics-based Simulation Framework for Underwater Robot Learning.

Singh et al. (2025). Underactuated Biomimetic Autonomous Underwater Vehicle for Ecosystem Monitoring.

Li et al. (2023). A numerical simulation method of fish adaption behavior based on deep reinforcement learning and fluid-structure coupling-realization of some lateral line functions.
