
FishGym Simulator: FSI & RL for Aquatic Robots

Updated 16 November 2025
  • FishGym Simulator is a high-performance simulation framework that integrates realistic fluid–structure interaction and reinforcement learning for aquatic biomechanics and underwater robotics.
  • It employs GPU-accelerated lattice–Boltzmann methods alongside immersed boundary models to accurately simulate the dynamics between articulated fish skeletons and viscous fluids.
  • The modular design supports diverse tasks such as cruising, path following, and schooling, enabling rapid policy optimization and reproducible experimental studies.

FishGym Simulator is a high-performance, physics-based simulation framework for training and evaluating fish-like robots and model organisms under two-way coupled fluid–structure interaction (FSI), with full reinforcement learning (RL) integration. Designed for the study and optimization of underwater robotic controllers, biomechanics, and aquatic animal behavior, FishGym enables reproducible experiments involving articulated skeletons, realistic fluid forces, and sophisticated control policies, leveraging GPU-accelerated lattice–Boltzmann methods and immersed boundary models (Liu et al., 2022).

1. Numerical Fluid–Structure Interaction Modeling

FishGym’s core module simulates the dynamical interaction between an articulated fish skeleton and a viscous, incompressible fluid, governed by the full Navier–Stokes equations:

$$\rho\left(\partial_t \mathbf{u} + (\mathbf{u}\cdot\nabla)\mathbf{u}\right) = -\nabla p + \mu \nabla^2\mathbf{u} + \mathbf{f}_{\rm ext}, \qquad \nabla\cdot\mathbf{u} = 0$$

where $\mathbf{u}$ is the fluid velocity, $p$ the pressure, $\rho$ the density, $\mu$ the viscosity, and $\mathbf{f}_{\rm ext}$ the external forces.

For the coupled articulated skeleton, the equations of motion are:

$$\mathbf{M}(\mathbf{q})\,\ddot{\mathbf{q}} + \mathbf{C}(\mathbf{q},\dot{\mathbf{q}}) = \boldsymbol{\tau}_{\rm int} + \boldsymbol{\tau}_{\rm fluid}$$

where $\mathbf{q}$ are the joint coordinates, $\mathbf{M}$ the mass matrix, $\mathbf{C}$ the Coriolis/centrifugal terms, $\boldsymbol{\tau}_{\rm int}$ the internal torques (including actuation), and $\boldsymbol{\tau}_{\rm fluid}$ the torques induced by hydrodynamic interaction.
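One simple way to advance the skeleton equation in time is a semi-implicit (symplectic) Euler step: solve $\mathbf{M}\ddot{\mathbf{q}} = \boldsymbol{\tau}_{\rm int} + \boldsymbol{\tau}_{\rm fluid} - \mathbf{C}$ for $\ddot{\mathbf{q}}$, update the velocity, then update the position with the new velocity. The source does not state which integrator FishGym uses; the function names, the callable interface for $\mathbf{M}$ and $\mathbf{C}$, and the dense Gaussian-elimination solve are assumptions of this sketch.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def solve_linear(a: Matrix, b: Vector) -> Vector:
    """Solve a x = b by Gaussian elimination with partial pivoting (small systems only)."""
    n = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented copy; inputs untouched
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - s) / aug[r][r]
    return x


def skeleton_step(q, qd, mass_matrix, bias, tau_int, tau_fluid, dt):
    """One semi-implicit Euler step of M(q) qdd + C(q, qd) = tau_int + tau_fluid.

    mass_matrix(q) -> Matrix and bias(q, qd) -> Vector are hypothetical callables.
    """
    rhs = [ti + tf - c for ti, tf, c in zip(tau_int, tau_fluid, bias(q, qd))]
    qdd = solve_linear(mass_matrix(q), rhs)
    qd_new = [v + dt * acc for v, acc in zip(qd, qdd)]
    q_new = [p + dt * v for p, v in zip(q, qd_new)]  # position uses the updated velocity
    return q_new, qd_new
```

For two decoupled joints with $\mathbf{M}=\mathrm{diag}(2,1)$, $\mathbf{C}=0$, and torques $(2, 1)$, both accelerations are $1$, so one step of $\Delta t = 0.1$ from rest gives $\dot{\mathbf{q}} = (0.1, 0.1)$ and $\mathbf{q} = (0.01, 0.01)$.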

The fluid is discretized via a GPU-accelerated lattice–Boltzmann scheme (LBM, typically D3Q19), while the fish surface is represented by Lagrangian immersed boundary (IB) markers. Two-way coupling is established by iterative enforcement of no-slip conditions, interpolating velocities from the Eulerian fluid grid to IB nodes and spreading corrective forces back to the fluid domain.
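The collide–stream structure of a lattice–Boltzmann update can be illustrated with a BGK step. This is a minimal sketch, assuming the two-dimensional D2Q9 lattice instead of D3Q19, periodic boundaries, no forcing term (the IB and frame-induced forces are omitted), and plain Python lists rather than CUDA kernels. In BGK form the relaxation time $\tau_{\rm LB}$ sets the lattice viscosity through $\nu = c_s^2(\tau_{\rm LB} - 1/2)$ with $c_s^2 = 1/3$.

```python
# D2Q9 lattice velocities and weights (standard values).
E = [(0, 0), (1, 0), (0, 1), (-1, 0), (0, -1), (1, 1), (-1, 1), (-1, -1), (1, -1)]
W = [4 / 9] + [1 / 9] * 4 + [1 / 36] * 4
CS2 = 1.0 / 3.0  # lattice speed of sound squared


def equilibrium(rho, ux, uy):
    """Second-order Maxwell-Boltzmann equilibrium for each of the 9 directions."""
    usq = ux * ux + uy * uy
    feq = []
    for (ex, ey), w in zip(E, W):
        eu = ex * ux + ey * uy
        feq.append(w * rho * (1 + 3 * eu + 4.5 * eu * eu - 1.5 * usq))
    return feq


def macroscopic(cell):
    """Density and velocity from the populations of one cell."""
    rho = sum(cell)
    ux = sum(fi * ex for fi, (ex, _) in zip(cell, E)) / rho
    uy = sum(fi * ey for fi, (_, ey) in zip(cell, E)) / rho
    return rho, ux, uy


def collide_stream(f, tau):
    """One BGK collision plus periodic streaming on an nx-by-ny grid of 9-vectors."""
    nx, ny = len(f), len(f[0])
    post = [[None] * ny for _ in range(nx)]
    for x in range(nx):
        for y in range(ny):
            cell = f[x][y]
            feq = equilibrium(*macroscopic(cell))
            post[x][y] = [fi - (fi - fe) / tau for fi, fe in zip(cell, feq)]
    new = [[[0.0] * 9 for _ in range(ny)] for _ in range(nx)]
    for x in range(nx):
        for y in range(ny):
            for i, (ex, ey) in enumerate(E):  # population i moves one cell along E[i]
                new[(x + ex) % nx][(y + ey) % ny][i] = post[x][y][i]
    return new


def viscosity(tau):
    """Kinematic viscosity in lattice units for BGK: nu = c_s^2 (tau - 1/2)."""
    return CS2 * (tau - 0.5)
```

Collision conserves mass and momentum in each cell because the equilibrium shares the cell's density and velocity, and streaming only moves populations, so total mass is unchanged by a full step.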

Boundary conditions are handled by maintaining a finite, moving computational domain (cube) that travels with the fish, using a non-equilibrium extrapolation at the cube faces. This design achieves unbounded-domain behavior with fixed compute resources (Liu et al., 2022). The core FSI cycle per time step is:

  1. Advect the domain origin and accumulate virtual inertial forces
  2. Collide–stream LBM update with IB- and frame-induced forces
  3. Interpolate fluid velocity at IB markers
  4. Compute and spread IB penalty forces
  5. Integrate skeleton dynamics subject to $\boldsymbol{\tau}_{\rm fluid}$
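Steps 3 and 4 can be sketched with a regularized delta kernel. This is an illustrative assumption rather than FishGym's code: it uses Peskin's four-point cosine kernel on a periodic 2-D grid in lattice units, unit marker weights, and a simple penalty force $\mathbf{F} = \kappa(\mathbf{U}_{\rm body} - \mathbf{U})$ with a hypothetical stiffness $\kappa$.

```python
import math


def delta_1d(r):
    """Peskin's four-point cosine kernel in lattice units (support |r| < 2)."""
    return 0.25 * (1.0 + math.cos(math.pi * r / 2.0)) if abs(r) < 2.0 else 0.0


def stencil(X, Y, nx, ny):
    """Yield (i, j, weight) for grid nodes inside the kernel support (periodic grid)."""
    for i in range(math.floor(X) - 1, math.floor(X) + 3):
        wx = delta_1d(i - X)
        for j in range(math.floor(Y) - 1, math.floor(Y) + 3):
            w = wx * delta_1d(j - Y)
            if w > 0.0:
                yield i % nx, j % ny, w


def interpolate(ux, uy, markers):
    """Step 3: Eulerian fluid velocity interpolated to each Lagrangian marker."""
    nx, ny = len(ux), len(ux[0])
    out = []
    for X, Y in markers:
        u = v = 0.0
        for i, j, w in stencil(X, Y, nx, ny):
            u += w * ux[i][j]
            v += w * uy[i][j]
        out.append((u, v))
    return out


def spread_penalty(markers, u_markers, u_body, kappa, nx, ny):
    """Step 4: penalty force at each marker, spread back onto the grid."""
    fx = [[0.0] * ny for _ in range(nx)]
    fy = [[0.0] * ny for _ in range(nx)]
    forces = []
    for (X, Y), (u, v), (ub, vb) in zip(markers, u_markers, u_body):
        Fx, Fy = kappa * (ub - u), kappa * (vb - v)
        forces.append((Fx, Fy))
        for i, j, w in stencil(X, Y, nx, ny):
            fx[i][j] += w * Fx
            fy[i][j] += w * Fy
    return fx, fy, forces
```

Because the kernel weights sum to one, a uniform flow is interpolated exactly and the total force spread onto the grid equals the sum of the marker forces; iterating steps 3 and 4 drives the marker velocities toward the body velocity (the no-slip condition).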

The alternative minimal-physics mode, as employed in some MuJoCo-based environments, represents hydrodynamics with per-link lumped models (added-mass, quadratic drag, optional lift/vortex shedding), trading first-principles realism for computational speed (Singh et al., 9 Nov 2025).
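A per-link lumped model of this kind can be as simple as quadratic drag plus an added-mass reaction. This is a hypothetical sketch: the coefficient values and function name are placeholders, not values from FishGym or any MuJoCo environment, and the optional lift/vortex-shedding terms are omitted.

```python
import math


def lumped_link_force(v, a, rho=1000.0, cd=1.0, area=0.01, added_mass=0.5):
    """Hydrodynamic force on one link: quadratic drag plus added-mass reaction.

    v, a: link velocity and acceleration as 3-vectors in the world frame.
    All default coefficients are illustrative placeholders.
    """
    speed = math.sqrt(sum(c * c for c in v))
    drag = [-0.5 * rho * cd * area * speed * c for c in v]  # -1/2 rho Cd A |v| v
    inertia = [-added_mass * c for c in a]  # added mass opposes acceleration
    return [d + m for d, m in zip(drag, inertia)]
```

A link moving at 1 m/s with these placeholder values feels $-\tfrac12 \cdot 1000 \cdot 1 \cdot 0.01 \cdot 1^2 = -5$ N of drag; no flow field is solved, which is where the speed advantage comes from.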

2. Articulated Fish Modeling and Kinematics

FishGym supports a variety of articulated morphologies, such as Koi-type skeletons with chain or tree topologies. The fish model consists of:

  • A set of rigid links (bones), each with assigned mass and inertia (URDF-style description)
  • Rotational joints (degrees of freedom), with position, velocity, and actuation state
  • Surface skin mesh (linear blend skinning) for geometrical fidelity in fluid coupling
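Linear blend skinning places each skin vertex at a weighted average of the positions produced by the bones that influence it, $\mathbf{v}' = \sum_j w_j T_j \mathbf{v}$ with $\sum_j w_j = 1$. The 2-D sketch below is an assumption for brevity (the fish model is 3-D), and it assumes each bone transform $(\theta, t_x, t_y)$ already maps the rest pose to the current pose; the function names are hypothetical.

```python
import math


def transform(theta, tx, ty, p):
    """Apply a 2-D rigid transform (rotation theta, then translation) to point p."""
    c, s = math.cos(theta), math.sin(theta)
    x, y = p
    return (c * x - s * y + tx, s * x + c * y + ty)


def blend_skin(vertex, weights, bone_transforms):
    """Linear blend skinning: v' = sum_j w_j * T_j(v), with weights summing to 1."""
    x = y = 0.0
    for w, (theta, tx, ty) in zip(weights, bone_transforms):
        px, py = transform(theta, tx, ty, vertex)
        x += w * px
        y += w * py
    return (x, y)
```

A vertex at $(1, 0)$ shared equally between an unrotated bone and a bone rotated by $90^\circ$ lands at $(0.5, 0.5)$, between the two rigid positions; this smooth blending is what keeps the surface mesh continuous across joints.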

Controllers actuate the fish through joint torque signals $\boldsymbol{\sigma}$, with optional additional control channels such as buoyancy adjustment ($\Delta v$). Passive joints may include PD spring/damper torques (Liu et al., 2022, Singh et al., 9 Nov 2025).

Kinematics for specific tasks (e.g., hand-coded tail-beat patterns) are expressed as periodic functions:

$$\theta_{\rm tail}(t) = \theta_0 \sin(\omega t + \phi)$$

where $\theta_0$ is the amplitude, $\omega$ the beat frequency, and $\phi$ the phase.
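A common way to drive a joint along such a pattern, and also to model the passive spring/damper joints, is a PD law $\tau = k_p(\theta_{\rm ref} - \theta) + k_d(\dot\theta_{\rm ref} - \dot\theta)$. This is a minimal sketch, assuming PD tracking of $\theta_{\rm tail}(t)$; the gains and function names are hypothetical, not FishGym settings.

```python
import math


def tail_reference(t, theta0, omega, phi):
    """Desired tail angle theta0*sin(omega*t + phi) and its time derivative."""
    arg = omega * t + phi
    return theta0 * math.sin(arg), theta0 * omega * math.cos(arg)


def pd_torque(theta, theta_dot, theta_ref=0.0, theta_ref_dot=0.0, kp=10.0, kd=0.5):
    """PD torque; with the default zero reference it acts as a passive spring/damper."""
    return kp * (theta_ref - theta) + kd * (theta_ref_dot - theta_dot)
```

For an active joint, `tail_reference` supplies the target at each control step; for a passive joint, calling `pd_torque(theta, theta_dot)` alone yields the restoring torque $-k_p\theta - k_d\dot\theta$.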

3. Reinforcement Learning Environment Design

Environments are structured according to the RL paradigm, exposing observation and action spaces, episodic reward functions, and reset mechanisms suitable for both low-level continuous control and hierarchical macro-action schemes. Standard FishGym observation vectors include:

  • Joint angles $\mathbf{q}$ and velocities $\dot{\mathbf{q}}$ for all actuated and passive joints
  • Global position/orientation (e.g., $x, y, z$, quaternion or Euler representation)
  • Body-frame linear and angular velocities
  • Task-dependent information (e.g., relative target position $\Delta x, \Delta y, \Delta z$)
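The body-frame velocities in this list are obtained by rotating world-frame velocities with the transpose of the orientation matrix, $\mathbf{v}_b = R(q)^\top \mathbf{v}_w$. This is a minimal sketch, assuming unit quaternions in $(w, x, y, z)$ order with $R(q)$ mapping body to world axes; the observation ordering and function names are hypothetical, and the relative target is kept in the world frame for simplicity.

```python
def quat_to_matrix(q):
    """Rotation matrix (body -> world) of a unit quaternion (w, x, y, z)."""
    w, x, y, z = q
    return [
        [1 - 2 * (y * y + z * z), 2 * (x * y - w * z), 2 * (x * z + w * y)],
        [2 * (x * y + w * z), 1 - 2 * (x * x + z * z), 2 * (y * z - w * x)],
        [2 * (x * z - w * y), 2 * (y * z + w * x), 1 - 2 * (x * x + y * y)],
    ]


def world_to_body(q, v):
    """Express a world-frame vector in the body frame: R^T v."""
    R = quat_to_matrix(q)
    return [sum(R[r][c] * v[r] for r in range(3)) for c in range(3)]


def build_observation(joint_q, joint_qd, pos, quat, v_world, w_world, target):
    """Concatenate joint state, pose, body-frame velocities, and relative target."""
    v_body = world_to_body(quat, v_world)
    w_body = world_to_body(quat, w_world)
    rel = [t - p for t, p in zip(target, pos)]
    return list(joint_q) + list(joint_qd) + list(pos) + list(quat) + v_body + w_body + rel
```

With $n$ joints this layout has $2n + 16$ entries. For a fish yawed by $90^\circ$ and swimming along world $+y$, `world_to_body` returns a forward velocity along body $+x$, which is why body-frame velocities are easier for a policy to use than world-frame ones.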

Actions may take the form of:

  • Continuous torque signals per actuated joint (e.g., for all or a subset of spine joints)
  • Discrete commands in hierarchical or macro-action tasks (e.g., “accel,” “decel,” or lateral line-triggered maneuvers) (Li et al., 2023)

Reward functions combine position, orientation, velocity, and energy-shaping terms. A common form is:

$$r_t = w_{\rm goal}\exp\left(-\alpha\|\mathbf{p}_t-\mathbf{p}^*\|^2\right) + w_{\rm ori}\cos(\psi_t-\psi^*) - w_{\rm ctrl}\|\mathbf{u}_t\|^2$$

where $\mathbf{p}_t$ and $\psi_t$ are the current position and heading, $\mathbf{p}^*$ and $\psi^*$ their targets, and $\mathbf{u}_t$ the control input. The objective is to maximize the expected return $\mathbb{E}\left[\sum_t \gamma^t r_t\right]$ under constraints including torque and joint limits and episode-length caps (Singh et al., 9 Nov 2025).
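The reward and the discounted return translate directly into code. This is a minimal sketch; the default weights and $\alpha$ are hypothetical placeholders, not values reported for FishGym.

```python
import math


def step_reward(p, p_star, psi, psi_star, u, w_goal=1.0, w_ori=0.5, w_ctrl=1e-3, alpha=1.0):
    """r_t = w_goal*exp(-alpha*|p - p*|^2) + w_ori*cos(psi - psi*) - w_ctrl*|u|^2."""
    dist2 = sum((a - b) ** 2 for a, b in zip(p, p_star))
    return (w_goal * math.exp(-alpha * dist2)
            + w_ori * math.cos(psi - psi_star)
            - w_ctrl * sum(c * c for c in u))


def discounted_return(rewards, gamma=0.99):
    """sum_t gamma^t r_t, accumulated backwards for numerical convenience."""
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
    return g
```

At the target, perfectly aligned, and with zero control effort, the per-step reward is $w_{\rm goal} + w_{\rm ori}$ ($1.5$ with these placeholders); the control penalty then trades goal accuracy against actuation energy.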

Specialized environments implement lateral-line-inspired sensing and macro-action abstractions, using additional neural networks for flow-field classification and discrete action selection (Li et al., 2023).

4. RL Algorithms, Network Architectures, and Training Protocols

FishGym natively integrates state-of-the-art deep RL algorithms, notably Soft Actor-Critic (SAC) and Proximal Policy Optimization (PPO), with GPU acceleration for both environment step and policy optimization (Liu et al., 2022).

Canonical network architectures are:

  • Actor and critic as multilayer perceptrons (MLPs) with 2 hidden layers, 256 units each, ReLU activations (common in physics-based RL)
  • Policy outputs continuous distributions (e.g., Gaussian for torques), or softmax over macro-actions
  • In some setups, input stacking is used: four past steps for predation (batch=64), two periods for Kármán-gait tasks (batch=128)
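Input stacking of past observations is typically implemented with a fixed-length queue. This is a minimal sketch, assuming the stack is padded with copies of the first observation at reset; the class name is hypothetical, and for the period-based stacking the window length $k$ would be the number of steps per period times two.

```python
from collections import deque


class ObservationStack:
    """Keep the k most recent observations and return them concatenated (oldest first)."""

    def __init__(self, k):
        self.k = k
        self.frames = deque(maxlen=k)  # oldest frame is dropped automatically

    def reset(self, obs):
        self.frames.clear()
        for _ in range(self.k):
            self.frames.append(list(obs))  # pad with the initial observation
        return self.stacked()

    def push(self, obs):
        self.frames.append(list(obs))
        return self.stacked()

    def stacked(self):
        return [x for frame in self.frames for x in frame]
```

With `k=4`, as in the predation setup, a 2-dimensional observation becomes an 8-dimensional policy input.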

Hyperparameters documented for SAC include a learning rate of $5\times10^{-4}$, $\gamma=0.99$, a soft target update coefficient $\tau=0.01$, a replay buffer of $5\times10^5$ samples, and an automatically tuned entropy temperature $\alpha$. PPO configurations use a learning rate of $3\times10^{-4}$, batch size $2048$, clip $\epsilon=0.2$, etc. (Liu et al., 2022, Singh et al., 9 Nov 2025, Li et al., 2023).
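The SAC settings listed here can be collected in a configuration object, and the soft target update with $\tau = 0.01$ is Polyak averaging of the target-network parameters. This is a minimal sketch: the field names are hypothetical, and parameters are plain lists of floats rather than framework tensors.

```python
from dataclasses import dataclass


@dataclass
class SACConfig:
    """SAC hyperparameters as documented; field names are illustrative."""
    learning_rate: float = 5e-4
    gamma: float = 0.99
    tau: float = 0.01  # soft target update coefficient
    buffer_size: int = 500_000
    hidden_sizes: tuple = (256, 256)  # two hidden layers of 256 units
    auto_entropy: bool = True  # entropy temperature alpha tuned automatically


def soft_update(target, online, tau):
    """Polyak averaging: target <- tau * online + (1 - tau) * target, in place."""
    for i, (t, o) in enumerate(zip(target, online)):
        target[i] = tau * o + (1.0 - tau) * t
    return target
```

With $\tau = 0.01$ each update moves the target critic only 1% of the way toward the online critic, which keeps the bootstrapped targets slowly varying and stabilizes learning.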

Training typically runs for $1$–$2\times10^7$ timesteps (e.g., $1000$–$2000$ episodes of $50$–$1000$ steps each), with performance plateauing within about $1000$ episodes on the FishGym benchmark, or in as few as $35$ episodes for predation and $70$ for Kármán-gait tasks in lateral-line setups (Li et al., 2023). Evaluation is interleaved every $10^5$–$10^6$ steps.

5. Specialized Modules: Lateral-Line Sensing and Macro-Actions

Advances in FishGym-style simulation include explicit modeling of aquatic sensory systems. The lateral-line sensing module is implemented as follows:

  • Five discrete sensor probes are placed longitudinally on the fish flank, each sampling local velocity $(u_x, u_y)$, pressure $p$, and total speed $T$ at every IB step
  • Sensor readings are concatenated to form a 20-dimensional vector
  • An offline-trained DNN classifier (architecture: 4 → 64 → 32 → 3 neurons) maps this vector to cavity flow field classes (e.g., background, vortex street regimes). Accuracy exceeds $95\%$ when trained with Adam, batch size $64$, $1000$ epochs
  • At runtime, the macro-action RL agent selects among swim frequency increments (“accel”), maintenance (“cruise”), or decrements (“decel”), gated by the classified flow state; frequency-range limits are enforced (Li et al., 2023)

This hybrid RL structure enables transfer learning and rapid adaptation in nonstationary flow fields and yields measurable improvements in generalization and sample efficiency.
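The sensor vector and the macro-action update can be sketched as follows. This is a minimal sketch under stated assumptions: $T$ is taken to be the local speed $\sqrt{u_x^2 + u_y^2}$, and the frequency step and range limits are hypothetical values; the learned classifier and macro-action policy are not reproduced here.

```python
import math

# Frequency change per macro-action (in units of the hypothetical step size).
MACRO_ACTIONS = {"accel": +1, "cruise": 0, "decel": -1}


def probe_reading(ux, uy, p):
    """One probe sample (u_x, u_y, p, T), with T assumed to be the local speed |u|."""
    return (ux, uy, p, math.hypot(ux, uy))


def lateral_line_vector(probe_samples):
    """Concatenate the five probe readings into the 20-dimensional sensor vector."""
    assert len(probe_samples) == 5, "expected five lateral-line probes"
    return [v for reading in probe_samples for v in reading]


def apply_macro_action(freq, action, step=0.1, f_min=0.5, f_max=3.0):
    """Increment, keep, or decrement the swim frequency, then clamp to the allowed range."""
    return min(f_max, max(f_min, freq + MACRO_ACTIONS[action] * step))
```

Clamping after every macro-action is one simple way to enforce the frequency-range limits regardless of which action the agent selects.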

6. Benchmark Tasks, Metrics, and Empirical Validation

FishGym supports multiple classes of tasks:

  • Cruising: reaching world targets in bounded/unbounded fluid domains
  • Path following: minimizing the deviation $\lVert \mathbf{p}(t) - \mathbf{p}_{\text{path}}(t)\rVert$ over pre-defined or curriculum-generated trajectories
  • Pose control: U-turn or reorientation maneuvers
  • Collective behaviors: schooling, with secondary fish exploiting leader’s wake (requiring full FSI for vortex formation)
  • Predation: point-to-point pursuit of a mobile target, requiring anticipatory behavior

Performance is quantified using:

  • Path precision: mean and standard deviation over 50 rollouts, with physics-based simulation achieving a mean error of $\sim0.03$ m (compared to $0.5$ m for empirical drag models)
  • Cruising efficiency: measured as time or mechanical work to reach a waypoint
  • Pose control: error metrics during rotational maneuvers
  • Energy consumption: sensitivity studies by varying reward coefficients; in Kármán-gait tasks, an empirical work reduction of $396\%$–$1685\%$ is reported relative to free swimming (Li et al., 2023)
  • Generalization: survival time and stability when facing unfamiliar flow conditions, with survival time increasing from $<40T$ to $100T$ (the maximum) with lateral-line/macro-action integration
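Two of these metrics can be sketched concretely. This is a minimal sketch under stated assumptions: rollouts and the reference path are sampled at the same time steps, the spread is the population standard deviation, and mechanical work is taken as $\sum_t \sum_j |\tau_j \dot q_j|\,\Delta t$ (absolute joint power), which is one common convention rather than the documented FishGym definition.

```python
import math
import statistics


def path_error(traj, ref):
    """Mean Euclidean deviation between one rollout and the reference path."""
    return sum(math.dist(p, r) for p, r in zip(traj, ref)) / len(traj)


def path_precision(rollouts, ref):
    """Mean and population standard deviation of path error over rollouts."""
    errors = [path_error(t, ref) for t in rollouts]
    return statistics.mean(errors), statistics.pstdev(errors)


def mechanical_work(torques, joint_vels, dt):
    """Work estimate: sum over steps and joints of |tau_j * qdot_j| * dt."""
    return dt * sum(abs(t * v) for tau, qd in zip(torques, joint_vels) for t, v in zip(tau, qd))
```

Two rollouts offset from a straight reference path by 1 m and 3 m give a path precision of $2 \pm 1$ m under this definition.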

Notably, simulations produce fish trajectories and angle-of-attack distributions consistent with biological studies (mean $\approx3.13^\circ$, maximum up to $15^\circ$), and schooling scenarios exhibit hydrodynamic behaviors unavailable to single-fish or drag-based models (Liu et al., 2022, Li et al., 2023).

7. Implementation and Computational Performance

The FishGym platform is implemented with CUDA-accelerated fluid solvers and Python RL interfaces. Key computational characteristics include:

  • Typical LBM cube: $100^3$ to $1000\times500$ grid ($\sim5\times10^5$–$10^6$ cells), $\sim300$ IB nodes
  • Environment throughput: $10^3$–$10^4$ timesteps/sec on NVIDIA RTX A4000-class GPUs; single-episode time $10$–$100$ sec, depending on task settings (Li et al., 2023)
  • GPU RAM: $\sim12$ GB for medium-scale experiments (Titan Xp or above)
  • RL training: because both environment stepping and policy optimization run on the GPU, convergence takes $\sim6$–$10$ hours, with millions of steps per session (Liu et al., 2022)
  • Environment instantiation, agent interaction, and training management are programmable via standard Python API
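To relate grid size to memory, the storage for the lattice populations alone can be estimated as cells × $Q$ × bytes per value × number of buffers. This is a minimal sketch, assuming single-precision values and two population buffers (a common layout for streaming); it ignores macroscopic fields, IB data, and framework overhead, so it is a lower bound on memory use rather than a model of the reported GPU footprint.

```python
def lbm_population_bytes(nx, ny, nz=1, q=19, bytes_per_value=4, copies=2):
    """Bytes needed for the distribution functions only (default: D3Q19, float32, 2 buffers)."""
    return nx * ny * nz * q * bytes_per_value * copies


if __name__ == "__main__":
    cube = lbm_population_bytes(100, 100, 100)  # 100^3 cube, D3Q19
    plane = lbm_population_bytes(1000, 500, q=9)  # 1000 x 500 grid, D2Q9
    print(f"100^3 D3Q19: {cube / 1e6:.0f} MB; 1000x500 D2Q9: {plane / 1e6:.0f} MB")
```

For the $100^3$ cube this gives $10^6 \times 19 \times 4 \times 2 = 1.52\times10^8$ bytes, about 152 MB for the populations.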

The modular design enables real-time visualization and accelerates both policy optimization and fundamental biomechanical studies. The combination of physics accuracy, RL benchmarks, and open-source design makes FishGym a reference platform for research in aquatic robotics and computational ethology.
