FishGym Simulator: FSI & RL for Aquatic Robots
- FishGym Simulator is a high-performance simulation framework that integrates realistic fluid–structure interaction and reinforcement learning for aquatic biomechanics and underwater robotics.
- It employs GPU-accelerated lattice–Boltzmann methods alongside immersed boundary models to accurately simulate the dynamics between articulated fish skeletons and viscous fluids.
- The modular design supports diverse tasks such as cruising, path following, and schooling, enabling rapid policy optimization and reproducible experimental studies.
FishGym Simulator is a high-performance, physics-based simulation framework for training and evaluating fish-like robots and model organisms under two-way coupled fluid–structure interaction (FSI), with full reinforcement learning (RL) integration. Designed for the study and optimization of underwater robotic controllers, biomechanics, and aquatic animal behavior, FishGym enables reproducible experiments involving articulated skeletons, realistic fluid forces, and sophisticated control policies, leveraging GPU-accelerated lattice–Boltzmann methods and immersed boundary models (Liu et al., 2022).
1. Numerical Fluid–Structure Interaction Modeling
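FishGym’s core module simulates the dynamical interaction between an articulated fish skeleton and a viscous, incompressible fluid, governed by the full Navier–Stokes equations:

$$\rho\left(\frac{\partial \mathbf{u}}{\partial t} + (\mathbf{u}\cdot\nabla)\,\mathbf{u}\right) = -\nabla p + \mu \nabla^{2}\mathbf{u} + \mathbf{f}, \qquad \nabla\cdot\mathbf{u} = 0,$$

where $\mathbf{u}$ is the fluid velocity, $p$ the pressure, $\rho$ the density, $\mu$ the viscosity, and $\mathbf{f}$ the external forces.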
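For the coupled articulated skeleton, the equations of motion are written as:

$$M(q)\,\ddot{q} + C(q, \dot{q}) = \tau_{\mathrm{int}} + \tau_{\mathrm{fluid}},$$

where $q$ are the joint coordinates, $M$ the mass matrix, $C$ the Coriolis/centrifugal terms, $\tau_{\mathrm{int}}$ the internal torques (including actuation), and $\tau_{\mathrm{fluid}}$ the torques induced by hydrodynamic interaction.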
The fluid is discretized via a GPU-accelerated lattice–Boltzmann scheme (LBM, typically D3Q19), while the fish surface is represented by Lagrangian immersed boundary (IB) markers. Two-way coupling is established by iterative enforcement of no-slip conditions, interpolating velocities from the Eulerian fluid grid to IB nodes and spreading corrective forces back to the fluid domain.
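For reference, the D3Q19 lattice named above can be written down and checked in a few lines. This is a textbook sketch of the velocity set, its weights, and the second-order BGK equilibrium, not FishGym's CUDA implementation. The function name `equilibrium` and the lattice-unit convention $c_s^2 = 1/3$ are assumptions made for illustration.

```python
from itertools import product

# D3Q19: one rest velocity, 6 axis-aligned neighbors, 12 face-diagonal neighbors.
# Cube corners (|c|_1 == 3) are excluded, which leaves 19 of the 27 candidates.
VELOCITIES = [c for c in product((-1, 0, 1), repeat=3)
              if sum(abs(x) for x in c) <= 2]
WEIGHTS = [{0: 1 / 3, 1: 1 / 18, 2: 1 / 36}[sum(abs(x) for x in c)]
           for c in VELOCITIES]
CS2 = 1 / 3  # squared lattice speed of sound (assumed lattice units)


def equilibrium(rho, u):
    """Second-order BGK equilibrium populations for density rho, velocity u."""
    uu = sum(x * x for x in u)
    feq = []
    for w, c in zip(WEIGHTS, VELOCITIES):
        cu = sum(ci * ui for ci, ui in zip(c, u))
        feq.append(w * rho * (1 + cu / CS2
                              + cu * cu / (2 * CS2 ** 2)
                              - uu / (2 * CS2)))
    return feq
```

Summing the returned populations recovers the density, and their first moment recovers the momentum $\rho\mathbf{u}$, which is the property the collide step relies on.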
Boundary conditions are handled by maintaining a finite, moving computational domain (cube) that travels with the fish, using a non-equilibrium extrapolation at the cube faces. This design achieves unbounded-domain behavior with fixed compute resources (Liu et al., 2022). The core FSI cycle per time step is:
- Advect the domain origin and accumulate virtual inertial forces
- Collide–stream LBM update with IB- and frame-induced forces
- Interpolate fluid velocity at IB markers
- Compute and spread IB penalty forces
- Integrate skeleton dynamics subject to the hydrodynamic torques
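The interpolate and spread steps of this cycle can be illustrated with a deliberately small one-dimensional sketch on a periodic grid. It assumes unit grid spacing, a linear hat kernel, and a simple penalty force proportional to the slip velocity; FishGym's actual kernel, stiffness, and iteration scheme are not specified here, and all function names are hypothetical.

```python
def hat(r):
    """Linear (hat) kernel with a support of two grid cells."""
    return max(0.0, 1.0 - abs(r))


def interpolate(field, x):
    """Interpolate a periodic grid field (unit spacing) at marker position x."""
    n = len(field)
    i0 = int(x // 1)
    return sum(field[i % n] * hat(x - i) for i in (i0, i0 + 1))


def spread(force, x, n):
    """Spread a marker force onto the periodic grid with the same kernel."""
    out = [0.0] * n
    i0 = int(x // 1)
    for i in (i0, i0 + 1):
        out[i % n] += force * hat(x - i)
    return out


def ib_penalty_step(fluid_u, markers_x, markers_u, stiffness=1.0):
    """One coupling pass: interpolate, compute penalty forces, spread them."""
    n = len(fluid_u)
    grid_force = [0.0] * n
    marker_forces = []
    for x, u_body in zip(markers_x, markers_u):
        slip = u_body - interpolate(fluid_u, x)  # no-slip violation at the marker
        f = stiffness * slip                     # penalty force applied to the fluid
        marker_forces.append(f)
        for i, fi in enumerate(spread(f, x, n)):
            grid_force[i] += fi
    return grid_force, marker_forces
```

Because the same kernel is used in both directions and its weights sum to one, the total force deposited on the grid equals the total marker force. The equal and opposite reaction on the markers is what would feed the hydrodynamic torques on the skeleton.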
The alternative minimal-physics mode, as employed in some MuJoCo-based environments, represents hydrodynamics with per-link lumped models (added-mass, quadratic drag, optional lift/vortex shedding), trading first-principles realism for computational speed (Singh et al., 9 Nov 2025).
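A lumped per-link model of this kind can be sketched as a quadratic drag term plus an added-mass reaction along a single axis. The coefficients below (water density, reference area, drag coefficient, added mass) are placeholder values chosen for illustration, not parameters taken from the cited environments, and `link_hydro_force` is a hypothetical name.

```python
def link_hydro_force(velocity, acceleration, rho=1000.0, area=0.01,
                     drag_coeff=1.0, added_mass=0.05):
    """Force along one axis on a link: quadratic drag plus added-mass reaction.

    All default coefficients are illustrative placeholders.
    """
    # Quadratic drag opposes the direction of motion.
    drag = -0.5 * rho * drag_coeff * area * abs(velocity) * velocity
    # Added mass resists acceleration of the surrounding fluid.
    inertial = -added_mass * acceleration
    return drag + inertial
```

Evaluating one closed-form expression per link is what makes this mode fast, at the cost of ignoring wake interactions between links and between fish.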
2. Articulated Fish Modeling and Kinematics
FishGym supports a variety of articulated morphologies, such as Koi-type skeletons with chain or tree topologies. The fish model consists of:
- A set of rigid links (bones), each with assigned mass and inertia (URDF-style description)
- Rotational joints (degrees of freedom), with position, velocity, and actuation state
- Surface skin mesh (linear blend skinning) for geometrical fidelity in fluid coupling
Controllers actuate the fish through torque signals at the joints, with optional additional control channels such as buoyancy adjustment. Passive joints may include PD-spring/damper torques (Liu et al., 2022, Singh et al., 9 Nov 2025).
Kinematics for specific tasks (e.g., hand-coded tail-beat patterns) are expressed as periodic functions such as:

$$\theta(t) = A \sin(2\pi f t + \phi),$$

where $A$ is the amplitude, $f$ the beat frequency, and $\phi$ the phase.
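A hand-coded tail-beat pattern of this type can be sketched as follows, assuming a sinusoidal form with amplitude, beat frequency, and phase. The per-joint phase lag used to produce a traveling wave along the spine, and the default values, are illustrative assumptions rather than FishGym settings.

```python
import math


def joint_target(t, amplitude, frequency, phase):
    """Sinusoidal joint-angle target: A * sin(2*pi*f*t + phi)."""
    return amplitude * math.sin(2.0 * math.pi * frequency * t + phase)


def traveling_wave(t, n_joints, amplitude=0.3, frequency=2.0, phase_lag=0.5):
    """Targets for a chain of joints, each lagging the previous by phase_lag."""
    return [joint_target(t, amplitude, frequency, -k * phase_lag)
            for k in range(n_joints)]
```

Calling `traveling_wave(t, 4)` at each control step yields four joint-angle targets that a PD controller could track.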
3. Reinforcement Learning Environment Design
Environments are structured according to the RL paradigm, exposing observation and action spaces, episodic reward functions, and reset mechanisms suitable for both low-level continuous control and hierarchical macro-action schemes. Standard FishGym observation vectors include:
- Joint angles and velocities for all actuated and passive joints
- Global position/orientation (e.g., Cartesian position with a quaternion or Euler-angle orientation)
- Body-frame linear and angular velocities
- Task-dependent information (e.g., relative target position)
Actions may take the form of:
- Continuous torque signals per actuated joint (e.g., for all or a subset of spine joints)
- Discrete commands in hierarchical or macro-action tasks (e.g., “accel,” “decel,” or lateral line-triggered maneuvers) (Li et al., 2023)
Reward functions combine position, orientation, velocity, and energy-shaping terms, commonly as a weighted sum. The objective is to maximize the expected return $\mathbb{E}\left[\sum_t \gamma^{t} r_t\right]$ under constraints including torque and joint limits and episodic caps (Singh et al., 9 Nov 2025).
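One way to combine such terms is a weighted sum of penalties, accumulated into a discounted return. The sketch below is a generic illustration: the weights, the choice of error terms, and the discount factor are assumptions, not the reward used in any of the cited environments.

```python
def step_reward(pos_err, heading_err, speed_err, torques,
                w_pos=1.0, w_head=0.5, w_vel=0.2, w_energy=0.01):
    """Negative weighted sum of tracking errors and an energy penalty.

    Weights are illustrative placeholders.
    """
    energy = sum(tau * tau for tau in torques)  # quadratic actuation cost
    return -(w_pos * pos_err
             + w_head * abs(heading_err)
             + w_vel * abs(speed_err)
             + w_energy * energy)


def discounted_return(rewards, gamma=0.99):
    """Accumulate sum_t gamma^t * r_t by iterating backward over the episode."""
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
    return g
```

Raising `w_energy` relative to the tracking weights is the kind of change an energy sensitivity study would make.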
Specialized environments implement lateral-line-inspired sensing and macro-action abstractions, using additional neural networks for flow-field classification and discrete action selection (Li et al., 2023).
4. RL Algorithms, Network Architectures, and Training Protocols
FishGym natively integrates state-of-the-art deep RL algorithms, notably Soft Actor-Critic (SAC) and Proximal Policy Optimization (PPO), with GPU acceleration for both environment step and policy optimization (Liu et al., 2022).
Canonical network architectures are:
- Actor and critic as multilayer perceptrons (MLPs) with 2 hidden layers, 256 units each, ReLU activations (common in physics-based RL)
- Policy outputs continuous distributions (e.g., Gaussian for torques), or softmax over macro-actions
- In some setups, input stacking is used: four past steps for predation (batch=64), two periods for Kármán-gait tasks (batch=128)
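Input stacking of the kind mentioned in the last item can be sketched with a fixed-length buffer that concatenates the most recent observations. The class name `FrameStack` and the zero-padding at episode start are assumptions for illustration.

```python
from collections import deque


class FrameStack:
    """Keep the last n_frames observations and expose them as one flat vector."""

    def __init__(self, n_frames, obs_dim):
        # Pad with zeros so the stacked vector has a fixed size from step 0.
        self.frames = deque([[0.0] * obs_dim for _ in range(n_frames)],
                            maxlen=n_frames)

    def push(self, obs):
        """Append the newest observation and return the stacked input."""
        self.frames.append(list(obs))
        return [x for frame in self.frames for x in frame]
```

With `FrameStack(4, obs_dim)`, the policy input has `4 * obs_dim` entries, oldest observation first.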
Hyperparameters documented for SAC include the learning rates, the soft target-update coefficient, the replay buffer size, and an auto-tuned entropy temperature. PPO configurations specify a learning rate, a batch size of 2048, a clipping parameter, and related settings (Liu et al., 2022, Singh et al., 9 Nov 2025, Li et al., 2023).
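Two of these hyperparameters have compact standard definitions: the soft target-update coefficient in SAC and the clipping parameter in PPO. The sketch below shows both on scalars and flat lists. The numeric values in the usage note are common defaults used only as placeholders, not the values reported for FishGym.

```python
def soft_update(target, online, tau):
    """Polyak averaging: target <- tau * online + (1 - tau) * target."""
    return [tau * w + (1.0 - tau) * wt for w, wt in zip(online, target)]


def ppo_clip_objective(ratio, advantage, clip_eps):
    """Clipped surrogate for one sample: min(r * A, clip(r, 1-eps, 1+eps) * A)."""
    clipped = max(1.0 - clip_eps, min(1.0 + clip_eps, ratio))
    return min(ratio * advantage, clipped * advantage)
```

For example, `soft_update([0.0], [1.0], tau=0.005)` moves the target weight 0.5% of the way toward the online weight, and `ppo_clip_objective(1.5, 1.0, 0.2)` caps the contribution of a sample whose probability ratio has drifted above `1.2`.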
Training typically runs for up to millions of timesteps (e.g., $1000$–$2000$ episodes × $50$–$1000$ steps per episode), with a performance plateau reached within $1000$ episodes (FishGym benchmark), or after as few as $35$ episodes for predation and $70$ for Kármán-gait in lateral-line setups (Li et al., 2023). Evaluation is interleaved with training at regular step intervals.
5. Specialized Modules: Lateral-Line Sensing and Macro-Actions
Advances in FishGym-style simulation include explicit modeling of aquatic sensory systems. The lateral-line sensing module is implemented as follows:
- Five discrete sensor probes are placed longitudinally on the fish flank, each sampling the local velocity components, pressure, and total speed (four values per probe) at every IB step
- Sensor readings are concatenated to form a 20-dimensional vector
- An offline-trained DNN classifier (architecture: 4 → 64 → 32 → 3 neurons) maps this vector to cavity flow field classes (e.g., background, vortex street regimes). Training uses Adam with batch size $64$ for $1000$ epochs, and high classification accuracy is reported
- At runtime, the macro-action RL agent selects among swim frequency increments (“accel”), maintenance (“cruise”), or decrements (“decel”), gated by the classified flow state; frequency-range limits are enforced (Li et al., 2023)
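The sensing and macro-action steps above can be sketched as two small helpers: one flattens the per-probe readings into a single vector, the other applies a discrete frequency command and enforces the frequency range. The step size and frequency limits are illustrative assumptions, and the flow classifier that gates the action is omitted.

```python
# Discrete macro-actions mapped to the sign of the frequency change.
ACTIONS = {"accel": +1, "cruise": 0, "decel": -1}


def lateral_line_vector(probes):
    """Flatten per-probe readings (four values each) into one sensor vector."""
    return [value for probe in probes for value in probe]


def apply_macro_action(freq, action, step=0.1, f_min=0.5, f_max=3.0):
    """Return the tail-beat frequency after a macro-action, clamped to limits.

    step, f_min, and f_max are placeholder values.
    """
    new_freq = freq + ACTIONS[action] * step
    return min(f_max, max(f_min, new_freq))
```

With five probes of four readings each, `lateral_line_vector` returns the 20-dimensional vector described above. Clamping inside `apply_macro_action` keeps repeated "accel" or "decel" commands from leaving the allowed frequency range.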
This hybrid RL structure enables transfer learning and rapid adaptation in nonstationary flow fields and yields measurable improvements in generalization and sample efficiency.
6. Benchmark Tasks, Metrics, and Empirical Validation
FishGym supports multiple classes of tasks:
- Cruising: reaching world targets in bounded/unbounded fluid domains
- Path following: minimizing deviation over pre-defined or curriculum-generated trajectories
- Pose control: U-turn or reorientation maneuvers
- Collective behaviors: schooling, with secondary fish exploiting leader’s wake (requiring full FSI for vortex formation)
- Predation: point-to-point pursuit of a mobile target, requiring anticipatory behavior
Performance is quantified using:
- Path precision: mean and standard deviation over 50 rollouts, with physics-based simulation achieving a lower mean error than the $0.5$ m obtained with empirical drag models
- Cruising efficiency: measured as time or mechanical work to reach a waypoint
- Pose control: error metrics during rotational maneuvers
- Energy consumption: sensitivity studies by varying reward coefficients; in Kármán-gait, a reduction in mechanical work relative to free swimming is reported (Li et al., 2023)
- Generalization: survival time and stability when facing unfamiliar flow conditions, with survival time rising to as much as $100T$ (the maximum) with lateral-line/macro-action integration
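The path-precision metric can be computed as the mean and standard deviation of per-rollout tracking error. The sketch below assumes each rollout is sampled at the same instants as the reference path, which is a simplification; function names are hypothetical.

```python
import math
from statistics import mean, stdev


def path_errors(trajectory, reference):
    """Pointwise Euclidean distance between a rollout and the reference path."""
    return [math.dist(p, q) for p, q in zip(trajectory, reference)]


def path_precision(rollouts, reference):
    """Mean and sample standard deviation of the per-rollout mean path error."""
    per_rollout = [mean(path_errors(traj, reference)) for traj in rollouts]
    return mean(per_rollout), stdev(per_rollout)
```

`stdev` requires at least two rollouts; a 50-rollout evaluation satisfies this.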
Notably, simulations produce fish trajectories and angle-of-attack distributions consistent with biological studies, and schooling scenarios exhibit hydrodynamic behaviors unavailable to single-fish or drag-based models (Liu et al., 2022, Li et al., 2023).
7. Implementation and Computational Performance
The FishGym platform is implemented with CUDA-accelerated fluid solvers and Python RL interfaces. Key computational characteristics include:
- LBM domain: a cubic lattice that moves with the fish, coupled to IB nodes on the fish surface
- Environment throughput: reported on NVIDIA RTX A4000-class GPUs, with single-episode times of $10$–$100$ sec depending on task settings (Li et al., 2023)
- GPU RAM: medium-scale experiments run on a single Titan Xp-class GPU or above
- RL training: with both the fluid solver and policy optimization running on the GPU, convergence is reached in hours, over millions of steps per session (Liu et al., 2022)
- Environment instantiation, agent interaction, and training management are programmable via standard Python API
The modular design enables real-time visualization and accelerates both policy optimization and fundamental biomechanical studies. The combination of physics accuracy, RL benchmarks, and open-source design positions FishGym as a reference standard for research in aquatic robotics and computational ethology.