Sim2Swim: Aquatic Sim-to-Real Methods

Updated 4 July 2026

Sim2Swim is a family of simulation-to-real frameworks in aquatic robotics that links swimmer modeling, system identification, and control deployment.
It integrates diverse methods—from reduced-order MuJoCo digital twins to zero-shot deep reinforcement learning and differentiable fluid–structure interaction simulations.
These approaches enable rapid calibration, robust control transfer, and efficient simulation across different swimmer dynamics and fluid regimes.

Sim2Swim is a designation used in recent aquatic-robotics literature for simulation-to-real pipelines that link swimmer modeling, system identification, and downstream control. In one prominent formulation, it denotes a MuJoCo-based digital twin for a tendon-driven fish robot built from a simplified, stateless hydrodynamics model identified from two real-world swimming trajectories (Michelis et al., 26 Feb 2026). In another, it names a zero-shot deep-reinforcement-learning velocity controller for holonomic autonomous underwater vehicles (AUVs) trained with domain randomization and massively parallelized simulation in less than three minutes (Fosso et al., 9 Dec 2025). Related work also uses the term for differentiable fluid–structure simulators and finite-element ALE frameworks for swimmer dynamics (Nava et al., 2022; Landeghem et al., 2024). Taken together, these usages suggest a broader sim-to-real research motif for aquatic locomotion rather than a single canonical algorithm.

1. Scope and variants of the term

Across the papers considered here, Sim2Swim refers to several closely related but technically distinct frameworks. They share an emphasis on underwater locomotion, simulation efficiency, and transfer to physical systems, but they differ substantially in vehicle class, dynamical assumptions, and intended downstream task.

Framework	System	Central formulation
"Simple Models, Real Swimming" (Michelis et al., 26 Feb 2026)	tendon-driven robotic fish	stateless five-parameter MuJoCo fluid model
"Sim2Swim: Zero-Shot Velocity Control for Agile AUV Maneuvering in 3 Minutes" (Fosso et al., 9 Dec 2025)	holonomic AUV	zero-shot PPO controller with domain randomization
"Fast Aquatic Swimmer Optimization with Differentiable Projective Dynamics and Neural Network Hydrodynamic Models" (Nava et al., 2022)	2D carangiform swimmer	DiffPD + HydroNet differentiable FSI
"Towards a computational framework using finite element methods with Arbitrary Lagrangian-Eulerian approach for swimmers with contact" (Landeghem et al., 2024)	multiple swimmers with contact	ALE finite-element Navier–Stokes coupling

A related branch extends the same general agenda from parameter fitting to automated calibration. "Swim2Real: VLM-Guided System Identification for Sim-to-Real Transfer" calibrates a 16-parameter robotic fish simulator from swimming videos using VLM feedback and a backtracking line search (Qiu et al., 21 Mar 2026). Another adjacent sim-to-real line studies a tethered multimaterial soft swimmer driven by Peano-HASELs, using a differentiable planar model to match simulated and observed deformation and then optimize shape and control (Gravert et al., 2022). This suggests that Sim2Swim is best understood as a family resemblance among aquatic sim-to-real methods: reduced-order digital twins, differentiable simulators, and zero-shot control environments all serve the same broad objective of making swimming robots trainable, calibratable, and deployable.

2. Reduced-order digital twins for tendon-driven fish

The clearest reduced-order formulation appears in the MuJoCo digital twin for a tendon-driven fish robot. The body is modeled as a head, five rigid tail segments, and a caudal fin, connected by hinge joints whose scalar stiffness is matched via a natural-frequency sweep at $3.5\ \mathrm{Hz}$. Actuation is provided by a single velocity-controlled motor that pulls two antagonistic tendons, modeled as stiff springs with approximately $3\%$ stretch, and buoyancy is enforced via a soft $z$-axis position constraint. Each rigid segment of the discretized fish is approximated as an ellipsoid, which allows MuJoCo’s custom force plugin to apply per-segment hydrodynamic forces and torques during forward dynamics.

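The fluid model is explicitly stateless: no fluid state is carried between time steps, so each term depends only on the segment's instantaneous kinematics. For each segment, MuJoCo computes blunt and slender drag, angular drag, viscous drag, Kutta–Joukowski lift, Magnus lift, and added mass. Representative terms are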

$f_D = -\rho\,[\,c_{blunt}A_v + c_{slender}(A_{max}-A_v)\,]\|v\|v,$

and

$f_A = -M_A\dot v + (M_A v)\times \omega.$

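Here $\rho$ is the fluid density, $v$ and $\omega$ are the segment's linear and angular velocities, $A_v$ is its projected area on the plane orthogonal to $v$, $A_{max}$ is its maximum projected area, and $M_A$ holds the added-mass coefficients. The five tunable coefficients are $c_{blunt}$, $c_{slender}$, $c_{angular}$, $c_{kutta}$, and $c_{magnus}$, and the total force and torque applied to each segment are the sums of the individual drag, lift, viscous, and added-mass contributions listed above.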
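The drag and added-mass terms above can be checked numerically for a single ellipsoidal segment. This is a minimal sketch, not MuJoCo's implementation: it assumes vectors expressed in the segment frame, the standard closed-form projected area of an ellipsoid, a maximum projected area equal to $\pi$ times the product of the two largest semi-axes, and a diagonal added-mass matrix. All function names are hypothetical.

```python
import math

Vec3 = tuple[float, float, float]


def projected_area(radii: Vec3, v: Vec3) -> float:
    """Shadow area of an ellipsoid with semi-axes `radii` on the plane orthogonal to v."""
    speed = math.sqrt(sum(c * c for c in v))
    if speed == 0.0:
        return 0.0
    r1, r2, r3 = radii
    n1, n2, n3 = (c / speed for c in v)
    # Closed form: pi * sqrt(r2^2 r3^2 n1^2 + r3^2 r1^2 n2^2 + r1^2 r2^2 n3^2).
    return math.pi * math.sqrt((r2 * r3 * n1) ** 2 + (r3 * r1 * n2) ** 2 + (r1 * r2 * n3) ** 2)


def max_projected_area(radii: Vec3) -> float:
    """Largest shadow area: pi times the product of the two largest semi-axes."""
    a, b, _ = sorted(radii, reverse=True)
    return math.pi * a * b


def quadratic_drag(v: Vec3, radii: Vec3, rho: float, c_blunt: float, c_slender: float) -> Vec3:
    """f_D = -rho [c_blunt A_v + c_slender (A_max - A_v)] |v| v."""
    a_v = projected_area(radii, v)
    a_max = max_projected_area(radii)
    speed = math.sqrt(sum(c * c for c in v))
    scale = -rho * (c_blunt * a_v + c_slender * (a_max - a_v)) * speed
    return (scale * v[0], scale * v[1], scale * v[2])


def cross(a: Vec3, b: Vec3) -> Vec3:
    return (a[1] * b[2] - a[2] * b[1], a[2] * b[0] - a[0] * b[2], a[0] * b[1] - a[1] * b[0])


def added_mass_force(m_added: Vec3, v: Vec3, v_dot: Vec3, omega: Vec3) -> Vec3:
    """f_A = -M_A v_dot + (M_A v) x omega, assuming M_A = diag(m_added)."""
    mv = tuple(m * c for m, c in zip(m_added, v))
    rot = cross(mv, omega)
    return tuple(-m * a + r for m, a, r in zip(m_added, v_dot, rot))
```

For motion along the segment's longest axis only the small cross-section faces the flow, so $A_v = \pi r_2 r_3$ and the slender coefficient weights the remaining area; moving broadside, $A_v = A_{max}$ and only the blunt coefficient contributes.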

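Parameter identification uses two constant-frequency swimming trajectories, recorded in a pool with tracked markers expressed in the robot frame. After matching tail stiffness and motor kinematics, the fluid coefficients $c$ and an actuation phase offset $\phi$ are optimized by minimizing the average Euclidean marker error,

$\mathcal{L}(c,\phi) = \frac{1}{TM}\sum_{t=1}^{T}\sum_{m=1}^{M}\big\|p^{sim}_{m,t}(c,\phi) - p^{real}_{m,t}\big\|_2,$

where $p^{sim}_{m,t}$ and $p^{real}_{m,t}$ are the simulated and recorded positions of marker $m$ at sample $t$, and the average runs over $M$ markers and $T$ samples.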
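The average marker error is straightforward to evaluate once simulated and recorded markers are time-aligned. The sketch below assumes both trajectories are sampled at the same instants and expressed in the robot frame; `simulate(params, frequency)` is a hypothetical stand-in for a MuJoCo rollout, and averaging per trial before averaging across trials is an assumed weighting rather than necessarily the paper's.

```python
import math
from typing import Callable, Mapping, Sequence

Point = Sequence[float]             # (x, y, z) in the robot frame
Frames = Sequence[Sequence[Point]]  # indexed [sample][marker]


def mean_marker_error(sim: Frames, real: Frames) -> float:
    """Average Euclidean distance over all samples and markers."""
    if len(sim) != len(real) or not real:
        raise ValueError("trajectories must be non-empty and equally long")
    total, count = 0.0, 0
    for sim_t, real_t in zip(sim, real):
        if len(sim_t) != len(real_t):
            raise ValueError("marker count differs between sim and real")
        for p, q in zip(sim_t, real_t):
            total += math.dist(p, q)
            count += 1
    return total / count


def identification_loss(params: Sequence[float],
                        trials: Mapping[float, Frames],
                        simulate: Callable[[Sequence[float], float], Frames]) -> float:
    """Mean of per-trial marker errors; `trials` maps actuation frequency to recorded frames."""
    errors = [mean_marker_error(simulate(params, freq), frames) for freq, frames in trials.items()]
    return sum(errors) / len(errors)
```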

Bayesian Optimization within fixed parameter bounds is followed by local Nelder–Mead refinement, with no additional regularization. The paper reports the identified coefficient values together with the final marker error at each of the two training frequencies.
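The global-then-local pattern can be sketched with the standard library alone. Bayesian Optimization would require a Gaussian-process library, so this sketch swaps in uniform random sampling within the bounds as the global stage; the local stage is a compact Nelder–Mead (reflection, expansion, outside and inside contraction, shrink, with the usual coefficients 1, 2, 0.5, 0.5). The refinement is unconstrained, as in plain Nelder–Mead, so a real pipeline may need to clip or penalize out-of-bounds vertices. Function names are hypothetical.

```python
import random


def _lerp(a, b, t):
    """Return a + t * (b - a), component-wise."""
    return [ai + t * (bi - ai) for ai, bi in zip(a, b)]


def nelder_mead(f, x0, step=0.1, tol=1e-10, xtol=1e-8, max_iter=5000):
    """Compact Nelder-Mead; stops when both value spread and simplex size are small."""
    n = len(x0)
    simplex = [list(x0)]
    for i in range(n):
        vertex = list(x0)
        vertex[i] += step
        simplex.append(vertex)
    values = [f(v) for v in simplex]
    for _ in range(max_iter):
        order = sorted(range(n + 1), key=values.__getitem__)
        simplex = [simplex[i] for i in order]
        values = [values[i] for i in order]
        size = max(abs(v[j] - simplex[0][j]) for v in simplex[1:] for j in range(n))
        if values[-1] - values[0] < tol and size < xtol:
            break
        centroid = [sum(v[j] for v in simplex[:-1]) / n for j in range(n)]
        worst = simplex[-1]
        reflected = _lerp(centroid, worst, -1.0)
        f_r = f(reflected)
        if values[0] <= f_r < values[-2]:
            simplex[-1], values[-1] = reflected, f_r
        elif f_r < values[0]:
            expanded = _lerp(centroid, worst, -2.0)
            f_e = f(expanded)
            simplex[-1], values[-1] = (expanded, f_e) if f_e < f_r else (reflected, f_r)
        else:
            outside = f_r < values[-1]
            contracted = _lerp(centroid, worst, -0.5 if outside else 0.5)
            f_c = f(contracted)
            if f_c < (f_r if outside else values[-1]):
                simplex[-1], values[-1] = contracted, f_c
            else:
                # Shrink every vertex halfway toward the current best vertex.
                for i in range(1, n + 1):
                    simplex[i] = _lerp(simplex[0], simplex[i], 0.5)
                    values[i] = f(simplex[i])
    best = min(range(n + 1), key=values.__getitem__)
    return simplex[best], values[best]


def random_then_nelder_mead(f, bounds, n_samples=200, seed=0):
    """Global stage: uniform sampling in the box (stand-in for Bayesian Optimization).
    Local stage: unconstrained Nelder-Mead started from the best sample."""
    rng = random.Random(seed)
    best_x, best_f = None, float("inf")
    for _ in range(n_samples):
        x = [rng.uniform(lo, hi) for lo, hi in bounds]
        fx = f(x)
        if fx < best_f:
            best_x, best_f = x, fx
    return nelder_mead(f, best_x)
```

Passing a marker-error objective and per-coefficient bounds to `random_then_nelder_mead` returns the refined parameters and their loss; the global stage only needs to land in the right basin, after which the simplex does the fine work.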

The principal significance of this formulation is its extrapolative performance despite its simplicity. On eight additional forward-swimming trials at different frequencies, the model is evaluated without re-tuning: the paper reports its average cruising-velocity prediction error and its trajectory RMSE, and the predicted frequency-response trend matches the real one up to motor limits. Classical elongated body theory (EBT), with its own free parameters optimized, is compared on the same eight-frequency sweep and is reported to have a larger error. The stated reasons are that EBT uses only time-averaged tip kinematics, whereas the stateless segment model integrates local kinematics on all segments and handles underactuated tendon compliance and elastic–hydrodynamic coupling (Michelis et al., 26 Feb 2026).
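Statements that one model's error is a multiple of another's reduce to a ratio of mean errors over the same frequency sweep. A minimal sketch, assuming the metric is mean relative cruising-velocity error (the paper's exact metric may differ); function names are hypothetical, and the inputs would be the per-frequency predictions and measurements from the held-out trials.

```python
def mean_relative_error(predicted, measured):
    """Mean of |predicted - measured| / |measured| over matched frequency points."""
    if len(predicted) != len(measured) or not measured:
        raise ValueError("need non-empty sequences of equal length")
    return sum(abs(p - m) / abs(m) for p, m in zip(predicted, measured)) / len(measured)


def error_ratio(baseline, model, measured):
    """How many times larger the baseline's mean relative error is than the model's."""
    return mean_relative_error(baseline, measured) / mean_relative_error(model, measured)
```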

3. Calibration and system identification

System identification is a central axis of Sim2Swim research, and the literature shows a spectrum from low-dimensional optimization to fully automated video-based calibration. In the tendon-driven fish digital twin, the calibration target is marker alignment across a small number of real trajectories, and the unknowns are only five fluid coefficients plus a phase offset. This low-dimensional setup is sufficient to produce strong generalization in actuation frequency, which is precisely why the work argues that simple, stateless models can function as effective digital twins when carefully matched to physical data (Michelis et al., 26 Feb 2026).

A substantially more ambitious calibration problem appears in Swim2Real. There the simulator parameter vector $\theta \in \mathbb{R}^{16}$ jointly collects five fluid coefficients, one motor arm length, five hinge stiffnesses, and five hinge damping terms ($5 + 1 + 5 + 5 = 16$). The optimization target can be either a velocity-based loss that compares simulated and measured swimming velocities or a marker-based loss,

$\mathcal{L}_{marker}(\theta) = \frac{1}{TM}\sum_{t=1}^{T}\sum_{m=1}^{M}\big\|p^{sim}_{m,t}(\theta) - p^{real}_{m,t}\big\|_2,$

where $p^{sim}_{m,t}$ and $p^{real}_{m,t}$ are the simulated and recorded positions of marker $m$ at sample $t$. Rather than treating the objective as a pure black box, the method presents simulated-versus-real overlays, numerical summaries, parameter bounds, and proposal history to a VLM, which proposes an updated parameter vector $\hat\theta$. Because the VLM is reported to be better at choosing a direction than a step size, the proposal is wrapped in a backtracking line search that shrinks the step by a factor $\beta \in (0,1)$ for at most $K$ backtracking steps:

$\theta^{(k)} = \theta + \beta^{k}\,(\hat\theta - \theta), \qquad k = 0, 1, \dots, K.$

The first improving candidate is accepted after clipping to bounds. Empirically, the line search raises the acceptance rate of VLM proposals by rescuing correct directions whose full steps would otherwise be too large. Quantitatively, Swim2Real reports its marker error over five seeds, with zero outlier seeds, and its mean absolute velocity error across eight actuation frequencies (Qiu et al., 21 Mar 2026).
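The backtracking step can be sketched independently of the VLM: the proposal only supplies a direction, and the step along it is shortened until the loss improves. The 16-entry layout mirrors the parameter groups listed above, but the group names, the shrink factor `beta=0.5`, and `max_backtracks=4` are illustrative assumptions, not Swim2Real's settings.

```python
# Hypothetical grouping of the 16 Swim2Real parameters (names are illustrative).
PARAM_LAYOUT = [("fluid_coeffs", 5), ("motor_arm_length", 1),
                ("hinge_stiffness", 5), ("hinge_damping", 5)]
N_PARAMS = sum(size for _, size in PARAM_LAYOUT)


def clip(theta, lower, upper):
    """Project each entry of theta onto its [lower, upper] interval."""
    return [min(max(t, lo), hi) for t, lo, hi in zip(theta, lower, upper)]


def backtracking_accept(theta, proposal, loss, lower, upper, beta=0.5, max_backtracks=4):
    """Try theta + beta**k * (proposal - theta) for k = 0..max_backtracks.

    The proposal fixes the search direction; only the step length shrinks.
    Returns (parameters, loss, k) for the first clipped candidate that lowers
    the loss, or (theta, current loss, None) if none of them does.
    """
    current = loss(theta)
    for k in range(max_backtracks + 1):
        step = beta ** k
        candidate = clip([t + step * (p - t) for t, p in zip(theta, proposal)], lower, upper)
        value = loss(candidate)
        if value < current:
            return candidate, value, k
    return list(theta), current, None
```

A proposal that points the right way but overshoots is rescued at a smaller `k`; one that points the wrong way exhausts the backtracking budget and leaves $\theta$ unchanged.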

A different identification regime appears in the Peano-HASEL soft swimmer. There the simulator is planar with a constant height: the body outline follows a carangiform body-profile polynomial with a fixed length and maximum width, and the body is discretized into a triangle mesh. Calibration is performed against bending-angle trajectories extracted from videos recorded at several actuation voltages and frequencies. The optimized parameters are the actuator amplitude and slope of a sloped-box activation waveform, fitted with Adam jointly over all nine voltage–frequency settings. On held-out voltage–frequency data, the paper reports the resulting mean absolute bending-angle error (Gravert et al., 2022).
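The amplitude-and-slope fit can be sketched with a hand-written Adam loop. The exact sloped-box waveform is not specified here, so this sketch assumes a hypothetical trapezoidal pulse that ramps at rate `slope`, saturates at 1, and is off for the second half of each period; Adam uses its common defaults ($\beta_1 = 0.9$, $\beta_2 = 0.999$, $\epsilon = 10^{-8}$) rather than the paper's settings. The target is synthetic and the waveform is fitted directly, whereas the paper fits simulated bending angles against video data across nine voltage–frequency settings.

```python
import math


def box_shape(t, period, slope):
    """Unit-amplitude sloped box and its derivative with respect to `slope` (hypothetical form)."""
    half = 0.5 * period
    tau = t % period
    if tau >= half:
        return 0.0, 0.0           # actuator off in the second half-period
    rise, fall = slope * tau, slope * (half - tau)
    if rise <= fall and rise < 1.0:
        return rise, tau          # rising edge
    if fall < rise and fall < 1.0:
        return fall, half - tau   # falling edge
    return 1.0, 0.0               # saturated plateau


def loss_and_grad(params, times, targets, period):
    """Mean squared error of amplitude * shape against targets, with its gradient."""
    amplitude, slope = params
    loss = g_amp = g_slope = 0.0
    for t, y in zip(times, targets):
        shape, d_shape = box_shape(t, period, slope)
        err = amplitude * shape - y
        loss += err * err
        g_amp += 2.0 * err * shape
        g_slope += 2.0 * err * amplitude * d_shape
    n = len(times)
    return loss / n, [g_amp / n, g_slope / n]


def adam_fit(params, times, targets, period, lr=0.02, steps=3000,
             beta1=0.9, beta2=0.999, eps=1e-8):
    """Plain Adam with bias-corrected first and second moment estimates."""
    params = list(params)
    m = [0.0] * len(params)
    v = [0.0] * len(params)
    for step in range(1, steps + 1):
        _, grad = loss_and_grad(params, times, targets, period)
        for i, g in enumerate(grad):
            m[i] = beta1 * m[i] + (1.0 - beta1) * g
            v[i] = beta2 * v[i] + (1.0 - beta2) * g * g
            m_hat = m[i] / (1.0 - beta1 ** step)
            v_hat = v[i] / (1.0 - beta2 ** step)
            params[i] -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return params


# Synthetic demo: recover amplitude 1.0 and slope 8.0 from a poor initial guess.
PERIOD = 1.0
TIMES = [i / 200 for i in range(200)]
TARGETS = [box_shape(t, PERIOD, 8.0)[0] for t in TIMES]
INITIAL_LOSS, _ = loss_and_grad([0.5, 4.0], TIMES, TARGETS, PERIOD)
FITTED = adam_fit([0.5, 4.0], TIMES, TARGETS, PERIOD)
FINAL_LOSS, _ = loss_and_grad(FITTED, TIMES, TARGETS, PERIOD)
```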

These calibration schemes occupy different points in the same design space, suggesting a progression from low-dimensional hand-structured identification, through differentiable parameter fitting, to higher-dimensional automated calibration from video.

4. Reinforcement learning and control transfer

Sim2Swim is not only about matching trajectories; it is also about making the resulting models useful for control. In the tendon-driven fish environment, the MuJoCo simulator is integrated with Gym and Stable-Baselines3. The observation vector includes the motor joint state, segment angles and velocities, head angular rates, the target vector, and the previous action. The action is a scalar motor acceleration, scaled by a fixed factor, and the policy is trained against a target-reaching reward.

Soft Actor-Critic is trained with targets sampled at random in a region in front of the robot. Reported results include the success rate over randomized target-reaching trials and the average error in circle-waypoint tracking (Michelis et al., 26 Feb 2026).

In the AUV setting, Sim2Swim shifts from deformable-body modeling to direct robust control under uncertain underwater dynamics. The vehicle state consists of the pose $\eta$ (position and orientation) and the body-frame velocity $\nu$ (linear and angular), and the motion model is the Fossen form

$M\dot\nu + C(\nu)\nu + D(\nu)\nu + g(\eta) = \tau,$

where $M$ includes rigid-body and added mass, $C(\nu)$ collects Coriolis and centripetal terms, $D(\nu)$ is hydrodynamic damping, and $g(\eta)$ gathers gravity and buoyancy restoring forces. A thruster mapping converts the vector of thruster commands into the generalized force $\tau$. The observation vector contains the quaternion error, the linear-velocity error, the angular velocity, and integral states. The policy is a two-layer MLP with ReLU hidden layers and a tanh output. Training uses PPO via RSL-RL in NVIDIA Isaac Lab with massively parallel environments; the paper reports the time to convergence, and full training takes less than three minutes on a laptop with an Intel Core i7-12800HX CPU and an NVIDIA A2000 GPU. Pool trials on a BlueROV2 Heavy report RMS linear-velocity and attitude errors in straight-line tracking, convergence under a ballast perturbation, and position and attitude errors under random waypoint orientations (Fosso et al., 9 Dec 2025).
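Domain randomization can be illustrated on a single surge degree of freedom of the Fossen model, $(m + m_a)\dot u + (d_l + d_q|u|)u = k\,c$, where $c$ is a normalized thruster command. Each simulated environment draws its own mass, added mass, damping, and thrust gain, so a policy trained across many such environments is discouraged from overfitting one set of values. The nominal values, the ±20% spread, and all names below are hypothetical; Isaac Lab would batch thousands of full 6-DOF environments on the GPU rather than loop in Python.

```python
import math
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class SurgeParams:
    mass: float          # rigid-body mass [kg]
    added_mass: float    # surge added mass [kg]
    lin_damping: float   # linear damping [N s/m]
    quad_damping: float  # quadratic damping [N s^2/m^2]
    thrust_gain: float   # thrust per unit normalized command [N]


# Hypothetical nominal values for illustration only.
NOMINAL = SurgeParams(mass=13.5, added_mass=6.0, lin_damping=4.0,
                      quad_damping=18.0, thrust_gain=40.0)


def randomize(rng, nominal=NOMINAL, spread=0.2):
    """Scale each parameter by an independent factor drawn from [1 - spread, 1 + spread]."""
    def scale(x):
        return x * rng.uniform(1.0 - spread, 1.0 + spread)
    return SurgeParams(scale(nominal.mass), scale(nominal.added_mass),
                       scale(nominal.lin_damping), scale(nominal.quad_damping),
                       scale(nominal.thrust_gain))


def step(u, command, p, dt):
    """Explicit Euler step of (m + m_a) du/dt + (d_l + d_q |u|) u = k * command."""
    tau = p.thrust_gain * command
    u_dot = (tau - (p.lin_damping + p.quad_damping * abs(u)) * u) / (p.mass + p.added_mass)
    return u + dt * u_dot


def rollout(p, command, dt=0.01, n_steps=2000):
    """Integrate from rest under a constant command and return the final surge speed."""
    u = 0.0
    for _ in range(n_steps):
        u = step(u, command, p, dt)
    return u


def steady_speed(command, p):
    """Speed where thrust balances damping: positive root of d_q u^2 + d_l u = |tau|."""
    tau = p.thrust_gain * command
    root = (-p.lin_damping + math.sqrt(p.lin_damping ** 2 + 4.0 * p.quad_damping * abs(tau))) \
        / (2.0 * p.quad_damping)
    return math.copysign(root, tau)


# One randomized parameter set per simulated environment.
rng = random.Random(0)
ENVS = [randomize(rng) for _ in range(8)]
```

The same command produces a different terminal speed in every environment, which is exactly the spread a velocity controller must absorb to transfer zero-shot.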

A related reinforcement-learning precursor is the study of synchronised swimming of two fish. There the follower uses deep Q-learning, with a state based on its relative displacement and orientation to the leader, a phase variable, and its two previous actions; the action is a discrete modulation of body curvature every half tail-beat. The reward is centered on maintaining wake alignment. Relative to a solitary swimmer, the learned follower reduces its cost of transport and increases its mean swimming efficiency, a gain that arises from reduced deformation power rather than a significant change in thrust power (Novati et al., 2016). Although this work is not framed as Sim2Swim, it clarifies why hydrodynamically meaningful simulators are attractive as training environments: they can expose control policies to energy-exploiting flow phenomena that are difficult to script analytically.

5. Differentiable and finite-element simulation regimes

At the high-fidelity end of the spectrum, one Sim2Swim line replaces simplified fluid laws with differentiable surrogates for fluid–structure interaction. In the hybrid differentiable pipeline of Nava et al., the fluid obeys the incompressible Navier–Stokes equations and the deformable body is advanced with Differentiable Projective Dynamics (DiffPD). The simulator alternates two differentiable layers at each time step: a DiffPD solid layer that updates positions and velocities, and a HydroNet fluid layer, implemented as a U-Net, that predicts the next curl and pressure fields from the current flow state, soft boundary mask, and boundary velocities. Coupling is achieved through differentiable rasterization from solid to fluid and an immersed boundary method from fluid pressure to solid surface forces. The fluid surrogate is trained without ground-truth flow data, using a physics-constrained loss

$\mathcal{L} = \mathcal{L}_{NS} + \mathcal{L}_{BC},$

where $\mathcal{L}_{NS}$ is a discretized Navier–Stokes residual on the domain and $\mathcal{L}_{BC}$ enforces Dirichlet boundary conditions. On a MAC grid, the paper times the forward and backward passes of a fixed-length episode against COMSOL Multiphysics FSI and reports a speedup for forward simulation. It also reports validation against COMSOL, with the same monotonic frequency response and a shared optimum, and gradient-based optimization of tail-beat frequency with Adam (Nava et al., 2022).
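HydroNet is trained only from a discretized equation residual and a boundary penalty, with no reference flow. The same idea fits in a few lines if the Navier–Stokes residual is swapped for the 1D Poisson equation $u'' = f$ with Dirichlet boundary values: the loss vanishes for a field that satisfies the discrete equation and boundary conditions, and is positive otherwise. This is an illustrative analogue, not HydroNet's loss; `bc_weight` is a hypothetical weighting, and in a learned surrogate the field `u` would be a network output through which this loss is backpropagated.

```python
import math


def physics_loss(u, f, u_left, u_right, bc_weight=1.0):
    """Residual loss for u'' = f on a uniform grid over [0, 1] with Dirichlet ends.

    No reference solution is used: the loss only checks the discretized equation
    at interior nodes and the prescribed values at the two boundary nodes.
    """
    n = len(u)
    h = 1.0 / (n - 1)
    residuals = [(u[i + 1] - 2.0 * u[i] + u[i - 1]) / h ** 2 - f[i] for i in range(1, n - 1)]
    pde_term = sum(r * r for r in residuals) / len(residuals)
    bc_term = 0.5 * ((u[0] - u_left) ** 2 + (u[-1] - u_right) ** 2)
    return pde_term + bc_weight * bc_term


# Demo: u = x^2 solves u'' = 2 with u(0) = 0, u(1) = 1; a perturbed field does not.
N = 101
XS = [i / (N - 1) for i in range(N)]
FORCING = [2.0] * N
EXACT_LOSS = physics_loss([x * x for x in XS], FORCING, 0.0, 1.0)
PERTURBED_LOSS = physics_loss([x * x + 0.01 * math.sin(math.pi * x) for x in XS], FORCING, 0.0, 1.0)
```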

Another line pursues generality rather than speed by formulating swimmer dynamics in an Arbitrary Lagrangian–Eulerian finite-element framework. The fluid region $\Omega_f(t)$ satisfies the incompressible Navier–Stokes equations in ALE form,

$\rho\left(\left.\frac{\partial u}{\partial t}\right|_{\mathcal{A}} + ((u - w)\cdot\nabla)u\right) - \nabla\cdot\sigma(u,p) = 0, \qquad \nabla\cdot u = 0 \quad \text{in } \Omega_f(t),$

where $w$ is the mesh velocity, $\sigma(u,p) = -pI + 2\mu D(u)$ is the Newtonian stress, and $D(u)$ is the strain-rate tensor. On the swimmer boundary, a no-slip condition sets the fluid velocity equal to the swimmer's rigid-body velocity (translation plus rotation) plus any prescribed deformation velocity, and Newton–Euler equations couple the rigid-body motion back to the fluid stresses. The discretization uses Taylor–Hood velocity–pressure elements, first-order backward Euler or second-order BDF2 time stepping, an elliptic mesh-motion algorithm for ALE updates, and a short-range repulsive-force model for swimmer–swimmer and swimmer–wall contact. Implemented in Feel++ with MPI parallelization, the framework is benchmarked on three-dimensional flagellated sperm, a three-sphere swimmer, swimmer–wall interactions, squirmer–squirmer collision, and particulate transport in zebrafish arteries. Reported properties include agreement with the reference results of Razavi and Ahmadi for a flagellated-swimmer benchmark within a small relative error, the wall-clock cost of the zebrafish-artery case on a multi-core run, and parallel efficiency measured up to a large CPU core count (Landeghem et al., 2024).
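The accuracy gap between first-order backward Euler and second-order BDF2 shows up already on the scalar test equation $y' = -ky$. A minimal sketch, not Feel++ code: BDF2 needs two past values, so it is started here with one backward Euler step, a common but assumed choice.

```python
import math


def backward_euler(k, y0, h, n_steps):
    """Backward Euler for y' = -k y: each step solves y_new = y + h * (-k * y_new)."""
    y = y0
    for _ in range(n_steps):
        y = y / (1.0 + h * k)
    return y


def bdf2(k, y0, h, n_steps):
    """BDF2 for y' = -k y, started with a single backward Euler step."""
    if n_steps < 1:
        return y0
    y_prev, y = y0, y0 / (1.0 + h * k)
    for _ in range(n_steps - 1):
        # (3 y_new - 4 y + y_prev) / (2 h) = -k y_new, solved for y_new.
        y_prev, y = y, (4.0 * y - y_prev) / (3.0 + 2.0 * h * k)
    return y


# Integrate y' = -y from y(0) = 1 to t = 1 with h = 0.01.
EXACT = math.exp(-1.0)
ERR_BE = abs(backward_euler(1.0, 1.0, 0.01, 100) - EXACT)
ERR_BDF2 = abs(bdf2(1.0, 1.0, 0.01, 100) - EXACT)
```

At this step size the BDF2 error is far below the backward Euler error, reflecting $O(h^2)$ versus $O(h)$ accuracy; in an ALE solver that gain is weighed against the small steps already imposed by the CFL condition and contact stiffness.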

Between these extremes lies the planar Peano-HASEL swimmer model. It uses a semi-discrete mass–spring representation equivalent to linear elastic FEM, implicit backward Euler integration, and differentiable time stepping in DiffPD. Because shape parameters and control variables are differentiable, the model can optimize net forward speed and map the result directly back to manufacturing and actuation schedules. In canola oil, forward speed peaks sharply at the same actuation frequency for all tested voltages (Gravert et al., 2022).
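One common reason implicit backward Euler is preferred for stiff elastic bodies can be seen on a single undamped mass–spring, $\ddot x = -\omega^2 x$. This toy is not DiffPD's solver; it only shows that, at the same step size, explicit Euler multiplies the mechanical energy by exactly $1 + h^2\omega^2$ per step while backward Euler divides it by the same factor. The flip side is artificial numerical damping, so the step size still matters for accuracy. Names and values are illustrative.

```python
def explicit_euler_step(x, v, omega, h):
    """Forward Euler for x'' = -omega^2 x."""
    return x + h * v, v - h * omega ** 2 * x


def implicit_euler_step(x, v, omega, h):
    """Backward Euler: solve x1 = x + h v1 and v1 = v - h omega^2 x1 in closed form."""
    x1 = (x + h * v) / (1.0 + (h * omega) ** 2)
    return x1, v - h * omega ** 2 * x1


def energy(x, v, omega):
    """Mechanical energy per unit mass."""
    return 0.5 * v * v + 0.5 * (omega * x) ** 2


def final_energy(stepper, omega=10.0, h=0.05, n_steps=50):
    """Start at x = 1, v = 0 and return the energy after n_steps of `stepper`."""
    x, v = 1.0, 0.0
    for _ in range(n_steps):
        x, v = stepper(x, v, omega, h)
    return energy(x, v, omega)


E0 = energy(1.0, 0.0, 10.0)  # 50.0
```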

6. Limitations, misconceptions, and future directions

A common misconception is that aquatic sim-to-real requires either full CFD or extensive real-world tuning. The literature does not support either claim in absolute form. The tendon-driven fish work explicitly argues that a simple, stateless five-parameter fluid model, identified from only two real trajectories, can yield an accurate digital twin that generalizes across unseen frequencies and supports downstream reinforcement learning (Michelis et al., 26 Feb 2026). The AUV controller likewise states that zero-shot transfer is achieved by design through massive domain randomization, with no fine-tuning or real data needed, and that new platforms only require updating the thrust gains and randomization ranges (Fosso et al., 9 Dec 2025).

At the same time, the same literature documents clear ceilings for simplified models. Swim2Real reports an error floor arising from approximating a continuously bending tail with five rigid segments, and proposes per-segment fluid coefficients or higher-fidelity fluid models, such as learned residual physics, as remedies. It also notes that global steering biases caused by hardware friction are invisible to local-frame marker calibration, leading to open-loop drift and a persistent leftward steering bias in hardware target-reaching (Qiu et al., 21 Mar 2026). These are not merely implementation defects; they identify structural blind spots of local kinematic matching.

A second misconception is that simulator fidelity monotonically dominates simulator usefulness. High-fidelity ALE and differentiable FSI frameworks capture fluid inertia, viscosity, contact, and shape variation more explicitly, but they incur substantial computational cost, remeshing overhead, or restrictions on fluid model class. The ALE framework is currently limited to Newtonian fluids and moderate Reynolds numbers, with small time steps dictated by the CFL condition and contact stiffness; future directions include viscoelastic or non-Newtonian biofluids, adaptive narrow-band contact resolution, and coupling to immersed-body or phase-field methods for topological changes (Landeghem et al., 2024). The differentiable Peano-HASEL pipeline similarly recommends more dynamic data, identification of damping and higher-order effects, and possible neural residual models trained on a larger corpus (Gravert et al., 2022).

A plausible implication is that Sim2Swim research is converging on a layered view of underwater simulation. Reduced stateless models are sufficient when the objective is rapid policy training or frequency-response prediction; differentiable simulators are attractive when the objective is gradient-based design or control optimization; and ALE or related high-fidelity methods become necessary when contact, complex geometry, or richer constitutive behavior cannot be neglected. Under that interpretation, the development from five-parameter MuJoCo twins, to zero-shot domain-randomized AUV control, to VLM-guided 16-parameter calibration, and to differentiable or ALE FSI frameworks is not a sequence of replacements but a stratification of tools for different aquatic-robotics regimes.