
Sim-to-Real Transfer for Muscle-Actuated Robots via Generalized Actuator Networks

Published 10 Apr 2026 in cs.RO and cs.LG | (2604.09487v1)

Abstract: Tendon drives paired with soft muscle actuation enable faster and safer robots while potentially accelerating skill acquisition. Still, these systems are rarely used in practice due to inherent nonlinearities, friction, and hysteresis, which complicate modeling and control. So far, these challenges have hindered policy transfer from simulation to real systems. To bridge this gap, we propose a sim-to-real pipeline that learns a neural network model of this complex actuation and leverages established rigid body simulation for the arm dynamics and interactions with the environment. Our method, called Generalized Actuator Network (GeAN), enables actuation model identification across a wide range of robots by learning directly from joint position trajectories rather than requiring torque sensors. Using GeAN on PAMY2, a tendon-driven robot powered by pneumatic artificial muscles, we successfully deploy precise goal-reaching and dynamic ball-in-a-cup policies trained entirely in simulation. To the best of our knowledge, this result constitutes the first successful sim-to-real transfer for a four-degrees-of-freedom muscle-actuated robot arm.

Summary

  • The paper introduces a Generalized Actuator Network that learns actuator dynamics from joint positions, enabling zero-shot sim-to-real transfer without relying on torque sensors.
  • It demonstrates high precision with >90% success in goal-reaching and 75% success in dynamic ball-in-a-cup tasks, outperforming traditional torque-based models.
  • The approach reports up to 29% lower multi-step prediction error than the Unsupervised Actuator Net baseline, and uses dense joint histories and ensemble training to improve robustness in low-data regimes.


Overview

The paper "Sim-to-Real Transfer for Muscle-Actuated Robots via Generalized Actuator Networks" (2604.09487) presents a comprehensive approach for sim-to-real transfer in tendon-driven, muscle-actuated robots, overcoming historical modeling and control challenges through learned actuator modeling. The work centers on the introduction of the Generalized Actuator Network (GeAN), a neural network model designed to capture actuation dynamics directly from joint position trajectories, obviating the need for torque sensing and significantly broadening applicability. The methodology is evaluated on PAMY2, a four-degree-of-freedom (4-DoF) pneumatic artificial muscle-actuated arm, demonstrating zero-shot transfer of both precision goal-reaching and highly dynamic ball-in-a-cup tasks.

Muscle- and tendon-driven actuation architectures offer lighter, more compliant, and safer robotic systems than traditional motor-driven counterparts. This inherent compliance enables faster operation with reduced risk in human environments and potentially greater sample efficiency in skill learning. Nevertheless, these advantages have been offset by severe modeling obstacles (nonlinearity, configuration-dependent friction, hysteresis, and time-varying dynamics), which often make classical control inapplicable and model-based sim-to-real transfer unreliable.

While prior work has addressed actuator modeling for quadrupedal locomotion through torque-based actuator networks, these schemas presuppose the existence of high-quality torque sensors and focus primarily on more regular forms of actuation, such as series elastic drives. Prior sim-to-real transfer for muscle-driven systems has been restricted to simple tasks or to hybrid real/sim paradigms that remain interaction-inefficient. Recent efforts to bridge sim-to-real gaps by domain randomization have been ineffective for muscle actuation due to the vast scale and complexity of the resulting uncertainty.

The Generalized Actuator Network (GeAN)

The central contribution is the Generalized Actuator Network, which eschews reliance on torque sensors and is targeted explicitly at muscle- and tendon-actuated platforms. The GeAN framework proceeds in three main stages:

  1. Data Collection: The real robot generates an exploration dataset comprising thousands of open-loop joint position trajectories, each paired with corresponding control signals. The approach harnesses only joint angles, circumventing the need for direct torque measurements.
  2. Actuator Network Training: The GeAN is trained to map histories of joint positions and control commands to the torques that, when integrated in a physical simulator, will realize the observed motion. The optimization employs a position loss, differentiating through the simulator to minimize rollout errors with respect to ground-truth joint positions, and a torque loss variant that utilizes inverse dynamics only for label generation. Notably, the position loss yields markedly improved multi-step prediction fidelity due to alignment with deployment objectives and accounts for the nontrivial coupling introduced by the robot's mass matrix.
  3. Policy Training and Transfer: The trained ensemble of GeAN models augments a conventional rigid-body robot/object simulator. Deep RL policies are trained entirely in this simulated environment, using GeAN-predicted torques as a drop-in replacement for true physical actuation. For robustness, GeAN ensembles sample individual models per step, reducing policy overfitting to model error in regions of epistemic uncertainty, especially in limited-data regimes. The final policies are deployed on the physical robot with no further adaptation (zero-shot transfer).
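The position loss in stage 2 can be illustrated with a minimal one-joint sketch. It assumes a single frictionless joint integrated with semi-implicit Euler and a linear map standing in for the network; the inertia value, feature choice, step size, and all function names are hypothetical and are not taken from the paper, which differentiates through a full rigid-body simulator.

```python
DT = 0.002       # 2 ms simulation step
INERTIA = 0.05   # assumed joint inertia in kg*m^2 (illustrative value)


def sim_step(q, qd, tau):
    """Semi-implicit Euler step of a single frictionless joint."""
    qdd = tau / INERTIA
    qd_next = qd + DT * qdd
    q_next = q + DT * qd_next
    return q_next, qd_next


def actuator_model(weights, features):
    """Linear stand-in for the actuator network: torque = w . x."""
    return sum(w * x for w, x in zip(weights, features))


def position_loss(weights, features, q, qd, q_next_observed):
    """Squared error between simulated and observed next joint position."""
    tau_hat = actuator_model(weights, features)
    q_next_sim, _ = sim_step(q, qd, tau_hat)
    return (q_next_sim - q_next_observed) ** 2


def position_loss_grad(weights, features, q, qd, q_next_observed):
    """Analytic gradient; d(q_next)/d(tau) = DT**2 / INERTIA for this simulator."""
    tau_hat = actuator_model(weights, features)
    q_next_sim, _ = sim_step(q, qd, tau_hat)
    scale = 2.0 * (q_next_sim - q_next_observed) * DT ** 2 / INERTIA
    return [scale * x for x in features]


# Synthetic "real" data generated by a hidden linear actuator.
true_weights = [0.8, -0.3]
samples = []
for u, q in [(1.0, 0.1), (0.5, -0.2), (-1.0, 0.3), (0.2, 0.0)]:
    feats = [u, q]  # features: command and current joint position
    q_next, _ = sim_step(q, 0.0, actuator_model(true_weights, feats))
    samples.append((feats, q, 0.0, q_next))

weights = [0.0, 0.0]
# Large step size compensates for the tiny DT**2 / INERTIA factor in the gradient.
lr = 0.5 * (INERTIA / DT ** 2) ** 2
for _ in range(500):
    for feats, q, qd, q_obs in samples:
        grad = position_loss_grad(weights, feats, q, qd, q_obs)
        weights = [w - lr * g for w, g in zip(weights, grad)]
```

No torque label appears anywhere in the loop: the model is judged only by the position it produces after one simulator step. In this toy case the torque error is scaled by `DT**2 / INERTIA` before it shows up as a position error, which is the one-joint analogue of the mass-matrix coupling mentioned above.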

Empirical Results

Actuator Modeling

GeAN models the actuation more accurately than alternatives such as the Unsupervised Actuator Net (UAN), with 6% lower single-step and 29% lower multi-step prediction errors. The position-loss-trained GeAN delivers consistently lower rollout errors, which supports aligning the training loss with the metric that matters at deployment.

Task Transfer

Reacher Task (Precision Control):

Zero-shot transfer produces >90% success rates (within 2° of the goal across all joints) and maintains mean terminal errors below 1.32°, indicating that direct sim-based policy training via GeAN produces precision comparable to (or exceeding) hand-tuned controllers despite the presence of severe actuation nonlinearities.
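The success criterion above can be made concrete with a short sketch. The summary does not give the evaluation code, so this assumes that an episode succeeds when every joint ends within 2° of its goal and that the terminal error is the mean absolute joint error; the function names and the two episodes are hypothetical.

```python
def is_success(final_angles_deg, goal_angles_deg, tol_deg=2.0):
    """True if every joint ends within tol_deg of its goal (assumed criterion)."""
    return all(abs(a - g) <= tol_deg for a, g in zip(final_angles_deg, goal_angles_deg))


def mean_terminal_error(final_angles_deg, goal_angles_deg):
    """Mean absolute joint error at the end of an episode, in degrees."""
    errors = [abs(a - g) for a, g in zip(final_angles_deg, goal_angles_deg)]
    return sum(errors) / len(errors)


# Made-up episodes for a 4-DoF arm: (final angles, goal angles) in degrees.
episodes = [
    ([10.5, -20.0, 31.0, 5.0], [10.0, -21.0, 30.0, 5.5]),
    ([44.0, 0.5, -9.0, 12.0], [45.0, 0.0, -10.0, 15.0]),  # last joint is off by 3 deg
]
success_rate = sum(is_success(f, g) for f, g in episodes) / len(episodes)
```

Under this definition one badly placed joint fails the whole episode, so the success rate is a stricter measure than the mean terminal error.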

Ball-in-a-Cup Task (Dynamic Manipulation):

Zero-shot policies achieve 75% success on a highly dynamic catching task, with typical failures stemming from discrepancies in object-string mechanics rather than actuation mismatch. Training with noisy and incomplete ball observations in simulation enhances robustness to real-world perception noise. Notably, the policies generalize to unmodeled dynamic effects (e.g., variable end-effector weights and environmental interactions) even though GeAN training data contain only unladen arm trajectories.

Ablations and Data Regime Analysis

Model ensembles help mainly in low-data regimes (<1000 trajectories), where ensemble-based rollouts mitigate overfitting and the performance degradation caused by model miscalibration. For large datasets, single-network and ensemble policies perform comparably, which indicates the dataset size at which a single model becomes sufficient. Moreover, aggressive or jittery control policies (low action penalty in the reward) transfer less well because they excite dynamics outside the training distribution.
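Per-step ensemble sampling during policy training can be sketched as follows. The members here are hypothetical linear gains rather than trained networks, and the call signature is an assumption made for illustration.

```python
import random


def make_member(gain):
    """Stand-in for one trained actuator model (hypothetical linear map)."""
    def member(history, command):
        return gain * command  # a real model would also use the history
    return member


def rollout_torques(ensemble, commands, rng):
    """At every simulation step, draw one ensemble member to predict the torque."""
    torques = []
    for command in commands:
        member = rng.choice(ensemble)
        torques.append(member([], command))
    return torques


# Five members with made-up gains that disagree slightly with each other.
gains = (0.90, 0.95, 1.00, 1.05, 1.10)
ensemble = [make_member(g) for g in gains]
rng = random.Random(0)
torques = rollout_torques(ensemble, [1.0] * 20, rng)
```

Because the active member changes from step to step, the policy cannot rely on the quirks of any single model, which is the regularizing effect described for the low-data regime.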

Architectural and Training Considerations

Key findings regarding GeAN design and training include:

  • Short, dense input histories (of length 3 with stride 1) outperform sparser variants, capitalizing on the temporal richness in observed signals and effectively modeling actuator hysteresis.
  • Delta encoding and input normalization of joint and control histories prevent overfitting to correlation and enhance discrimination of relevant state changes.
  • Multi-step position losses offer marginal accuracy improvements versus single-step losses, but with prohibitive computational costs, making the latter preferable in practice.
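A sketch of how such an input could be assembled is shown below. The summary does not specify the exact encoding, so this assumes that delta encoding keeps the newest sample and replaces older ones by consecutive differences, and that normalization standardizes each feature with dataset statistics; all names and values are illustrative.

```python
def history_window(series, t, length=3, stride=1):
    """Return `length` samples ending at index t, `stride` steps apart (oldest first)."""
    start = t - (length - 1) * stride
    if start < 0:
        raise ValueError("not enough history before index t")
    return [series[start + i * stride] for i in range(length)]


def delta_encode(window):
    """Keep the newest value, followed by the consecutive differences."""
    deltas = [b - a for a, b in zip(window, window[1:])]
    return [window[-1]] + deltas


def normalize(values, mean, std):
    """Standardize features with precomputed dataset statistics."""
    return [(v - m) / s for v, m, s in zip(values, mean, std)]


joint_pos = [0.10, 0.12, 0.15, 0.19, 0.24]   # rad, one joint, made-up samples
window = history_window(joint_pos, t=4)       # dense window: [0.15, 0.19, 0.24]
features = delta_encode(window)               # newest value, then the two increments
sparse = history_window(joint_pos, t=4, stride=2)  # sparser variant: [0.10, 0.15, 0.24]
```

With stride 1 the increments approximate the most recent joint velocities, which is one plausible reason a dense window captures direction-dependent effects such as hysteresis better than a sparse one.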

Implications and Future Directions

The results establish learned actuator models as a practical and effective means of bridging the sim-to-real gap in muscle-actuated, tendon-driven robots for both precision and dynamic manipulation tasks. The generalization to new payloads and external disturbances suggests that rich position data, combined with expressive neural architectures, are sufficient for actuation modeling.

The methodology makes high-dimensional, compliant actuation accessible to the RL community without reliance on direct torque measurement, paving the way for more versatile and application-relevant robot learning research. Practically, it facilitates policy development for safety-critical and maintenance-expensive hardware by offloading nearly all experimentation to simulation. Theoretically, it underscores the importance of modular learning architectures that combine domain knowledge—here, rigid-body simulation—with flexible, data-driven modeling of hard-to-parameterize subsystems.

The paper identifies future research axes:

  • Extending GeAN to multitask and athletic skills (e.g., racket sports, object catching/throwing).
  • Online or continual adaptation of actuator models and policies to accommodate time-varying dynamics (e.g., component wear, tendon stretching), potentially through meta-learning or hybrid sim/real retraining.
  • Enhancing the physical fidelity of simulated object-string interactions for more robust transfer on complex manipulation tasks.

Conclusion

The Generalized Actuator Network enables zero-shot transfer of deep RL policies to high-DoF, muscle- and tendon-actuated robots, obviating the need for expensive, fragile, or impractical sensors. In doing so, it levels the playing field for advanced physical system research, removing key barriers to the routine deployment and development of bio-inspired compliant platforms in both laboratory and real-world settings.


Explain it Like I'm 14

What is this paper about?

This paper shows a new way to train robot arms that use “muscles” and “tendons” (like in animals) so that skills learned in a computer simulation work right away on the real robot. The authors build a learned “translator” that converts the robot’s control signals into the twisting forces (called torque) those muscles would actually produce. With this translator plugged into a physics simulator, they train the robot entirely on a computer and then move the behavior to the real machine without extra tuning.

What questions were the researchers trying to answer?

In simple terms, they asked:

  • How can we make robot arms with soft, muscle-like actuators learn skills in simulation and then do them for real without breaking or retuning?
  • Can we learn a good model of messy, hard-to-predict muscle-and-tendon behavior using only easy-to-measure data (joint positions) instead of special torque sensors?
  • Will this be accurate enough for both precise tasks (like reaching a target) and fast, dynamic tasks (like ball-in-a-cup)?

How did they do it?

Think of a robot arm like a person’s arm: the bones and joints are fairly well understood (physics can simulate them), but the muscles and tendons are complicated. Instead of trying to write perfect equations for the muscles and tendons (which is very hard), the authors taught a neural network to imitate those parts.

Here’s the basic idea:

  • Actuators = the robot’s “muscles” that create motion.
  • Torque = the twisting force at a joint (like how hard you turn a doorknob).
  • Hysteresis = when a system doesn’t instantly bounce back the same way (like a rubber band that behaves a bit differently when stretching vs. relaxing).

They built a three-step pipeline:

  1. Collect real data: They ran the real robot with many different movement commands for about 1.4 hours and recorded only what’s easy to get: joint positions over time (how bent or straight each joint is) and the commands sent to the muscles. They did not need torque sensors.
  2. Learn the “Generalized Actuator Network” (GeAN): This neural network learns to turn recent joint positions and commands into the torques the muscles would cause. To train it, they ask the network to produce torques that, when fed into a standard physics simulator, make the next joint position match what the real robot actually did. In other words, they judge the network by how closely the simulated motion follows the real motion (position accuracy), not just by guessing the torque value.
  3. Train skills in simulation: With the learned GeAN handling muscle/tendon behavior and a physics engine handling the arm’s rigid-body motion and objects, they trained reinforcement learning (RL) policies entirely in simulation. RL here means the computer tries lots of trials, gets rewards for doing better, and improves over time—like practicing a sport in a very fast video game. To avoid overfitting to any one guess of the muscle model, they used a small “team” (ensemble) of five GeANs and randomly picked among them during training—like practicing with slightly different equipment each time.

Why this matters technically:

  • Sim-to-real means learning in simulation and then using the skill on the real robot. It saves time, energy, and wear-and-tear because you don’t need to practice everything on the physical robot.
  • Muscle-and-tendon robots are safer and lighter but much harder to model than motor-driven robots. GeAN focuses on learning the tricky part (the muscles) while relying on solid physics for the rest.

What did they find?

They tested their method on PAMY2, a four-joint robot arm powered by pneumatic artificial muscles (PAMs—air-powered “muscles”). They trained two tasks fully in simulation and then ran them on the real robot without extra tuning:

  • Reacher: Move the joints to a target angle accurately.
  • Ball-in-a-cup: Swing a ball on a string into a cup at the robot’s end—fast and precise timing required.

Main results:

  • The learned muscle model was most accurate when trained to match next-step positions (not just torques). This “position-focused” training beat another recent method that tried to learn torques using reinforcement learning.
  • Zero-shot transfer (no extra adjustments) worked well:
    • Reacher: About 90% success, with final errors around 1–1.3 degrees on average.
    • Ball-in-a-cup: About 75% success.
  • Using an ensemble of GeANs helped guard against the policy overfitting to one model’s quirks, though even a single model performed similarly in their tests.
  • Making the policy change commands too aggressively (lowering the action penalty) hurt real-world success—smoother commands transferred better.

To their knowledge, this is the first time a four-joint, muscle-actuated, tendon-driven robot arm successfully learned in simulation and worked in the real world on tasks this complex.

Why is this important?

  • Muscle-like robots can be lighter, faster, and safer, which is great for working around people. But their behavior is complex and hard to model.
  • This work shows a practical way to bridge the gap: learn the messy muscle part from data (without special torque sensors) and use standard physics for everything else.
  • Training in simulation saves time and reduces wear on expensive hardware, while still achieving high performance in the real world.
  • The approach can apply to many different robots, not just this one, because it only needs joint positions and commands—measurements most robots already have.

Bottom line

The authors created a learned “muscle translator” (GeAN) that, when combined with a normal physics simulator, lets a muscle-driven robot practice entirely in a fast, safe virtual world—and then perform well in real life. This brings us closer to building agile, safe robots that can quickly learn tricky skills without long, risky training on real hardware.

Knowledge Gaps

Knowledge gaps, limitations, and open questions

The following list captures concrete gaps and uncertainties left unresolved, highlighting directions for future research and engineering improvements.

  • Generality across platforms: The approach is validated only on PAMY2 (4-DoF, antagonistic PAMs). It remains unclear how GeAN scales to higher-DoF arms, different tendon routings, other soft actuation (hydraulics, McKibben variants), or series elastic/motor-driven systems without torque sensors.
  • Sensitivity to rigid-body model errors: The method assumes a torque-driven simulator with accurate mass, inertia, and kinematics. There is no analysis of how inaccuracies in the rigid-body parameters affect GeAN training or policy transfer, nor procedures to jointly identify or adapt these parameters.
  • External forces during identification: GeAN training assumes no external forces beyond gravity when forming inverse-dynamics labels. Its robustness to contacts, payloads, tool attachments, or dynamic interaction forces during deployment is not quantified or incorporated into training.
  • Actuator and valve dynamics omission: The learned mapping does not explicitly model pneumatic valve dynamics, pressure propagation delays, compressibility, or pressure sensor feedback. The impact of time delays and actuator saturation on closed-loop performance is unmodeled and unquantified.
  • Limited observability in training: Training uses joint positions (and finite-difference velocities/accelerations) and commanded pressures but not measured pressures or tendon tensions. Whether adding pressure/tension sensing improves accuracy, stability, and generalization remains unexplored.
  • Finite-difference noise and latency: Velocities and accelerations are derived via backward/central differences at 2 ms steps without reported filtering, delay compensation, or sensor fusion. The effect of measurement noise and timing uncertainty on inverse-dynamics labels and GeAN accuracy is not analyzed.
  • Position-loss derivation scope: The relation between torque error and position error ($q_{t+1}-\hat{q}_{t+1} \approx \Delta t^2 M^{-1}(q_t)(\tau_t-\hat{\tau}_t)$) omits Coriolis, centrifugal, damping, and contact terms. Its validity under full nonlinear dynamics and contacts, and its implications for long-horizon differentiation, are not examined.
  • Long-horizon training: Multi-step position losses were tried but abandoned due to inconsistent gains. There is no systematic study of truncated backprop through time, collocation, adjoint methods, or curriculum strategies to reduce compounding error over longer horizons.
  • Ensemble utility and uncertainty handling: The ensemble disagreement penalty is included, but “no ensemble” performs similarly on tested tasks. Conditions under which ensembles help (e.g., smaller datasets, larger model uncertainty, more complex tasks) and alternative uncertainty-aware training (e.g., risk-sensitive RL, Bayesian nets) are not explored.
  • Data efficiency and coverage: The dataset (~1.4 h, 2500 trajectories) is collected with random spline commands. There is no analysis of coverage of state-action space, active data collection for hysteresis/friction modes, or how performance scales with less data.
  • Time-varying properties: Muscle and tendon properties change with temperature, wear, and humidity. The pipeline does not incorporate time-varying dynamics, online adaptation, or scheduled re-identification; robustness to drift over hours/days is unknown.
  • Antagonistic control simplification: Each DoF is controlled by a single antagonistic signal. The impact of this simplification on achievable performance and the feasibility of modeling independent multi-muscle control per joint remains untested.
  • Contact-rich manipulation: Beyond ball-in-a-cup and reacher, the approach is not evaluated on tasks involving sustained contacts, force regulation, or frictional manipulation (e.g., grasping, sliding), where accurate contact and actuator modeling are critical.
  • Payload variability: Policy and GeAN are not studied under varying payloads or end-effector tools. Sensitivity to added mass/inertia and methods to adapt models or policies when payload changes are open questions.
  • Safety guarantees: While soft actuation is safer, the paper provides no formal safety guarantees or constraint handling during training/deployment (e.g., pressure, torque, joint limit, or collision constraints) and no systematic failure-mode analysis.
  • Real-time constraints: The agent runs at 100 Hz while the simulator/GeAN operate at 500 Hz. The effects of control-rate mismatches, computation latency, and communication delays on stability and performance are not analyzed.
  • Simulator dependence: Results rely on MJX. Portability to other physics engines, differentiable simulators, or custom pipelines, and sensitivity to integrator choices and contact parameters, remain untested.
  • Domain randomization synergy: The study positions GeAN as an alternative to heavy domain randomization but does not explore hybrid strategies (e.g., light randomization over rigid-body/contact parameters plus learned actuator models) or quantify trade-offs.
  • Model identifiability: Multiple torque sequences can produce similar position trajectories. The identifiability of the learned torque mapping (physical plausibility, uniqueness) and mechanisms to regularize toward physically consistent solutions are not addressed.
  • Reward shaping and policy robustness: Policies rely on specific action and velocity penalties; ablations show sensitivity. Automated reward tuning, robustness to reward misspecification, and policy performance under different objective preferences are not explored.
  • Sensing for external objects: Ball tracking depends on Vicon with occlusion and misidentification issues. The pipeline does not consider onboard sensing (vision/tactile), sensor fusion, or POMDP training to handle intermittent/noisy observations more systematically.
  • Failure analysis: Despite high success rates, the paper lacks a taxonomy of failure modes (e.g., missed catches, overshoot, oscillation, drift) and targeted remedies (model improvements, controller constraints, observation augmentation).
  • Code/data release and reproducibility: The paper does not state whether code, datasets, simulator models, and trained GeANs/policies are released, limiting independent validation and broader adoption.
  • Extension to closed-loop identification: Zero-shot transfer is demonstrated, but the potential of a small number of real-world rollouts to refine GeAN and policies (e.g., residual learning, online system ID) is not investigated.
  • Theoretical guarantees: There are no formal analyses of stability, convergence of GeAN training, or bounds on sim-to-real performance degradation under model uncertainty. Formalization could guide safe deployment and scaling.
  • Temperature and pressure instrumentation: The approach does not include temperature/pressure measurements as inputs. Investigating whether these signals improve modeling of hysteresis, drift, and pneumatic dynamics is an actionable path.
  • Joint friction and routing effects: Tendon routing–dependent friction is cited as a challenge, but the model inputs do not explicitly encode routing state or friction proxies. Exploring features (e.g., routing angles, tendon curvature) or learned latent friction states could improve accuracy.
  • Longer-horizon tasks: Episodes are limited to 2 s. Evaluating stability, repeatability, and drift over longer durations (e.g., sustained tracking, cyclic motions) remains an open benchmark.
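The finite-difference concern in the list above can be made concrete with a small numerical sketch. It assumes a backward difference for velocity and a central second difference for acceleration at a 2 ms step; which scheme the authors apply to which quantity is not stated here, and the noise magnitude is made up.

```python
DT = 0.002  # 2 ms sampling period


def backward_velocity(q, t):
    """First-order backward difference: (q[t] - q[t-1]) / DT."""
    return (q[t] - q[t - 1]) / DT


def central_acceleration(q, t):
    """Second-order central difference: (q[t+1] - 2 q[t] + q[t-1]) / DT**2."""
    return (q[t + 1] - 2.0 * q[t] + q[t - 1]) / DT ** 2


# Check on q(t) = t**2, whose exact acceleration is 2 rad/s^2 everywhere.
q = [(i * DT) ** 2 for i in range(5)]
acc = central_acceleration(q, 2)

# A position error of 1e-5 rad on the centre sample changes the acceleration
# estimate by 2 * 1e-5 / DT**2 = 5 rad/s^2.
q_noisy = list(q)
q_noisy[2] += 1e-5
acc_noisy = central_acceleration(q_noisy, 2)
```

The `1 / DT**2` factor is why small encoder noise can dominate acceleration estimates at this sampling rate, and why unfiltered differences would matter most for inverse-dynamics labels.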

Practical Applications

Immediate Applications

  • Muscle-/tendon-driven robot commissioning and calibration (Robotics, Manufacturing)
    • What: Identify actuator dynamics from joint-position-only logs using GeAN, eliminating torque sensors; create a reliable digital twin for compliant arms.
    • How/Workflow: Collect ~1–2 hours of open-loop spline-excitation data; train GeAN with the position loss; validate multi-step rollouts; couple GeAN to a torque-based simulator (e.g., MuJoCo/MJX); deploy controllers or RL policies.
    • Tools/Products: GeAN model + ensemble, MJX/MuJoCo, PPO (e.g., skrl), ROS2 nodes for data logging and deployment.
    • Assumptions/Dependencies: Accurate rigid-body model (masses, inertias, joint limits); clean joint position sensing; no significant external forces during data collection; GPU for training; reliable pneumatic hardware and pressure control.
  • Rapid sim-to-real training for compliant manipulators (Robotics, Logistics, Entertainment)
    • What: Train policies entirely in simulation and transfer zero-shot to PAM/tendon-driven systems for tasks like reaching, catching, or dynamic manipulation.
    • How/Workflow: Train RL in MJX with GeAN in-loop; add noise/domain randomization and simple observation dropouts; deploy to robot at ~100 Hz control.
    • Tools/Products: RL training pipeline with GeAN, ensemble-based uncertainty injection, task templates (reacher, dynamic catch).
    • Assumptions/Dependencies: Adequate sim fidelity for environment objects; basic perception (e.g., motion capture or onboard vision); safety interlocks.
  • Cost- and sensor-reduced prototyping in labs and startups (Academia, Hardware)
    • What: Build and control soft/tendon actuated prototypes without torque sensors; reduce bill of materials and integration complexity.
    • How/Workflow: Use joint encoders alone; adopt GeAN training script; validate with held-out trajectories.
    • Tools/Products: Open-source training scripts; low-cost tendon/PAM kits; standardized data schemas.
    • Assumptions/Dependencies: Sufficient encoder precision; basic inverse dynamics model available.
  • Maintenance and recalibration for drift and wear (Operations, Field Service)
    • What: Periodically refresh actuator models to account for hysteresis changes, friction, and temperature-induced drift.
    • How/Workflow: Schedule short data-collection routines; fine-tune GeAN; use ensemble disagreement to flag abnormal changes.
    • Tools/Products: Calibration assistant; MLOps pipeline for retraining and validation; monitoring dashboards.
    • Assumptions/Dependencies: Access to robot off-shift; stable pneumatic system; versioned models and rollback mechanisms.
  • Educational modules in soft robotics and RL (Education)
    • What: Lab courses teaching hysteresis/friction modeling and sim-to-real with position-only sensing.
    • How/Workflow: Provide canned datasets and code; students train GeAN and deploy simple policies on benchtop tendon devices.
    • Tools/Products: Courseware, datasets, notebooks, lightweight simulators.
    • Assumptions/Dependencies: Modest GPU access; safe tabletop hardware.
  • Digital twin creation for muscle-actuated systems (Software/Simulation)
    • What: Combine rigid-body physics with learned actuator layer to create credible digital twins for scenario testing, planning, and what-if analyses.
    • How/Workflow: Calibrate GeAN; integrate with object-rich simulations; run parallel policy evaluations on GPU.
    • Tools/Products: GeAN plug-ins for simulators; scenario libraries; batch evaluation services.
    • Assumptions/Dependencies: Accurate CAD/inertial parameters; support for torque-driven simulation.
  • Improved tracking and control with learned actuation models (Robotics, Software)
    • What: Insert GeAN in existing controllers (e.g., MPC, inverse dynamics) to better map commands to joint torques under tendon friction and PAM hysteresis.
    • How/Workflow: Use GeAN as a torque predictor inside control law; tune action smoothing penalties.
    • Tools/Products: Controller adapters; real-time inference on embedded CPU/GPU.
    • Assumptions/Dependencies: Latency bounds; validated one-step differentiable behavior; safety constraints.
  • Retrofitting tendon/cable-driven mechanisms lacking torque sensors (Robotics, Industrial Equipment)
    • What: Upgrade existing devices (e.g., gimbals, grippers, robot hands) with learned actuation maps for improved precision and responsiveness.
    • How/Workflow: Quick data capture in-service; train compact GeAN; deploy as a firmware update.
    • Tools/Products: Edge inference libraries; calibration app.
    • Assumptions/Dependencies: Access to position sensors; edge compute; minimal downtime.
  • Safer human-robot interaction via compliant hardware with precise control (Robotics, Policy)
    • What: Leverage soft actuation’s safety with precise, learned control for collaborative tasks at higher speeds.
    • How/Workflow: Validate twin against collision scenarios; enforce speed/force limits; use ensemble disagreement as a runtime safety signal.
    • Tools/Products: HRI safety monitor; compliance test suites; risk assessment templates.
    • Assumptions/Dependencies: Compliance with standards (ISO/TS 15066); well-calibrated uncertainty thresholds.
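Several workflows above run the policy at about 100 Hz while the simulator and actuator model step at 500 Hz. One common way to reconcile the two rates is to hold each action for several simulator steps; the sketch below assumes such a zero-order hold, which the summary does not confirm, and uses toy stand-ins for the policy and simulator.

```python
POLICY_HZ = 100
SIM_HZ = 500
SUBSTEPS = SIM_HZ // POLICY_HZ   # 5 simulator steps per policy action


def run_episode(policy, sim_step, state, n_policy_steps):
    """Hold each policy action constant over SUBSTEPS simulator steps."""
    for _ in range(n_policy_steps):
        action = policy(state)
        for _ in range(SUBSTEPS):
            state = sim_step(state, action)
    return state


# Toy stand-ins: the state counts simulator steps, the policy always outputs zero.
final_state = run_episode(lambda s: 0.0, lambda s, a: s + 1, 0, n_policy_steps=200)
```

With 200 policy steps this covers a 2 s episode, that is 1000 simulator steps.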

Long-Term Applications

  • Commercial co-bots powered by artificial muscles with certified sim-to-real training (Robotics, Manufacturing)
    • What: Productized muscle-actuated arms that learn most behaviors in simulation and deploy on the factory floor with minimal tuning.
    • Tools/Products: Industrial-grade GeAN stack, certification packs, auto-calibration stations.
    • Assumptions/Dependencies: Robust pneumatics/valving; long-life tendons; certification pathways for learned models; quiet/efficient compressors.
  • Personalized assistive exoskeletons and prostheses (Healthcare)
    • What: Rapid, patient-specific actuator model fitting from brief movement logs; safer, more responsive assistance despite hysteresis.
    • Tools/Products: Clinic-grade calibration workflows; onboard GeAN inference; telemetry for ongoing adaptation.
    • Assumptions/Dependencies: Human-in-the-loop safety; wearable form factors; privacy-preserving data handling; low-power actuators.
  • Generalization to other complex actuators (Hydraulics, electroactive polymers, shape-memory alloys) (Robotics, Energy)
    • What: Use joint-position-driven GeANs to model nonlinear, hysteretic actuators beyond PAMs and tendons.
    • Tools/Products: Cross-actuator model zoo; transfer-learning recipes; plug-ins for multiple simulators.
    • Assumptions/Dependencies: Quality position sensing; appropriate exploration signals; expanded simulators that include fluid/thermal couplings where needed.
  • Uncertainty-aware control and safety governance (Software, Policy)
    • What: Integrate ensemble disagreement into safety supervisors to modulate speed/force or trigger safe stops; inform certification with calibrated uncertainty.
    • Tools/Products: Runtime monitors; failover controllers; audit logs capturing uncertainty metrics.
    • Assumptions/Dependencies: Calibrated uncertainty-to-risk mapping; standards acceptance for uncertainty-based safeguards.
  • Standardized datasets, benchmarks, and testbeds for muscle-actuated sim-to-real (Academia, Policy)
    • What: Shared corpora and evaluation suites to compare algorithms and support reproducibility.
    • Tools/Products: Public datasets, leaderboards, reference hardware layouts.
    • Assumptions/Dependencies: Community coordination; IP and data-sharing agreements.
  • Onboard continual learning and real-time adaptation (Robotics, Edge AI)
    • What: Online fine-tuning of GeAN residuals to counteract wear, temperature, and load changes without downtime.
    • Tools/Products: Safe on-device learning loops; drift detectors; guardrailed optimizers.
    • Assumptions/Dependencies: Edge accelerators; proven safe-learning protocols; robust rollback.
  • Simulation-driven certification and insurance underwriting (Policy, Insurance)
    • What: Formal workflows where validated digital twins with learned actuators support regulatory approval and risk assessment.
    • Tools/Products: Verification toolchains; scenario coverage metrics; traceable training artifacts.
    • Assumptions/Dependencies: Accepted standards for learned-model verification; third-party auditing infrastructure.
  • Dynamic manipulation in homes and warehouses (Robotics, Daily Life, Logistics)
    • What: Fast, compliant robots that safely catch, toss, place, and interact with deformable or delicate objects.
    • Tools/Products: Perception stacks replacing motion capture (e.g., vision + filtering); task libraries; safety envelopes.
    • Assumptions/Dependencies: Robust onboard perception (occlusion handling, low light); real-time compute; quiet, compact actuation.
  • Hardware–software co-design for energy/performance optimization (Energy, Hardware)
    • What: Use the sim pipeline to optimize tendon routing, valve sizing, compliance, and materials for speed, energy, and safety trade-offs.
    • Tools/Products: Automated design exploration; multiobjective optimizers integrated with GeAN-in-the-loop simulation.
    • Assumptions/Dependencies: Expanded plant models (compressibility, valve dynamics, thermals); scalable batch simulation.
  • Consumer-grade safe home assistants and sport-training devices (Daily Life)
    • What: Soft, fast, and safe robotic devices for household chores or athletic skill practice that train in simulation and transfer reliably.
    • Tools/Products: Compact actuation modules; pre-trained behaviors; calibration apps for home environments.
    • Assumptions/Dependencies: Cost and noise reduction for pneumatic systems; durability; straightforward user calibration.
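
The uncertainty-aware control item above proposes feeding ensemble disagreement into a safety supervisor. The following is a minimal sketch of that idea, not the paper's implementation. The function names, the use of the per-joint standard deviation as the disagreement measure, and the `soft_limit`/`hard_limit` thresholds are all assumptions made for illustration.

```python
import statistics


def ensemble_disagreement(torque_predictions):
    """Per-joint population standard deviation across ensemble members.

    torque_predictions: list with one entry per ensemble member, each a
    list of predicted joint torques (same length for every member).
    """
    per_joint = zip(*torque_predictions)  # regroup values by joint
    return [statistics.pstdev(joint_values) for joint_values in per_joint]


def speed_scale(disagreement, soft_limit, hard_limit):
    """Map the worst per-joint disagreement to a speed factor in [0, 1].

    Hypothetical rule: full speed below soft_limit, a safe stop (0.0) at
    or above hard_limit, and a linear ramp in between.
    """
    if not soft_limit < hard_limit:
        raise ValueError("soft_limit must be smaller than hard_limit")
    worst = max(disagreement)
    if worst >= hard_limit:
        return 0.0
    if worst <= soft_limit:
        return 1.0
    return (hard_limit - worst) / (hard_limit - soft_limit)


# Two hypothetical ensemble members predicting torques for two joints.
predictions = [[1.0, 2.0], [1.0, 4.0]]
scale = speed_scale(ensemble_disagreement(predictions), soft_limit=0.5, hard_limit=1.5)
```

In a real supervisor the thresholds would have to come from the calibrated uncertainty-to-risk mapping listed under that item's dependencies. The linear ramp is only one possible choice.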

Cross-cutting Assumptions and Dependencies

  • Accurate torque-driven rigid-body simulators and inverse dynamics are available and correctly parameterized.
  • Exploration data must cover the operational envelope; training assumes negligible external forces during data capture.
  • Reliable joint position sensing is required; object tracking used in demos (e.g., Vicon) should be replaced by onboard perception for products.
  • Compute resources (GPU) facilitate training; real-time inference must meet control-loop latency budgets.
  • Safety, certification, and standardization of learned actuator models are evolving and will influence deployment timelines.
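
The first and third assumptions are linked: torque targets come from joint positions, so their quality depends on both the finite-difference derivatives and the inverse dynamics model. The sketch below illustrates this on a hypothetical single-joint point-mass pendulum, not on PAMY2. The model, its parameters, and the choice of backward differences for velocity and central differences for acceleration are illustrative assumptions.

```python
import math


def backward_difference(q, dt):
    """Velocity estimates for samples t = 1 .. len(q) - 1."""
    return [(q[t] - q[t - 1]) / dt for t in range(1, len(q))]


def central_second_difference(q, dt):
    """Acceleration estimates for interior samples t = 1 .. len(q) - 2."""
    return [
        (q[t + 1] - 2.0 * q[t] + q[t - 1]) / dt**2
        for t in range(1, len(q) - 1)
    ]


def pendulum_inverse_dynamics(q, qd, qdd, mass=1.0, length=0.5, gravity=9.81):
    """Toy inverse dynamics: tau = m l^2 qdd + m g l sin(q).

    q is measured from the downward vertical. qd is accepted to mirror
    the signature tau = invdyn(q, qd, qdd) but is unused, because this
    toy model has no friction or Coriolis terms.
    """
    return mass * length**2 * qdd + mass * gravity * length * math.sin(q)


def torque_labels(q, dt):
    """Torque targets for interior samples, derived from positions only."""
    vel = backward_difference(q, dt)        # vel[i] belongs to t = i + 1
    acc = central_second_difference(q, dt)  # acc[i] belongs to t = i + 1
    return [
        pendulum_inverse_dynamics(q[t], vel[t - 1], acc[t - 1])
        for t in range(1, len(q) - 1)
    ]


# Hypothetical position log sampled at 10 Hz with constant acceleration.
labels = torque_labels([0.0, 0.005, 0.02, 0.045], dt=0.1)
```

Dividing by `dt**2` amplifies sensor noise in the position signal, which is one reason the list above names reliable joint position sensing as a requirement.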

Glossary

  • Actuator network: A learned model that maps robot states and control inputs to actuator-produced joint torques. "actuator networks, i.e., learned actuator models"
  • Antagonistic muscle pair: Two actuators arranged to pull in opposite directions across a joint, mimicking agonist/antagonist muscles. "Each DoF is actuated by an antagonistic muscle pair."
  • Ball-in-a-cup: A dynamic control task where a ball on a string must be swung into a cup on the end effector. "In the ball-in-a-cup task, the robot has to swing a ball on a string into a cup at its end effector."
  • Backward differences: A finite-difference method using past samples to estimate derivatives. "from the positions via backward differences and central differences, respectively."
  • Bootstrapping: A resampling method for estimating uncertainty, such as confidence intervals. "The error bars visualize the 95\% confidence intervals, obtained via bootstrapping."
  • Central differences: A finite-difference method using surrounding samples to estimate derivatives. "from the positions via backward differences and central differences, respectively."
  • Cubic spline: A smooth, piecewise-cubic interpolation used to generate continuous trajectories. "fit a cubic spline between these commands"
  • Deep Lagrangian Network: A neural model that learns dynamics consistent with Lagrangian mechanics. "learn a Deep Lagrangian Network~\cite{lutter2019deep} model of PAM dynamics"
  • Degrees of freedom (DoF): The number of independent joint coordinates defining a robot’s configuration. "with four degrees of freedom (DoFs)."
  • Domain randomization: Randomizing simulation parameters to make policies robust to real-world variation. "A common technique to bridge the sim-to-real gap, i.e., the difference between simulated and real dynamics, is domain randomization"
  • Ensemble disagreement: The variability among multiple models’ predictions used to quantify uncertainty. "The ensemble disagreement constitutes a measure of the model's epistemic uncertainty."
  • Epistemic uncertainty: Uncertainty arising from limited data or model capacity, reducible with more information. "The ensemble disagreement constitutes a measure of the model's epistemic uncertainty."
  • Generalized Actuator Network (GeAN): The paper’s neural actuator model that maps histories of joint positions and control signals to torques. "Our method, called Generalized Actuator Network~(GeAN), enables actuation model identification across a wide range of robots"
  • GPU-based simulator: A physics simulator accelerated by GPUs for massive parallelism and speed. "which we simulate in MuJoCo XLA (MJX)~\cite{todorov2012mujoco}, an efficient GPU-based simulator."
  • Harmonic drives: High-ratio, low-backlash gear transmissions common in robotics with distinct friction and hysteresis. "in a robot arm with harmonic drives"
  • Hysteresis: Path-dependent behavior where output depends on the history of inputs, not just current inputs. "due to inherent nonlinearities, friction, and hysteresis"
  • Inverse dynamics: Computing required joint torques from known positions, velocities, and accelerations. "and an inverse dynamics function $\boldsymbol{\tau}_t = \invdyn(\boldsymbol{q}_t, \boldsymbol{\dot{q}}_t, \boldsymbol{\ddot{q}}_t)$"
  • Iterative learning control: A method that refines control inputs over repeated trials to improve tracking. "such as iterative learning control~\cite{ma2022learning}"
  • Mass matrix: The configuration-dependent inertia matrix relating joint accelerations to torques. "the mass matrix of the robot in position $\boldsymbol{q}_t$."
  • Mechanical compliance: Passive flexibility that reduces impact forces and improves safety. "the lightweight design and mechanical compliance greatly reduce contact forces upon collision"
  • MuJoCo XLA (MJX): A GPU-accelerated variant of the MuJoCo physics engine used for fast simulation. "which we simulate in MuJoCo XLA (MJX)~\cite{todorov2012mujoco}"
  • Open-loop (control): Executing a predefined command sequence without feedback correction during motion. "an exploration dataset of 2500 open-loop trajectories"
  • Partially observable: A setting where available observations do not fully specify the system’s true state. "making the task partially observable."
  • PD-controller: Proportional-derivative controller that applies control based on position error and velocity. "controlled by a PD-controller with known gains."
  • Pneumatic artificial muscle (PAM): A soft, pressure-driven actuator that contracts like biological muscle. "pneumatic artificial muscles~(PAMs)"
  • Proximal Policy Optimization (PPO): A stable policy-gradient RL algorithm using a clipped objective. "Proximal Policy Optimization~(PPO)~\cite{schulman2017proximal}"
  • Reinforcement learning (RL): Learning control policies via reward-driven interaction with an environment. "reinforcement learning~(RL)"
  • Rigid body dynamics: Dynamics of non-deformable bodies governed by Newton-Euler equations. "The arm and objects follow simple rigid body dynamics"
  • Series elastic actuators: Actuators with an elastic element in series to enhance compliance and force control. "focuses on more well-behaved series elastic actuators"
  • Sim-to-real transfer: Deploying a policy trained in simulation directly on real hardware. "this result constitutes the first successful sim-to-real transfer for a four-degrees-of-freedom muscle-actuated robot arm."
  • System identification: Estimating model structure or parameters from measured input-output data. "utilize a combination of system identification with an analytic dynamics model"
  • Tendon-driven: Actuation via tendons (cables) transmitting forces from remote actuators to joints. "a tendon-driven robot powered by pneumatic artificial muscles"
  • Tendon routing: The specific paths that tendons take through a mechanism, affecting friction and coupling. "due to tendon routing."
  • Torque-based simulator: A simulator that accepts joint torques as inputs to advance dynamics. "a torque-based simulator of the arm dynamics"
  • Torque sensors: Sensors that directly measure joint torques. "rather than requiring torque sensors."
  • Unsupervised Actuator Net (UAN): A baseline method that frames actuator modeling as an RL task. "the Unsupervised Actuator Net~(UAN)~\cite{fey2025bridging}"
  • Vicon (object tracking system): A marker-based motion capture system for tracking object pose. "we use a Vicon object tracking system."
  • Wilson score interval: A binomial proportion confidence interval with better small-sample performance. "computed with the Wilson score interval."
  • Zero-shot transfer: Deploying to the real system without any additional fine-tuning or adaptation. "transferred zero-shot to the physical robot."
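
The Wilson score interval in the glossary above has a closed form. The sketch below is a minimal implementation, assuming a 95% interval (z = 1.96). The trial counts in the usage line are hypothetical and are not taken from the paper.

```python
import math


def wilson_interval(successes, trials, z=1.96):
    """Wilson score interval for a binomial success proportion.

    z = 1.96 corresponds to an approximate 95% confidence level.
    Returns a (lower, upper) tuple.
    """
    if trials <= 0:
        raise ValueError("trials must be positive")
    if not 0 <= successes <= trials:
        raise ValueError("successes must lie between 0 and trials")
    p_hat = successes / trials
    denom = 1.0 + z * z / trials
    center = (p_hat + z * z / (2.0 * trials)) / denom
    half_width = (
        z
        * math.sqrt(p_hat * (1.0 - p_hat) / trials + z * z / (4.0 * trials**2))
        / denom
    )
    return center - half_width, center + half_width


# Hypothetical example: 15 successful rollouts out of 20 attempts.
lower, upper = wilson_interval(15, 20)
```

Unlike the plain normal-approximation interval, the Wilson bounds always stay inside [0, 1] and remain informative when the observed rate is 0% or 100%. This matters for the small numbers of real-robot trials typical of hardware evaluations.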
