Sim-to-Real Transfer for Muscle-Actuated Robots via Generalized Actuator Networks
Abstract: Tendon drives paired with soft muscle actuation enable faster and safer robots while potentially accelerating skill acquisition. Still, these systems are rarely used in practice due to inherent nonlinearities, friction, and hysteresis, which complicate modeling and control. So far, these challenges have hindered policy transfer from simulation to real systems. To bridge this gap, we propose a sim-to-real pipeline that learns a neural network model of this complex actuation and leverages established rigid body simulation for the arm dynamics and interactions with the environment. Our method, called Generalized Actuator Network (GeAN), enables actuation model identification across a wide range of robots by learning directly from joint position trajectories rather than requiring torque sensors. Using GeAN on PAMY2, a tendon-driven robot powered by pneumatic artificial muscles, we successfully deploy precise goal-reaching and dynamic ball-in-a-cup policies trained entirely in simulation. To the best of our knowledge, this result constitutes the first successful sim-to-real transfer for a four-degrees-of-freedom muscle-actuated robot arm.
Explain it Like I'm 14
What is this paper about?
This paper shows a new way to train robot arms that use “muscles” and “tendons” (like in animals) so that skills learned in a computer simulation work right away on the real robot. The authors build a learned “translator” that converts the robot’s control signals into the twisting forces (called torque) those muscles would actually produce. With this translator plugged into a physics simulator, they train the robot entirely on a computer and then move the behavior to the real machine without extra tuning.
What questions were the researchers trying to answer?
In simple terms, they asked:
- How can we make robot arms with soft, muscle-like actuators learn skills in simulation and then do them for real without breaking or retuning?
- Can we learn a good model of messy, hard-to-predict muscle-and-tendon behavior using only easy-to-measure data (joint positions) instead of special torque sensors?
- Will this be accurate enough for both precise tasks (like reaching a target) and fast, dynamic tasks (like ball-in-a-cup)?
How did they do it?
Think of a robot arm like a person’s arm: the bones and joints are fairly well understood (physics can simulate them), but the muscles and tendons are complicated. Instead of trying to write perfect equations for the muscles and tendons (which is very hard), the authors taught a neural network to imitate those parts.
Here are the key terms:
- Actuators = the robot’s “muscles” that create motion.
- Torque = the twisting force at a joint (like how hard you turn a doorknob).
- Hysteresis = when a system’s response depends on what it was doing just before, not only on the current command (like a rubber band that behaves a bit differently when stretching vs. relaxing).
They built a three-step pipeline:
- Collect real data: They ran the real robot with many different movement commands for about 1.4 hours and recorded only what’s easy to get: joint positions over time (how bent or straight each joint is) and the commands sent to the muscles. They did not need torque sensors.
- Learn the “Generalized Actuator Network” (GeAN): This neural network learns to turn recent joint positions and commands into the torques the muscles would cause. To train it, they ask the network to produce torques that, when fed into a standard physics simulator, make the next joint position match what the real robot actually did. In other words, they judge the network by how closely the simulated motion follows the real motion (position accuracy), not just by guessing the torque value.
- Train skills in simulation: With the learned GeAN handling muscle/tendon behavior and a physics engine handling the arm’s rigid-body motion and objects, they trained reinforcement learning (RL) policies entirely in simulation. RL here means the computer tries lots of trials, gets rewards for doing better, and improves over time—like practicing a sport in a very fast video game. To avoid overfitting to any one guess of the muscle model, they used a small “team” (ensemble) of five GeANs and randomly picked among them during training—like practicing with slightly different equipment each time.
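The second step of the pipeline can be illustrated with a minimal sketch in plain Python. Everything specific here is an assumption made for illustration and does not come from the paper: a single joint with a made-up inertia `INERTIA`, a semi-implicit Euler step standing in for the physics simulator, and a one-parameter torque model `w * command` standing in for the neural network. The sketch only shows the training signal: the torque model is adjusted until the simulated next position matches the recorded one.

```python
# Toy illustration of a position-based loss. All constants are assumptions.
DT = 0.002        # simulator step in seconds (500 Hz, as quoted in the summary)
INERTIA = 0.05    # hypothetical joint inertia in kg*m^2

def sim_step(q, qd, tau):
    """Semi-implicit Euler step of a 1-DoF joint without gravity or friction."""
    qd_next = qd + DT * tau / INERTIA
    q_next = q + DT * qd_next
    return q_next, qd_next

def predict_torque(w, command):
    """Stand-in actuator model: a single gain instead of a neural network."""
    return w * command

def position_loss(w, sample):
    """Squared error between simulated and recorded next joint position."""
    q, qd, command, q_next_real = sample
    q_next_sim, _ = sim_step(q, qd, predict_torque(w, command))
    return (q_next_sim - q_next_real) ** 2

def position_loss_grad(w, sample):
    """Analytic gradient of position_loss with respect to w."""
    q, qd, command, q_next_real = sample
    q_next_sim, _ = sim_step(q, qd, predict_torque(w, command))
    dq_dw = DT * DT / INERTIA * command  # how the next position moves with w
    return 2.0 * (q_next_sim - q_next_real) * dq_dw

def make_dataset(true_w, n=50):
    """Synthetic 'real robot' data generated with a hidden actuator gain."""
    data = []
    for i in range(n):
        q, qd = 0.01 * i, 0.1 * (i % 5)
        command = -1.0 + 2.0 * i / (n - 1)
        q_next, _ = sim_step(q, qd, true_w * command)
        data.append((q, qd, command, q_next))
    return data

def fit(data, w=0.0, steps=200):
    """Gradient descent with a step size scaled to the tiny position gradients."""
    scale = (DT * DT / INERTIA) ** 2
    mean_c2 = sum(s[2] ** 2 for s in data) / len(data)
    lr = 0.5 / (scale * mean_c2)
    for _ in range(steps):
        grad = sum(position_loss_grad(w, s) for s in data) / len(data)
        w -= lr * grad
    return w

data = make_dataset(true_w=3.0)
w_fit = fit(data)  # recovers the hidden gain from positions alone
```

In this toy setting a torque error changes the next position by only `DT * DT / INERTIA` times that error, which is why the step size is rescaled; a real pipeline would replace the hand-written gradient with automatic differentiation through the simulator step.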
Why this matters technically:
- Sim-to-real means learning in simulation and then using the skill on the real robot. It saves time, energy, and wear-and-tear because you don’t need to practice everything on the physical robot.
- Muscle-and-tendon robots are safer and lighter but much harder to model than motor-driven robots. GeAN focuses on learning the tricky part (the muscles) while relying on solid physics for the rest.
What did they find?
They tested their method on PAMY2, a four-joint robot arm powered by pneumatic artificial muscles (PAMs—air-powered “muscles”). They trained two tasks fully in simulation and then ran them on the real robot without extra tuning:
- Reacher: Move the joints to a target angle accurately.
- Ball-in-a-cup: Swing a ball on a string into a cup at the robot’s end—fast and precise timing required.
Main results:
- The learned muscle model was most accurate when trained to match next-step positions (not just torques). This “position-focused” training beat another recent method that tried to learn torques using reinforcement learning.
- Zero-shot transfer (no extra adjustments) worked well: about 90% success on Reacher, with final errors around 1–1.3 degrees on average, and about 75% success on ball-in-a-cup.
- Using an ensemble of GeANs helped guard against the policy overfitting to one model’s quirks, though even a single model performed similarly in their tests.
- Letting the policy change commands more aggressively (by lowering the action penalty) hurt real-world success; smoother commands transferred better.
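The last point can be made concrete with a small sketch of a smoothness penalty. The summary does not give the exact form of the paper's action penalty, so this assumes one common choice: a weighted sum of squared changes between consecutive commands. The function names and the example command sequences are hypothetical.

```python
def action_rate_penalty(actions, weight):
    """Weighted sum of squared changes between consecutive commands.

    Assumed form of the penalty; the paper's exact definition may differ.
    """
    total = 0.0
    for prev, curr in zip(actions, actions[1:]):
        total += sum((c - p) ** 2 for p, c in zip(prev, curr))
    return weight * total

def shaped_return(task_rewards, actions, weight):
    """Task reward minus the smoothness penalty over one episode."""
    return sum(task_rewards) - action_rate_penalty(actions, weight)

# Two hypothetical command sequences for a two-joint example.
smooth = [[0.0, 0.0], [0.1, 0.0], [0.2, 0.1]]
jerky = [[0.0, 0.0], [1.0, -1.0], [-1.0, 1.0]]

smooth_penalty = action_rate_penalty(smooth, weight=1.0)
jerky_penalty = action_rate_penalty(jerky, weight=1.0)
```

With equal task rewards, the jerky sequence gets a much lower shaped return, so a larger `weight` pushes the policy toward smoother commands.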
To their knowledge, this is the first time a four-joint, muscle-actuated, tendon-driven robot arm successfully learned in simulation and worked in the real world on tasks this complex.
Why is this important?
- Muscle-like robots can be lighter, faster, and safer, which is great for working around people. But their behavior is complex and hard to model.
- This work shows a practical way to bridge the gap: learn the messy muscle part from data (without special torque sensors) and use standard physics for everything else.
- Training in simulation saves time and reduces wear on expensive hardware, while still achieving high performance in the real world.
- The approach could in principle apply to many different robots, not just this one, because it needs only joint positions and commands, which most robots already measure. So far it has been validated only on PAMY2.
Bottom line
The authors created a learned “muscle translator” (GeAN) that, when combined with a normal physics simulator, lets a muscle-driven robot practice entirely in a fast, safe virtual world—and then perform well in real life. This brings us closer to building agile, safe robots that can quickly learn tricky skills without long, risky training on real hardware.
Knowledge Gaps
Knowledge gaps, limitations, and open questions
The following list captures concrete gaps and uncertainties left unresolved, highlighting directions for future research and engineering improvements.
- Generality across platforms: The approach is validated only on PAMY2 (4-DoF, antagonistic PAMs). It remains unclear how GeAN scales to higher-DoF arms, different tendon routings, other soft actuation (hydraulics, McKibben variants), or series elastic/motor-driven systems without torque sensors.
- Sensitivity to rigid-body model errors: The method assumes a torque-driven simulator with accurate mass, inertia, and kinematics. There is no analysis of how inaccuracies in the rigid-body parameters affect GeAN training or policy transfer, nor procedures to jointly identify or adapt these parameters.
- External forces during identification: GeAN training assumes no external forces beyond gravity when forming inverse-dynamics labels. Its robustness to contacts, payloads, tool attachments, or dynamic interaction forces during deployment is not quantified or incorporated into training.
- Actuator and valve dynamics omission: The learned mapping does not explicitly model pneumatic valve dynamics, pressure propagation delays, compressibility, or pressure sensor feedback. The impact of time delays and actuator saturation on closed-loop performance is unmodeled and unquantified.
- Limited observability in training: Training uses joint positions (and finite-difference velocities/accelerations) and commanded pressures but not measured pressures or tendon tensions. Whether adding pressure/tension sensing improves accuracy, stability, and generalization remains unexplored.
- Finite-difference noise and latency: Velocities and accelerations are derived via backward/central differences at 2 ms steps without reported filtering, delay compensation, or sensor fusion. The effect of measurement noise and timing uncertainty on inverse-dynamics labels and GeAN accuracy is not analyzed.
- Position-loss derivation scope: The derived relation between torque error and position error omits Coriolis, centrifugal, damping, and contact terms. Its validity under full nonlinear dynamics and contacts, and its implications for long-horizon differentiation, are not examined.
- Long-horizon training: Multi-step position losses were tried but abandoned due to inconsistent gains. There is no systematic study of truncated backprop through time, collocation, adjoint methods, or curriculum strategies to reduce compounding error over longer horizons.
- Ensemble utility and uncertainty handling: The ensemble disagreement penalty is included, but “no ensemble” performs similarly on tested tasks. Conditions under which ensembles help (e.g., smaller datasets, larger model uncertainty, more complex tasks) and alternative uncertainty-aware training (e.g., risk-sensitive RL, Bayesian nets) are not explored.
- Data efficiency and coverage: The dataset (~1.4 h, 2500 trajectories) is collected with random spline commands. There is no analysis of coverage of state-action space, active data collection for hysteresis/friction modes, or how performance scales with less data.
- Time-varying properties: Muscle and tendon properties change with temperature, wear, and humidity. The pipeline does not incorporate time-varying dynamics, online adaptation, or scheduled re-identification; robustness to drift over hours/days is unknown.
- Antagonistic control simplification: Each DoF is controlled by a single antagonistic signal. The impact of this simplification on achievable performance and the feasibility of modeling independent multi-muscle control per joint remains untested.
- Contact-rich manipulation: Beyond ball-in-a-cup and reacher, the approach is not evaluated on tasks involving sustained contacts, force regulation, or frictional manipulation (e.g., grasping, sliding), where accurate contact and actuator modeling are critical.
- Payload variability: Policy and GeAN are not studied under varying payloads or end-effector tools. Sensitivity to added mass/inertia and methods to adapt models or policies when payload changes are open questions.
- Safety guarantees: While soft actuation is safer, the paper provides no formal safety guarantees or constraint handling during training/deployment (e.g., pressure, torque, joint limit, or collision constraints) and no systematic failure-mode analysis.
- Real-time constraints: The agent runs at 100 Hz while the simulator/GeAN operate at 500 Hz. The effects of control-rate mismatches, computation latency, and communication delays on stability and performance are not analyzed.
- Simulator dependence: Results rely on MJX. Portability to other physics engines, differentiable simulators, or custom pipelines, and sensitivity to integrator choices and contact parameters, remain untested.
- Domain randomization synergy: The study positions GeAN as an alternative to heavy domain randomization but does not explore hybrid strategies (e.g., light randomization over rigid-body/contact parameters plus learned actuator models) or quantify trade-offs.
- Model identifiability: Multiple torque sequences can produce similar position trajectories. The identifiability of the learned torque mapping (physical plausibility, uniqueness) and mechanisms to regularize toward physically consistent solutions are not addressed.
- Reward shaping and policy robustness: Policies rely on specific action and velocity penalties; ablations show sensitivity. Automated reward tuning, robustness to reward misspecification, and policy performance under different objective preferences are not explored.
- Sensing for external objects: Ball tracking depends on Vicon with occlusion and misidentification issues. The pipeline does not consider onboard sensing (vision/tactile), sensor fusion, or POMDP training to handle intermittent/noisy observations more systematically.
- Failure analysis: Despite high success rates, the paper lacks a taxonomy of failure modes (e.g., missed catches, overshoot, oscillation, drift) and targeted remedies (model improvements, controller constraints, observation augmentation).
- Code/data release and reproducibility: The paper does not state whether code, datasets, simulator models, and trained GeANs/policies are released, limiting independent validation and broader adoption.
- Extension to closed-loop identification: Zero-shot transfer is demonstrated, but the potential of a small number of real-world rollouts to refine GeAN and policies (e.g., residual learning, online system ID) is not investigated.
- Theoretical guarantees: There are no formal analyses of stability, convergence of GeAN training, or bounds on sim-to-real performance degradation under model uncertainty. Formalization could guide safe deployment and scaling.
- Temperature and pressure instrumentation: The approach does not include temperature/pressure measurements as inputs. Investigating whether these signals improve modeling of hysteresis, drift, and pneumatic dynamics is an actionable path.
- Joint friction and routing effects: Tendon routing–dependent friction is cited as a challenge, but the model inputs do not explicitly encode routing state or friction proxies. Exploring features (e.g., routing angles, tendon curvature) or learned latent friction states could improve accuracy.
- Longer-horizon tasks: Episodes are limited to 2 s. Evaluating stability, repeatability, and drift over longer durations (e.g., sustained tracking, cyclic motions) remains an open benchmark.
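The finite-difference item in the list above can be illustrated with a short sketch. It assumes the 2 ms sampling period quoted there, backward differences for velocity, and central differences for acceleration; the trajectory and the noise magnitude are made up for the example.

```python
DT = 0.002  # sampling period in seconds (2 ms, as stated above)

def backward_velocity(q, t):
    """Velocity estimate at sample t from the current and previous position."""
    return (q[t] - q[t - 1]) / DT

def central_acceleration(q, t):
    """Acceleration estimate at sample t from its two neighbours."""
    return (q[t + 1] - 2.0 * q[t] + q[t - 1]) / (DT * DT)

# Synthetic trajectory with constant acceleration of 2 rad/s^2 (assumed data).
ACCEL = 2.0
q = [0.5 * ACCEL * (k * DT) ** 2 for k in range(21)]

vel = backward_velocity(q, 10)       # lags the true velocity by half a sample
acc = central_acceleration(q, 10)    # exact for a quadratic trajectory

# Perturb a single position sample by 1e-5 rad to mimic encoder noise.
noisy = list(q)
noisy[10] += 1e-5
acc_noisy = central_acceleration(noisy, 10)
```

In this toy case the true velocity at sample 10 is 0.04 rad/s while the backward difference returns 0.038 rad/s, and a 1e-5 rad error in one sample shifts the acceleration estimate by 5 rad/s^2 because the error is divided by `DT` squared. This is the kind of sensitivity the missing noise analysis would need to quantify.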
Practical Applications
Immediate Applications
- Muscle-/tendon-driven robot commissioning and calibration (Robotics, Manufacturing)
- What: Identify actuator dynamics from joint-position-only logs using GeAN, eliminating torque sensors; create a reliable digital twin for compliant arms.
- How/Workflow: Collect ~1–2 hours of open-loop spline-excitation data; train GeAN with the position loss; validate multi-step rollouts; couple GeAN to a torque-based simulator (e.g., MuJoCo/MJX); deploy controllers or RL policies.
- Tools/Products: GeAN model + ensemble, MJX/MuJoCo, PPO (e.g., skrl), ROS2 nodes for data logging and deployment.
- Assumptions/Dependencies: Accurate rigid-body model (masses, inertias, joint limits); clean joint position sensing; no significant external forces during data collection; GPU for training; reliable pneumatic hardware and pressure control.
- Rapid sim-to-real training for compliant manipulators (Robotics, Logistics, Entertainment)
- What: Train policies entirely in simulation and transfer zero-shot to PAM/tendon-driven systems for tasks like reaching, catching, or dynamic manipulation.
- How/Workflow: Train RL in MJX with GeAN in-loop; add noise/domain randomization and simple observation dropouts; deploy to robot at ~100 Hz control.
- Tools/Products: RL training pipeline with GeAN, ensemble-based uncertainty injection, task templates (reacher, dynamic catch).
- Assumptions/Dependencies: Adequate sim fidelity for environment objects; basic perception (e.g., motion capture or onboard vision); safety interlocks.
- Cost- and sensor-reduced prototyping in labs and startups (Academia, Hardware)
- What: Build and control soft/tendon actuated prototypes without torque sensors; reduce bill of materials and integration complexity.
- How/Workflow: Use joint encoders alone; adopt GeAN training script; validate with held-out trajectories.
- Tools/Products: Open-source training scripts; low-cost tendon/PAM kits; standardized data schemas.
- Assumptions/Dependencies: Sufficient encoder precision; basic inverse dynamics model available.
- Maintenance and recalibration for drift and wear (Operations, Field Service)
- What: Periodically refresh actuator models to account for hysteresis changes, friction, and temperature-induced drift.
- How/Workflow: Schedule short data-collection routines; fine-tune GeAN; use ensemble disagreement to flag abnormal changes.
- Tools/Products: Calibration assistant; MLOps pipeline for retraining and validation; monitoring dashboards.
- Assumptions/Dependencies: Access to robot off-shift; stable pneumatic system; versioned models and rollback mechanisms.
- Educational modules in soft robotics and RL (Education)
- What: Lab courses teaching hysteresis/friction modeling and sim-to-real with position-only sensing.
- How/Workflow: Provide canned datasets and code; students train GeAN and deploy simple policies on benchtop tendon devices.
- Tools/Products: Courseware, datasets, notebooks, lightweight simulators.
- Assumptions/Dependencies: Modest GPU access; safe tabletop hardware.
- Digital twin creation for muscle-actuated systems (Software/Simulation)
- What: Combine rigid-body physics with learned actuator layer to create credible digital twins for scenario testing, planning, and what-if analyses.
- How/Workflow: Calibrate GeAN; integrate with object-rich simulations; run parallel policy evaluations on GPU.
- Tools/Products: GeAN plug-ins for simulators; scenario libraries; batch evaluation services.
- Assumptions/Dependencies: Accurate CAD/inertial parameters; support for torque-driven simulation.
- Improved tracking and control with learned actuation models (Robotics, Software)
- What: Insert GeAN in existing controllers (e.g., MPC, inverse dynamics) to better map commands to joint torques under tendon friction and PAM hysteresis.
- How/Workflow: Use GeAN as a torque predictor inside control law; tune action smoothing penalties.
- Tools/Products: Controller adapters; real-time inference on embedded CPU/GPU.
- Assumptions/Dependencies: Latency bounds; validated one-step differentiable behavior; safety constraints.
- Retrofitting tendon/cable-driven mechanisms lacking torque sensors (Robotics, Industrial Equipment)
- What: Upgrade existing devices (e.g., gimbals, grippers, robot hands) with learned actuation maps for improved precision and responsiveness.
- How/Workflow: Quick data capture in-service; train compact GeAN; deploy as a firmware update.
- Tools/Products: Edge inference libraries; calibration app.
- Assumptions/Dependencies: Access to position sensors; edge compute; minimal downtime.
- Safer human-robot interaction via compliant hardware with precise control (Robotics, Policy)
- What: Leverage soft actuation’s safety with precise, learned control for collaborative tasks at higher speeds.
- How/Workflow: Validate twin against collision scenarios; enforce speed/force limits; use ensemble disagreement as a runtime safety signal.
- Tools/Products: HRI safety monitor; compliance test suites; risk assessment templates.
- Assumptions/Dependencies: Compliance with standards (ISO/TS 15066); well-calibrated uncertainty thresholds.
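Several items above propose ensemble disagreement as a monitoring or safety signal. A minimal sketch of that idea follows, under loud assumptions: the five "models" are hypothetical scalar functions rather than trained GeANs, disagreement is taken as the standard deviation of their torque predictions, and the threshold and zero fallback command are placeholders that a real system would have to calibrate.

```python
import statistics

def ensemble_disagreement(models, state, command):
    """Population standard deviation of the members' torque predictions."""
    predictions = [model(state, command) for model in models]
    return statistics.pstdev(predictions)

def monitored_command(models, state, command, threshold, fallback=0.0):
    """Pass the command through unless the ensemble disagrees too much.

    Returns the command to send and the measured disagreement.
    The fallback value is a placeholder, not a validated safe action.
    """
    disagreement = ensemble_disagreement(models, state, command)
    if disagreement > threshold:
        return fallback, disagreement
    return command, disagreement

# Five hypothetical stand-in models that differ only in their gain.
GAINS = (2.9, 3.0, 3.1, 3.0, 3.05)
models = [lambda s, c, g=g: g * c - 0.1 * s for g in GAINS]

safe_cmd, d_small = monitored_command(models, state=0.2, command=0.5, threshold=0.1)
held_cmd, d_large = monitored_command(models, state=0.2, command=2.0, threshold=0.1)
```

Here the small command passes unchanged, while the large command, on which the stand-in models spread further apart, is replaced by the fallback. Mapping such a threshold to actual risk is the calibration problem listed under the dependencies.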
Long-Term Applications
- Commercial co-bots powered by artificial muscles with certified sim-to-real training (Robotics, Manufacturing)
- What: Productized muscle-actuated arms that learn most behaviors in simulation and deploy on the factory floor with minimal tuning.
- Tools/Products: Industrial-grade GeAN stack, certification packs, auto-calibration stations.
- Assumptions/Dependencies: Robust pneumatics/valving; long-life tendons; certification pathways for learned models; quiet/efficient compressors.
- Personalized assistive exoskeletons and prostheses (Healthcare)
- What: Rapid, patient-specific actuator model fitting from brief movement logs; safer, more responsive assistance despite hysteresis.
- Tools/Products: Clinic-grade calibration workflows; onboard GeAN inference; telemetry for ongoing adaptation.
- Assumptions/Dependencies: Human-in-the-loop safety; wearable form factors; privacy-preserving data handling; low-power actuators.
- Generalization to other complex actuators (Hydraulics, electroactive polymers, shape-memory alloys) (Robotics, Energy)
- What: Use joint-position-driven GeANs to model nonlinear, hysteretic actuators beyond PAMs and tendons.
- Tools/Products: Cross-actuator model zoo; transfer-learning recipes; plug-ins for multiple simulators.
- Assumptions/Dependencies: Quality position sensing; appropriate exploration signals; expanded simulators that include fluid/thermal couplings where needed.
- Uncertainty-aware control and safety governance (Software, Policy)
- What: Integrate ensemble disagreement into safety supervisors to modulate speed/force or trigger safe stops; inform certification with calibrated uncertainty.
- Tools/Products: Runtime monitors; failover controllers; audit logs capturing uncertainty metrics.
- Assumptions/Dependencies: Calibrated uncertainty-to-risk mapping; standards acceptance for uncertainty-based safeguards.
- Standardized datasets, benchmarks, and testbeds for muscle-actuated sim-to-real (Academia, Policy)
- What: Shared corpora and evaluation suites to compare algorithms and support reproducibility.
- Tools/Products: Public datasets, leaderboards, reference hardware layouts.
- Assumptions/Dependencies: Community coordination; IP and data-sharing agreements.
- Onboard continual learning and real-time adaptation (Robotics, Edge AI)
- What: Online fine-tuning of GeAN residuals to counteract wear, temperature, and load changes without downtime.
- Tools/Products: Safe on-device learning loops; drift detectors; guardrailed optimizers.
- Assumptions/Dependencies: Edge accelerators; proven safe-learning protocols; robust rollback.
- Simulation-driven certification and insurance underwriting (Policy, Insurance)
- What: Formal workflows where validated digital twins with learned actuators support regulatory approval and risk assessment.
- Tools/Products: Verification toolchains; scenario coverage metrics; traceable training artefacts.
- Assumptions/Dependencies: Accepted standards for learned-model verification; third-party auditing infrastructure.
- Dynamic manipulation in homes and warehouses (Robotics, Daily Life, Logistics)
- What: Fast, compliant robots that safely catch, toss, place, and interact with deformable or delicate objects.
- Tools/Products: Perception stacks replacing motion capture (e.g., vision + filtering); task libraries; safety envelopes.
- Assumptions/Dependencies: Robust onboard perception (occlusion handling, low light); real-time compute; quiet, compact actuation.
- Hardware–software co-design for energy/performance optimization (Energy, Hardware)
- What: Use the sim pipeline to optimize tendon routing, valve sizing, compliance, and materials for speed, energy, and safety trade-offs.
- Tools/Products: Automated design exploration; multiobjective optimizers integrated with GeAN-in-the-loop simulation.
- Assumptions/Dependencies: Expanded plant models (compressibility, valve dynamics, thermals); scalable batch simulation.
- Consumer-grade safe home assistants and sport-training devices (Daily Life)
- What: Soft, fast, and safe robotic devices for household chores or athletic skill practice that train in simulation and transfer reliably.
- Tools/Products: Compact actuation modules; pre-trained behaviors; calibration apps for home environments.
- Assumptions/Dependencies: Cost and noise reduction for pneumatic systems; durability; straightforward user calibration.
Cross-cutting Assumptions and Dependencies
- Accurate torque-driven rigid-body simulators and inverse dynamics are available and correctly parameterized.
- Exploration data must cover the operational envelope; training assumes negligible external forces during data capture.
- Reliable joint position sensing is required; object tracking used in demos (e.g., Vicon) should be replaced by onboard perception for products.
- Compute resources (GPU) facilitate training; real-time inference must meet control-loop latency budgets.
- Safety, certification, and standardization of learned actuator models are evolving and will influence deployment timelines.
Glossary
- Actuator network: A learned model that maps robot states and control inputs to actuator-produced joint torques. "actuator networks, i.e., learned actuator models"
- Antagonistic muscle pair: Two actuators arranged to pull in opposite directions across a joint, mimicking agonist/antagonist muscles. "Each DoF is actuated by an antagonistic muscle pair."
- Ball-in-a-cup: A dynamic control task where a ball on a string must be swung into a cup on the end effector. "In the ball-in-a-cup task, the robot has to swing a ball on a string into a cup at its end effector (see \cref{fig:ball_in_a_cup_rollout})."
- Backward differences: A finite-difference method using past samples to estimate derivatives. "from the positions via backward differences and central differences, respectively."
- Bootstrapping: A resampling method for estimating uncertainty, such as confidence intervals. "The error bars visualize the 95\% confidence intervals, obtained via bootstrapping."
- Central differences: A finite-difference method using surrounding samples to estimate derivatives. "from the positions via backward differences and central differences, respectively."
- Cubic spline: A smooth, piecewise-cubic interpolation used to generate continuous trajectories. "fit a cubic spline between these commands"
- Deep Lagrangian Network: A neural model that learns dynamics consistent with Lagrangian mechanics. "learn a Deep Lagrangian Network~\cite{lutter2019deep} model of PAM dynamics"
- Degrees of freedom (DoF): The number of independent joint coordinates defining a robot’s configuration. "with four degrees of freedom (DoFs)."
- Domain randomization: Randomizing simulation parameters to make policies robust to real-world variation. "A common technique to bridge the sim-to-real gap, i.e., the difference between simulated and real dynamics, is domain randomization"
- Ensemble disagreement: The variability among multiple models’ predictions used to quantify uncertainty. "The ensemble disagreement constitutes a measure of the model's epistemic uncertainty."
- Epistemic uncertainty: Uncertainty arising from limited data or model capacity, reducible with more information. "The ensemble disagreement constitutes a measure of the model's epistemic uncertainty."
- Generalized Actuator Network (GeAN): The paper’s neural actuator model that maps histories of joint positions and control signals to torques. "Our method, called Generalized Actuator Network~(GeAN), enables actuation model identification across a wide range of robots"
- GPU-based simulator: A physics simulator accelerated by GPUs for massive parallelism and speed. "which we simulate in MuJoCo XLA (MJX)~\cite{todorov2012mujoco}, an efficient GPU-based simulator."
- Harmonic drives: High-ratio, low-backlash gear transmissions common in robotics with distinct friction and hysteresis. "in a robot arm with harmonic drives"
- Hysteresis: Path-dependent behavior where output depends on the history of inputs, not just current inputs. "due to inherent nonlinearities, friction, and hysteresis"
- Inverse dynamics: Computing required joint torques from known positions, velocities, and accelerations. "and an inverse dynamics function $\boldsymbol{\tau}_t = \invdyn(\boldsymbol{q}_t, \boldsymbol{\dot{q}}_t, \boldsymbol{\ddot{q}}_t)$"
- Iterative learning control: A method that refines control inputs over repeated trials to improve tracking. "such as iterative learning control~\cite{ma2022learning}"
- Mass matrix: The configuration-dependent inertia matrix relating joint accelerations to torques. "the mass matrix of the robot in position ."
- Mechanical compliance: Passive flexibility that reduces impact forces and improves safety. "the lightweight design and mechanical compliance greatly reduce contact forces upon collision"
- MuJoCo XLA (MJX): A GPU-accelerated variant of the MuJoCo physics engine used for fast simulation. "which we simulate in MuJoCo XLA (MJX)~\cite{todorov2012mujoco}"
- Open-loop (control): Executing a predefined command sequence without feedback correction during motion. "an exploration dataset of 2500 open-loop trajectories"
- Partially observable: A setting where available observations do not fully specify the system’s true state. "making the task partially observable."
- PD-controller: Proportional-derivative controller that applies control based on position error and velocity. "controlled by a PD-controller with known gains."
- Pneumatic artificial muscle (PAM): A soft, pressure-driven actuator that contracts like biological muscle. "pneumatic artificial muscles~(PAMs)"
- Proximal Policy Optimization (PPO): A stable policy-gradient RL algorithm using a clipped objective. "Proximal Policy Optimization~(PPO)~\cite{schulman2017proximal}"
- Reinforcement learning (RL): Learning control policies via reward-driven interaction with an environment. "reinforcement learning~(RL)"
- Rigid body dynamics: Dynamics of non-deformable bodies governed by Newton-Euler equations. "The arm and objects follow simple rigid body dynamics"
- Series elastic actuators: Actuators with an elastic element in series to enhance compliance and force control. "focuses on more well-behaved series elastic actuators"
- Sim-to-real transfer: Deploying a policy trained in simulation directly on real hardware. "this result constitutes the first successful sim-to-real transfer for a four-degrees-of-freedom muscle-actuated robot arm."
- System identification: Estimating model structure or parameters from measured input-output data. "utilize a combination of system identification with an analytic dynamics model"
- Tendon-driven: Actuation via tendons (cables) transmitting forces from remote actuators to joints. "a tendon-driven robot powered by pneumatic artificial muscles"
- Tendon routing: The specific paths that tendons take through a mechanism, affecting friction and coupling. "due to tendon routing."
- Torque-based simulator: A simulator that accepts joint torques as inputs to advance dynamics. "a torque-based simulator of the arm dynamics"
- Torque sensors: Sensors that directly measure joint torques. "rather than requiring torque sensors."
- Unsupervised Actuator Net (UAN): A baseline method that frames actuator modeling as an RL task. "the Unsupervised Actuator Net~(UAN)~\cite{fey2025bridging}"
- Vicon (object tracking system): A marker-based motion capture system for tracking object pose. "we use a Vicon object tracking system."
- Wilson score interval: A binomial proportion confidence interval with better small-sample performance. "computed with the Wilson score interval."
- Zero-shot transfer: Deploying to the real system without any additional fine-tuning or adaptation. "transferred zero-shot to the physical robot."
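As a worked example for the Wilson score interval entry, the sketch below computes the interval for a success rate using the standard formula. The counts are hypothetical and chosen only for illustration; they are not results from the paper.

```python
import math

def wilson_interval(successes, trials, z=1.959964):
    """Wilson score interval for a binomial proportion (z of about 1.96 gives ~95%)."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    p_hat = successes / trials
    z2 = z * z
    denom = 1.0 + z2 / trials
    center = (p_hat + z2 / (2.0 * trials)) / denom
    half = z * math.sqrt(p_hat * (1.0 - p_hat) / trials + z2 / (4.0 * trials ** 2)) / denom
    # Clamp to [0, 1] to absorb floating-point rounding at the extremes.
    return max(0.0, center - half), min(1.0, center + half)

# Hypothetical counts for illustration only, not results from the paper.
low, high = wilson_interval(18, 20)
```

Unlike the simple normal approximation, the interval is not symmetric around the observed rate and stays inside the range from 0 to 1, which is why it is often preferred for a small number of robot trials.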