JT-SPI: Robust Joint Torque Perturbation

Updated 7 February 2026
  • JT-SPI is a sim-to-real robustness methodology that injects state-conditioned, learned perturbations into joint torques to address nonlinear actuator and contact-force mismatches.
  • The approach leverages an MLP as a universal function approximator to generate bounded perturbations, vastly expanding the diversity of simulated dynamical discrepancies compared to traditional domain randomization.
  • Empirical results on high-DOF legged robots demonstrate that JT-SPI significantly improves stability and sim-to-real transfer under challenging unmodeled disturbances.

Joint Torque Space Perturbation Injection (JT-SPI) is a sim-to-real robustness methodology in which, during simulation-time policy training, learned, state-dependent perturbations are injected directly into the joint-torque inputs of the robotic forward dynamics engine. JT-SPI recasts the sim-to-real “reality gap” as an unknown nonlinear mapping from nominal torques to realized torques and exposes the control policy to a much richer family of actuator and contact-force mismatches than can be achieved by randomizing a standard, finite set of simulation parameters. The resulting policies demonstrate increased robustness to complex and previously unseen reality gaps, facilitating successful transfer of motor skills from simulation to hardware for high-DOF legged robots (Cha et al., 9 Apr 2025).

1. Formal Definition and Underlying Principles

JT-SPI addresses the reality gap by modeling the mapping from commanded joint torques to actual joint torques as an unknown, potentially nonlinear and state-dependent function. Traditional domain randomization typically randomizes a fixed, finite set of simulator parameters (such as link masses or friction coefficients). JT-SPI, by contrast, introduces perturbations in the joint torque space that are state-conditioned and drawn from a wide functional class, parameterized via a universal function approximator (specifically, a multi-layer perceptron, or MLP).

The joint-space dynamics under domain randomized torque noise are:

$$M(q; p_{DR})\,\ddot q + C(q,\dot q; p_{DR}) + G(q; p_{DR}) + \tau_{contact}(s; p_{DR}) = \tau_{input} + \tau_{DR}(s; p_{DR})$$

JT-SPI injects a perturbation $\tau_\phi$ from a function class $\mathcal{J}T$ (implemented as an MLP) with weights $\phi$ re-sampled each episode. At timestep $t$:

  • Policy output (nominal torque): $\tau_\pi(o_t)$
  • JT-SPI perturbation:

$$\tau_\phi(s_t) = \sigma_{lim} \cdot \tanh(\text{MLP}(\hat{o}_{privileged}(t); \phi))$$

where $\sigma_{lim}$ limits the maximum perturbation, inputs are normalized privileged full-state observations, and $\phi$ is re-sampled per episode from XavierUniform.

The policy observes partial state $o_t \in \mathbb{R}^{48}$, while the perturbation generator accesses privileged full state (normalized). This exposes the policy to a diverse, high-dimensional set of actuator and contact deviations.
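The perturbation formula above can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names (`xavier_uniform`, `sample_phi`, `torque_perturbation`) and the privileged observation size `OBS_DIM = 60` are hypothetical, while the 12 joints, the two 256-unit ReLU hidden layers, the absence of bias terms, and the 20 Nm bound follow values quoted elsewhere in this article.

```python
import numpy as np

def xavier_uniform(rng, fan_in, fan_out):
    """Sample a (fan_out, fan_in) matrix from the Xavier/Glorot uniform distribution."""
    bound = np.sqrt(6.0 / (fan_in + fan_out))
    return rng.uniform(-bound, bound, size=(fan_out, fan_in))

def sample_phi(rng, obs_dim, n_joints, hidden=256):
    """Draw one set of perturbation weights phi (weight matrices only, no biases)."""
    sizes = [obs_dim, hidden, hidden, n_joints]
    return [xavier_uniform(rng, i, o) for i, o in zip(sizes[:-1], sizes[1:])]

def torque_perturbation(phi, o_priv_hat, sigma_lim):
    """tau_phi = sigma_lim * tanh(MLP(o_priv_hat; phi)) with ReLU hidden layers."""
    h = np.asarray(o_priv_hat, dtype=float)
    for w in phi[:-1]:
        h = np.maximum(w @ h, 0.0)          # hidden layer with ReLU
    return sigma_lim * np.tanh(phi[-1] @ h)  # bounded output in [-sigma_lim, sigma_lim]

rng = np.random.default_rng(0)
OBS_DIM, N_JOINTS, SIGMA_LIM = 60, 12, 20.0  # OBS_DIM is an assumed size
phi = sample_phi(rng, OBS_DIM, N_JOINTS)     # would be redrawn at each episode start
tau_phi = torque_perturbation(phi, rng.standard_normal(OBS_DIM), SIGMA_LIM)
tau_zero = torque_perturbation(phi, np.zeros(OBS_DIM), SIGMA_LIM)
```

Because no layer has a bias, `tau_zero` is exactly zero, and the `tanh` output keeps every component of `tau_phi` within the bound.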

2. Algorithmic Implementation

JT-SPI is integrated into on-policy, high-throughput simulation frameworks (e.g., IsaacGym) using parallel rollouts. A high-level pseudocode is:

Initialize policy parameters θ, critic parameters, discriminator, etc.
for iteration = 1 … N_updates:
    for env = 1 … N_parallel:
        # Episode start
        sample φ_env ~ Xavier    # MLP weights for perturbation
        s_env = s0
        step = 0
        while not done and step < max_steps:
            o_env = partial_observation(s_env)
            a_env = π_θ(o_env)            # action in [-1,1]^12
            τ_π = τ_limit * a_env         # nominal torques
            o_priv = privileged_observation(s_env)
            ô_priv = o_priv / running_std
            τ_φ = σ_lim * tanh(MLP(ô_priv; φ_env))
            (s_next, r, done) = simulator.step(s_env, τ_π + τ_φ)
            store_transition(s_env, o_env, a_env, τ_φ, r, s_next, done)
            s_env = s_next
            step += 1
    # Policy optimization (PPO+AMP+gradpen)

Key scheduling aspects include re-sampling $\phi$ only at episode boundaries and perturbing 50% of rollouts to maintain learning stability. The perturbation MLP comprises two 256-unit ReLU hidden layers, a final tanh layer for bounded outputs, and uses zero bias in all layers to ensure zero input yields zero perturbation.
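The scheduling described here could be tracked across parallel environments roughly as sketched below. Everything in the block is an assumption made for illustration: the class `PerturbationSchedule`, the use of an integer seed as a stand-in for the per-episode weights, and the choice to decide independently per episode (with probability 0.5) whether an environment is perturbed. The source only states that 50% of rollouts are perturbed and that weights change at episode boundaries.

```python
import numpy as np

class PerturbationSchedule:
    """Per-environment record of whether the current episode is perturbed,
    plus a seed standing in for that episode's perturbation weights."""

    def __init__(self, n_envs, perturbed_fraction=0.5, seed=0):
        self.rng = np.random.default_rng(seed)
        self.perturbed_fraction = perturbed_fraction
        self.active = np.zeros(n_envs, dtype=bool)
        self.phi_seed = np.zeros(n_envs, dtype=np.int64)
        self.reset(np.arange(n_envs))

    def reset(self, env_ids):
        """Call only at episode boundaries: redraw the on/off flag and the seed."""
        env_ids = np.asarray(env_ids)
        self.active[env_ids] = self.rng.random(env_ids.size) < self.perturbed_fraction
        self.phi_seed[env_ids] = self.rng.integers(0, 2**31 - 1, size=env_ids.size)

    def mask(self, tau_phi):
        """Zero the perturbation wherever the current episode is unperturbed."""
        return np.where(self.active[:, None], tau_phi, 0.0)

schedule = PerturbationSchedule(n_envs=4096)
masked = schedule.mask(np.ones((4096, 12)))   # placeholder perturbation values
seeds_before = schedule.phi_seed.copy()
schedule.reset([0, 1])                        # only environments 0 and 1 finished
```

Environments that are mid-episode keep their seed and flag, so the perturbation mapping stays fixed for the whole episode and changes only when `reset` is called for that environment.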

3. Experimental Setup and Comparative Analysis

JT-SPI was validated using the TOCABI humanoid platform (100 kg, floating-base, high-ratio harmonic drives with gear ratio 100), controlled at 125 Hz. Training utilized PPO with Adversarial Motion Prior (AMP) imitation and a gradient penalty regularizer.

Comparison was made to:

  • Domain Randomization (DR): Randomizes parameters such as terrain friction, link masses, center-of-mass offsets, armature inertia, damping, motor constant, latency, random pushes, and observation noise within prescribed ranges.
  • ERFI baseline (Campanaro et al.): Applies untargeted additive torque noise at each joint.

JT-SPI differs fundamentally by generating state-dependent, potentially highly nonlinear perturbations rather than white-noise (ERFI) or a finite augmented parameter set (DR).

Perturbation and DR method characteristics:

| Method | Perturbation Type | State Dependence | Functional Family |
|---|---|---|---|
| DR | Parameter randomization | No | Predefined |
| ERFI | Untargeted torque noise | No | White noise |
| JT-SPI | Learned, bounded perturbation | Yes | Universal approx. |
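The state-dependence column can be illustrated with a toy comparison. This sketch is not either method's actual implementation: the uniform noise distribution, the sizes, and the single random linear map `W` (used as a stand-in for a per-episode perturbation network) are all assumptions chosen to keep the example short.

```python
import numpy as np

rng = np.random.default_rng(1)
N_JOINTS, STATE_DIM, LIMIT = 12, 24, 20.0  # illustrative sizes and bound

def white_noise_torque(rng):
    """State-independent noise: a fresh draw on every call (uniform is assumed)."""
    return rng.uniform(-LIMIT, LIMIT, size=N_JOINTS)

# One fixed random map per episode stands in for the perturbation network.
W = rng.standard_normal((N_JOINTS, STATE_DIM)) / np.sqrt(STATE_DIM)

def state_dependent_torque(state):
    """Bounded perturbation that is a deterministic function of the state."""
    return LIMIT * np.tanh(W @ state)

state = rng.standard_normal(STATE_DIM)
noise_a, noise_b = white_noise_torque(rng), white_noise_torque(rng)
pert_a, pert_b = state_dependent_torque(state), state_dependent_torque(state)
```

Visiting the same state twice within an episode yields identical values for `pert_a` and `pert_b`, whereas `noise_a` and `noise_b` differ, which is the distinction the table draws between white noise and a state-conditioned mapping.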

4. Empirical Evaluation and Scenario Results

JT-SPI was evaluated on velocity-commanded humanoid walking at $v_x = 0.4$ m/s, with yaw commands in $[-1, 1]$ rad/s and zero lateral command. Key performance metrics included forward velocity tracking error (mean, variance), lateral/yaw tracking, and success/failure at maintaining balance.
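As a rough illustration of the forward velocity metric, the mean and variance of the tracking error could be computed from a logged velocity trace as below. The exact error definition used in the evaluation is not given here, so the signed error (measured minus commanded) is an assumption, and the function name and sample values are hypothetical.

```python
import numpy as np

def velocity_tracking_stats(v_measured, v_command):
    """Mean and variance of the signed forward-velocity tracking error."""
    err = np.asarray(v_measured, dtype=float) - v_command
    return {"mean_error": float(err.mean()), "error_variance": float(err.var())}

# Made-up velocity samples around the 0.4 m/s command, for illustration only.
v_log = np.array([0.35, 0.42, 0.40, 0.38, 0.45])
stats = velocity_tracking_stats(v_log, v_command=0.4)
```

A mean error near zero with a small variance would indicate that the gait tracks the command steadily rather than oscillating around it.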

Test conditions covered:

  • Nominal simulation: All methods exhibited similar velocity tracking and gait quality.
  • Unseen actuator stiffness (250 Nm/rad, not seen during training): JT-SPI and ERFI succeeded; DR failed.
  • Unseen contact compliance (MuJoCo solref time constant 0.2 s, soft ground): Only JT-SPI succeeded robustly; DR and ERFI failed (robot falls).
  • Sim-to-real transfer (lab, uneven/slippery floor): JT-SPI succeeded in all seeds, DR in 2/3, ERFI in none.

No additional ablations on fraction of perturbed environments or MLP size were reported. These results indicate a substantial robustness improvement of JT-SPI to both actuator and contact-variation reality gaps, with successful transfer under challenging real-world perturbations.

5. Hyperparameters and Training Guidelines

Recommended settings and procedures include:

  • MLP architecture: Two hidden layers of 256 ReLU units each; tanh output layer; all layers have zero bias (ensuring zero-motion state yields zero perturbation).
  • Observation normalization: Normalize privileged observations by running standard deviation only, omitting mean subtraction.
  • Perturbation magnitude (σ_lim): Start at 20 Nm, gradually increase to 50 Nm for joints. For base-force perturbations, use up to 80 N. Adjust based on observed gait stability in nominal simulation.
  • Fraction of perturbed rollouts: 50% perturbation recommended to balance robustness and learning stability.
  • Perturbation sampling: Newly sample $\phi$ once per episode; re-sampling per timestep is discouraged as it produces high-frequency noise that harms policy learning.
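Two of the settings above, normalization by running standard deviation without mean subtraction and the gradual increase of σ_lim from 20 Nm to 50 Nm, can be sketched as follows. The class `RunningStd`, the Welford-style update, and the linear shape of `sigma_lim_schedule` are assumptions; the guidelines only say the limit is increased gradually.

```python
import numpy as np

class RunningStd:
    """Welford-style running standard deviation per observation dimension."""

    def __init__(self, dim, eps=1e-8):
        self.count = 0
        self.mean = np.zeros(dim)
        self.m2 = np.zeros(dim)   # running sum of squared deviations
        self.eps = eps

    def update(self, x):
        x = np.asarray(x, dtype=float)
        self.count += 1
        delta = x - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (x - self.mean)

    @property
    def std(self):
        if self.count < 2:
            return np.ones_like(self.mean)
        return np.sqrt(self.m2 / (self.count - 1)) + self.eps

    def normalize(self, x):
        """Scale by std only; the mean is tracked but never subtracted."""
        return np.asarray(x, dtype=float) / self.std

def sigma_lim_schedule(progress, start=20.0, end=50.0):
    """Ramp the torque bound as training progress goes from 0 to 1 (linear is assumed)."""
    return start + (end - start) * float(np.clip(progress, 0.0, 1.0))

running = RunningStd(dim=3)
for x in ([1.0, 2.0, 3.0], [3.0, 2.0, 1.0], [2.0, 5.0, 2.0]):
    running.update(x)
zero_in = running.normalize(np.zeros(3))
```

Skipping mean subtraction keeps a zero state mapped to a zero network input, which together with the zero-bias layers preserves the property that a zero-motion state receives zero perturbation.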

The design rationale rests on the universal approximation theorem: a sufficiently wide MLP can approximate any continuous torque perturbation mapping $\tau_{reality}(s)$ on a compact state domain to arbitrary accuracy. By randomizing $\phi$ each episode, the policy is forced to generalize over a large family of such mappings. Empirically, the resulting policies withstand larger unmodeled disturbances, supporting robust sim-to-real transfer (Cha et al., 9 Apr 2025).

6. Distinguishing Characteristics and Theoretical Implications

The JT-SPI approach enables policies to encounter a vastly richer set of actuator/force deviations compared to approaches based solely on predefined parameter randomization. State-dependent perturbations allow simulation of complex, context-sensitive discrepancies (e.g., nonuniform friction, actuator nonlinearities) which fixed-parameter or white-noise models cannot represent.

A plausible implication is that as simulator fidelity and robot actuation complexity increase, robust sim-to-real transfer will demand perturbation schemes with sufficient expressiveness, such as the universal function class used in JT-SPI. The recommendation to perturb half of the rollouts for learning stability, and to re-sample at the episode level rather than the step level, highlights important scheduling design considerations. JT-SPI thereby contributes a generalizable framework for bridging reality gaps in modern high-DOF robotic locomotion (Cha et al., 9 Apr 2025).
