JT-SPI: Robust Joint Torque Perturbation

Updated 7 February 2026

JT-SPI is a sim-to-real robustness methodology that injects state-conditioned, randomly parameterized perturbations into joint torques to address nonlinear actuator and contact-force mismatches.
The approach leverages an MLP as a universal function approximator to generate bounded perturbations, vastly expanding the diversity of simulated dynamical discrepancies compared to traditional domain randomization.
Empirical results on high-DOF legged robots demonstrate that JT-SPI significantly improves stability and sim-to-real transfer under challenging unmodeled disturbances.

Joint Torque Space Perturbation Injection (JT-SPI) is a sim-to-real robustness methodology in which state-dependent perturbations, generated by randomly initialized neural networks, are injected directly into the joint-torque inputs of the forward dynamics simulation during policy training. JT-SPI recasts the sim-to-real “reality gap” as an unknown nonlinear mapping from nominal torques to realized torques and exposes the control policy to a much richer family of actuator and contact-force mismatches than randomizing a standard, finite set of simulation parameters can produce. The resulting policies are more robust to complex, previously unseen reality gaps, which facilitates transfer of motor skills from simulation to hardware for high-DOF legged robots (Cha et al., 9 Apr 2025).

1. Formal Definition and Underlying Principles

JT-SPI addresses reality-gap challenges by modeling the mapping from commanded joint torques to actual joint torques as unknown, potentially nonlinear, and state-dependent. Traditional domain randomization approaches typically randomize a fixed, finite set of simulator parameters (such as link masses or friction coefficients). JT-SPI, by contrast, introduces perturbations in the joint-torque space that are state-conditioned and drawn from a wide functional class, parameterized by a universal function approximator (specifically, a multi-layer perceptron, or MLP).

Under domain randomization with additive torque noise $\tau_{DR}$, the joint-space dynamics are:

$M(q; p_{DR})\,\ddot q + C(q,\dot q; p_{DR}) + G(q; p_{DR}) + \tau_{contact}(s; p_{DR}) = \tau_{input} + \tau_{DR}(s; p_{DR})$

JT-SPI injects a perturbation $\tau_\phi$ from a function class $\mathcal{JT}$ (implemented as an MLP) whose weights $\phi$ are re-sampled each episode. At timestep $t$:

Policy output (nominal torque): $\tau_\pi(o_t)$
JT-SPI perturbation:

$\tau_\phi(s_t) = \sigma_{lim} \cdot \tanh(\text{MLP}(\hat{o}_{privileged}(t); \phi))$

where $\sigma_{lim}$ bounds the perturbation magnitude, the input $\hat{o}_{privileged}(t)$ is the normalized privileged full-state observation, and $\phi$ is re-sampled at the start of each episode from a Xavier (Glorot) uniform distribution.
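Substituting both torque terms into the dynamics gives the model the policy is trained against. The form below is a sketch that assumes nominal (non-randomized) model parameters and lets $\tau_\phi$ take the place of $\tau_{DR}$, matching the pseudocode in Section 2, which passes $\tau_\pi + \tau_\phi$ to the simulator:

$M(q)\,\ddot q + C(q,\dot q) + G(q) + \tau_{contact}(s) = \tau_\pi(o_t) + \sigma_{lim} \cdot \tanh(\text{MLP}(\hat{o}_{privileged}(t); \phi))$

Because $|\tanh(x)| < 1$ elementwise, every joint's perturbation satisfies $|\tau_{\phi,i}(s_t)| < \sigma_{lim}$ for any sampled $\phi$, which is what makes $\sigma_{lim}$ a hard bound.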

The policy observes partial state $o_t \in \mathbb{R}^{48}$, while the perturbation generator accesses the privileged full state (normalized). This exposes the policy to a diverse, high-dimensional set of actuator and contact deviations.

2. Algorithmic Implementation

JT-SPI is integrated into on-policy reinforcement learning in high-throughput simulators (e.g., IsaacGym) that run many rollouts in parallel. High-level pseudocode:

```
Initialize policy π_θ, critic, and AMP discriminator
for iteration = 1 … N_updates:
    for env = 1 … N_parallel:                        # executed as one parallel batch
        # Episode start: draw a new perturbation function
        sample φ_env ~ XavierUniform                 # MLP weights for the perturbation
        perturbed_env = env ∈ perturbed_subset       # 50% of rollouts are perturbed
        s_env = s0
        step = 0
        done = False
        while not done and step < max_steps:
            o_env = partial_observation(s_env)       # o_t ∈ R^48
            a_env = π_θ(o_env)                       # action in [-1,1]^12
            τ_π = τ_limit ⊙ a_env                    # nominal torques
            if perturbed_env:
                o_priv = privileged_observation(s_env)
                ô_priv = o_priv / running_std        # scale only, no mean subtraction
                τ_φ = σ_lim * tanh(MLP(ô_priv; φ_env))
            else:
                τ_φ = 0
            (s_next, r, done) = simulator.step(s_env, τ_π + τ_φ)
            store_transition(s_env, o_env, a_env, τ_φ, r, s_next, done)
            s_env = s_next
            step += 1
    # Policy optimization: PPO + AMP + gradient penalty
```

Key scheduling aspects include re-sampling $\phi$ only at episode boundaries and perturbing 50% of rollouts to maintain learning stability. The perturbation MLP comprises two 256-unit ReLU hidden layers, a final tanh layer for bounded outputs, and uses zero bias in all layers to ensure zero input yields zero perturbation.
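The generator described above can be sketched in NumPy as follows. This is a minimal illustration, not the authors' code: the class name `PerturbationMLP`, the `perturb_prob` coin flip used to perturb roughly 50% of rollouts, and the one-generator-per-environment layout are assumptions made here.

```python
from typing import Optional

import numpy as np


class PerturbationMLP:
    """Hypothetical JT-SPI-style joint-torque perturbation generator.

    Two hidden ReLU layers, a tanh output scaled by sigma_lim, and no bias
    terms anywhere, so a zero input always maps to a zero perturbation.
    """

    def __init__(self, obs_dim: int, n_joints: int, sigma_lim: float,
                 hidden: int = 256, perturb_prob: float = 0.5,
                 seed: Optional[int] = None) -> None:
        self.sizes = [obs_dim, hidden, hidden, n_joints]
        self.n_joints = n_joints
        self.sigma_lim = sigma_lim
        self.perturb_prob = perturb_prob  # assumed per-episode selection rule
        self.rng = np.random.default_rng(seed)
        self.resample()

    def resample(self) -> None:
        """Call at every episode reset: fresh Xavier-uniform weights, no biases."""
        self.weights = []
        for fan_in, fan_out in zip(self.sizes[:-1], self.sizes[1:]):
            limit = np.sqrt(6.0 / (fan_in + fan_out))
            self.weights.append(
                self.rng.uniform(-limit, limit, size=(fan_in, fan_out)))
        # Decide once per episode whether this rollout is perturbed at all.
        self.active = self.rng.random() < self.perturb_prob

    def __call__(self, o_priv_hat: np.ndarray) -> np.ndarray:
        """Map normalized privileged observations (..., obs_dim) to torques."""
        out_shape = o_priv_hat.shape[:-1] + (self.n_joints,)
        if not self.active:
            return np.zeros(out_shape)
        h = o_priv_hat
        for W in self.weights[:-1]:
            h = np.maximum(h @ W, 0.0)  # bias-free linear layer + ReLU
        # Bias-free output layer, tanh squashing, scaled into (-sigma_lim, sigma_lim).
        return self.sigma_lim * np.tanh(h @ self.weights[-1])
```

In a rollout loop, one would call `resample()` at every episode reset and add `gen(o_priv_hat)` to the nominal torques $\tau_\pi$ at every step, e.g. `gen = PerturbationMLP(obs_dim=60, n_joints=12, sigma_lim=20.0, seed=0)`; the privileged observation size of 60 is a placeholder.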

3. Experimental Setup and Comparative Analysis

JT-SPI was validated using the TOCABI humanoid platform (100 kg, floating-base, high-ratio harmonic drives with gear ratio 100), controlled at 125 Hz. Training utilized PPO with Adversarial Motion Prior (AMP) imitation and a gradient penalty regularizer.

Comparison was made to:

Domain Randomization (DR): Randomizes parameters such as terrain friction, link masses, center-of-mass offsets, armature inertia, damping, motor constant, latency, random pushes, and observation noise within prescribed ranges.
ERFI (Campanaro et al.): Applies untargeted additive torque noise at each joint.

JT-SPI differs fundamentally by generating state-dependent, potentially highly nonlinear perturbations rather than white-noise (ERFI) or a finite augmented parameter set (DR).
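The contrast between untargeted noise and a state-conditioned perturbation can be illustrated with a toy NumPy sketch. The uniform per-step noise below is a stand-in for ERFI-style untargeted noise, not Campanaro et al.'s exact formulation, and all sizes and weight ranges are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
N_JOINTS, OBS_DIM, SIGMA = 12, 60, 20.0  # illustrative sizes, not from the paper


def erfi_like_noise(state: np.ndarray) -> np.ndarray:
    """Untargeted noise: ignores `state` and is redrawn on every call."""
    return rng.uniform(-SIGMA, SIGMA, size=N_JOINTS)


# One fixed, bias-free random network for the whole episode (one hidden layer
# here for brevity; the generator described in Section 2 uses two).
W1 = rng.uniform(-0.1, 0.1, size=(OBS_DIM, 256))
W2 = rng.uniform(-0.1, 0.1, size=(256, N_JOINTS))


def jt_spi_like(state: np.ndarray) -> np.ndarray:
    """State-conditioned perturbation: a deterministic function of the state."""
    return SIGMA * np.tanh(np.maximum(state @ W1, 0.0) @ W2)


s = rng.normal(size=OBS_DIM)
same_state_erfi = np.allclose(erfi_like_noise(s), erfi_like_noise(s))  # False
same_state_jtspi = np.allclose(jt_spi_like(s), jt_spi_like(s))         # True
```

Within an episode, the JT-SPI-style term therefore behaves like a consistent, state-dependent torque error, whereas the noise term changes on every call even at the same state.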

Perturbation and DR method characteristics:

| Method | Perturbation type | State dependence | Functional family |
|---|---|---|---|
| DR | Parameter randomization | No | Predefined |
| ERFI | Untargeted torque noise | No | White noise |
| JT-SPI | Randomly parameterized, bounded perturbation | Yes | Universal approx. (MLP) |

4. Empirical Evaluation and Scenario Results

JT-SPI was evaluated on velocity-commanded humanoid walking at $v_x = 0.4$ m/s, with yaw commands in $[-1, 1]$ rad/s and zero lateral command. Key performance metrics included forward velocity tracking error (mean, variance), lateral/yaw tracking, and success/failure at maintaining balance.
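The source does not give the exact metric formulas. As an assumption, the sketch below takes the signed error (measured minus commanded velocity) over one run and reports its mean, which captures a steady tracking bias, and its variance, which captures oscillation; the same hypothetical function applies to lateral and yaw-rate tracking:

```python
import numpy as np


def tracking_error_stats(measured, commanded):
    """Mean and variance of the signed tracking error over one run."""
    err = np.asarray(measured, dtype=float) - np.asarray(commanded, dtype=float)
    return float(err.mean()), float(err.var())


# Forward velocity samples (made up) against the 0.4 m/s command.
v_x = [0.38, 0.42, 0.40, 0.36]
mean_err, var_err = tracking_error_stats(v_x, 0.4)
```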

Test conditions covered:

Nominal simulation: All methods exhibited similar velocity tracking and gait quality.
Unseen actuator stiffness (250 Nm/rad, outside the training distribution): JT-SPI and ERFI succeeded; DR failed.
Unseen contact compliance (soft ground, MuJoCo solref time constant 0.2 s): only JT-SPI succeeded robustly; DR and ERFI failed (the robot fell).
Sim-to-real transfer (lab, uneven/slippery floor): JT-SPI succeeded in all seeds, DR in 2/3, ERFI in none.

No additional ablations on the fraction of perturbed environments or the MLP size were reported. These results indicate a substantial robustness improvement of JT-SPI against both actuator- and contact-related reality gaps, with successful transfer under challenging real-world perturbations.

5. Hyperparameters and Training Guidelines

Recommended settings and procedures include:

MLP architecture: Two hidden layers of 256 ReLU units each; tanh output layer; all layers have zero bias (so a zero input yields zero perturbation).
Observation normalization: Normalize privileged observations by running standard deviation only, omitting mean subtraction.
Perturbation magnitude (σ_lim): Start at 20 Nm, gradually increase to 50 Nm for joints. For base-force perturbations, use up to 80 N. Adjust based on observed gait stability in nominal simulation.
Fraction of perturbed rollouts: 50% perturbation recommended to balance robustness and learning stability.
Perturbation sampling: Sample a new $\phi$ once per episode; re-sampling per timestep is discouraged because it produces high-frequency noise that harms policy learning.
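Two of these settings can be sketched in NumPy: a scale-only running normalizer and a ramp for $\sigma_{lim}$. The running statistics use the standard parallel (Chan et al.) variance update, and the linear shape of the 20 Nm to 50 Nm ramp (`sigma_lim_schedule`, `ramp_iters` are hypothetical names) is an assumption, since the source only says to increase it gradually:

```python
import numpy as np


class RunningStd:
    """Per-dimension running standard deviation for scale-only normalization."""

    def __init__(self, dim: int, eps: float = 1e-8) -> None:
        self.count = 0
        self.mean = np.zeros(dim)  # needed for the variance update only
        self.m2 = np.zeros(dim)    # running sum of squared deviations
        self.eps = eps

    def update(self, batch: np.ndarray) -> None:
        """Merge a batch of observations (n, dim) into the running statistics."""
        batch = np.atleast_2d(batch)
        n_b = batch.shape[0]
        mean_b = batch.mean(axis=0)
        m2_b = ((batch - mean_b) ** 2).sum(axis=0)
        delta = mean_b - self.mean
        total = self.count + n_b
        self.mean = self.mean + delta * n_b / total
        self.m2 = self.m2 + m2_b + delta ** 2 * self.count * n_b / total
        self.count = total

    @property
    def std(self) -> np.ndarray:
        # Population std; eps avoids division by zero before any update.
        return np.sqrt(self.m2 / max(self.count, 1) + self.eps)

    def normalize(self, x: np.ndarray) -> np.ndarray:
        """Divide by the running std without subtracting the mean."""
        return x / self.std


def sigma_lim_schedule(iteration: int, ramp_iters: int,
                       start: float = 20.0, end: float = 50.0) -> float:
    """Joint perturbation bound in Nm, ramped linearly (assumed) from start to end."""
    frac = min(max(iteration / ramp_iters, 0.0), 1.0)
    return start + frac * (end - start)
```

The mean is tracked only because the variance update needs it; it is never subtracted from the observations, so a zero observation still maps to a zero input for the perturbation MLP.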

The design rationale draws on the Universal Approximation Theorem: a sufficiently wide MLP can approximate any continuous function on a compact domain to arbitrary accuracy, so the MLP class can, in principle, approximate a continuous, bounded torque perturbation mapping $\tau_{reality}(s)$. By re-sampling $\phi$ each episode, the policy is forced to generalize over a large subspace of such mappings. Empirically, the resulting policies withstand larger unmodeled disturbances, supporting robust sim-to-real transfer (Cha et al., 9 Apr 2025).
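The zero-bias choice has a consequence that can be derived in one line. Write the pre-tanh network as $g(x) = W_3\,\mathrm{ReLU}(W_2\,\mathrm{ReLU}(W_1 x))$, where $W_1, W_2, W_3$ is notation introduced here for the three bias-free weight matrices, and use $\mathrm{ReLU}(\alpha z) = \alpha\,\mathrm{ReLU}(z)$ for $\alpha \ge 0$:

$g(\alpha x) = W_3\,\mathrm{ReLU}(W_2\,\mathrm{ReLU}(\alpha W_1 x)) = \alpha\,W_3\,\mathrm{ReLU}(W_2\,\mathrm{ReLU}(W_1 x)) = \alpha\,g(x), \qquad \alpha \ge 0$

Setting $\alpha = 0$ gives $\tau_\phi(0) = \sigma_{lim}\tanh(g(0)) = 0$, the zero-input property stated in the hyperparameter list. The same identity shows that the bias-free family is positively homogeneous before the tanh, whereas standard statements of the universal approximation theorem assume affine layers with bias terms, so the functions actually sampled form a structured subset of all continuous mappings.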

6. Distinguishing Characteristics and Theoretical Implications

The JT-SPI approach enables policies to encounter a vastly richer set of actuator/force deviations compared to approaches based solely on predefined parameter randomization. State-dependent perturbations allow simulation of complex, context-sensitive discrepancies (e.g., nonuniform friction, actuator nonlinearities) which fixed-parameter or white-noise models cannot represent.

A plausible implication is that as simulator fidelity and robot actuation complexity increase, robust sim-to-real transfer will demand perturbation schemes with sufficient expressiveness, such as the universal function class used in JT-SPI. The recommendations to perturb only half of the rollouts for learning stability and to re-sample $\phi$ per episode rather than per step also highlight important scheduling considerations, although the fraction of perturbed rollouts was not ablated. JT-SPI thereby contributes a generalizable framework for bridging reality gaps in modern high-DOF robotic locomotion (Cha et al., 9 Apr 2025).

References

Cha et al. (2025). Sim-to-Real of Humanoid Locomotion Policies via Joint Torque Space Perturbation Injection.
