GentleHumanoid: Compliant Upper-Body Control

Updated 11 November 2025

GentleHumanoid is a control framework that integrates impedance control with whole-body RL to enable compliant and adaptive upper-body interactions in humanoid robots.
It employs a unified spring-based impedance formulation across shoulder, elbow, and wrist to model both resistive and guiding contacts during dynamic tasks.
Experimental results show a 50%–60% reduction in peak contact forces and smoother contact transitions, supporting safer human–robot interaction in varied scenarios.

GentleHumanoid is a control framework for humanoid robots designed to achieve upper-body compliance during contact-rich interactions with humans and objects. Unlike prior reinforcement learning (RL) policies that prioritize rigid trajectory tracking and treat external forces as disturbances, GentleHumanoid integrates impedance control into a whole-body motion tracking policy. The policy learns to respond compliantly to both resistive and guiding contacts distributed across multiple upper-body links such as the shoulder, elbow, and wrist, and training exposes it to a broad range of physical interaction scenarios. The framework’s unified spring-based formulation supports kinematically consistent contact modeling, safety-adjustable force thresholds, and robust real-world performance in tasks requiring gentle and natural interaction.

1. Motivation and Design Objectives

Safe and natural physical interaction is foundational for humanoid deployment in human-centered environments, including healthcare, domestic assistance, and collaborative workplaces. Prevailing RL-based whole-body policies emphasize strict trajectory tracking and actively suppress contact events and external disturbances, which results in stiff and potentially unsafe robot behavior. This paradigm conflicts with the practical needs of physical human–robot interaction, where nuanced, adaptive compliance to varying contact types (such as hugging, assisted movement, and manipulation of fragile objects) is crucial.

GentleHumanoid is explicitly designed to:

Integrate impedance control within a whole-body RL policy to achieve end-to-end upper-body compliance.
Model both resistive contacts (robot pressing into its environment) and guiding contacts (robot being pulled or pushed by environmental or human forces) using a unified, kinematically coherent framework.
Provide task-adaptable, safety-driven force limits for comfort and regulatory compliance.

2. Unified Spring-based Impedance Formulation

Central to GentleHumanoid is a spring-based impedance model applied to major upper-body keypoints ( $i \in \{\text{shoulder}, \text{elbow}, \text{hand}\}$ ), each conceptualized as a point mass $M$ subjected to both trajectory tracking and contact interaction forces:

$M\,\ddot{\mathbf{x}} = \mathbf{f}_{\rm drive} + \mathbf{f}_{\rm interact}$

where $\mathbf{x}, \dot{\mathbf{x}} \in \mathbb{R}^3$ are the position and velocity of each link in the root frame.

Tracking (Driving) Force: Each link is drawn toward its reference motion via a spring-damper:

$\mathbf{f}_{\rm drive} = K_p (\mathbf{x}_{\rm tar} - \mathbf{x}) + K_d (\dot{\mathbf{x}}_{\rm tar} - \dot{\mathbf{x}})$

where the damping gain is set to the critically damped value $K_d = 2 \sqrt{M K_p}$.
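The driving force above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function name and the numeric values (1 kg point mass, $K_p = 100$ N/m, 5 cm offset) are assumptions chosen for readability.

```python
import math

def drive_force(x, v, x_tar, v_tar, mass, kp):
    """Spring-damper tracking force with critical damping K_d = 2*sqrt(M*K_p)."""
    kd = 2.0 * math.sqrt(mass * kp)
    return [kp * (xt - xi) + kd * (vt - vi)
            for xi, vi, xt, vt in zip(x, v, x_tar, v_tar)]

# Hypothetical example: link at rest, target 5 cm away along the x axis.
f_drive = drive_force(
    x=[0.0, 0.0, 0.0], v=[0.0, 0.0, 0.0],
    x_tar=[0.05, 0.0, 0.0], v_tar=[0.0, 0.0, 0.0],
    mass=1.0, kp=100.0,
)
print(f_drive)  # force points toward the target along x only
```

With both velocities at zero the damping term vanishes, so the force reduces to the spring term $K_p (\mathbf{x}_{\rm tar} - \mathbf{x})$.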

Interaction Force: Environmental contacts are modeled using:

$\mathbf{f}_{\rm interact} = K_{\rm spring} (\mathbf{x}_{\rm anchor} - \mathbf{x})$

In resistive contact, $\mathbf{x}_{\rm anchor}$ is fixed at the contact’s onset, generating a restoring force.
In guiding contact, $\mathbf{x}_{\rm anchor}$ is sampled from human motion data, simulating coordinated external pushes or pulls.

Randomization of $K_{\rm spring} \sim \mathcal{U}(5, 250)$ N/m and anchor/link selection exposes the policy to a diverse distribution of contact scenarios. Full-posture sampling ensures that contact anchors for shoulder, elbow, and wrist co-vary in a physically plausible, kinematically consistent manner.
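As a rough sketch of the interaction model and its stiffness randomization, the snippet below samples $K_{\rm spring}$ from $\mathcal{U}(5, 250)$ N/m and evaluates the spring force for a resistive contact. The function names, the seed, and the positions are illustrative assumptions; the guiding case would differ only in where `x_anchor` comes from (human motion data rather than the contact-onset position).

```python
import random

def sample_spring_stiffness(rng):
    """Draw K_spring uniformly from [5, 250] N/m, as stated in the text."""
    return rng.uniform(5.0, 250.0)

def interaction_force(x, x_anchor, k_spring):
    """f_interact = K_spring * (x_anchor - x), evaluated per axis."""
    return [k_spring * (a - xi) for xi, a in zip(x, x_anchor)]

rng = random.Random(0)  # fixed seed only to make the example repeatable
k_spring = sample_spring_stiffness(rng)

# Resistive contact: the anchor is frozen where contact began (hypothetical values).
x_anchor = [0.30, 0.10, 0.20]
x_now = [0.32, 0.10, 0.20]  # link has moved 2 cm past the anchor along x
f_interact = interaction_force(x_now, x_anchor, k_spring)
print(k_spring, f_interact)
```

The resulting force pulls the link back toward the anchor, which is the restoring behavior described for resistive contact.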

3. Control Policy Architecture

The RL policy comprises state and action representations designed for deployment on real humanoid hardware:

Observation Space ( $\mathbf{o}_t$ ):
- $\tau_{\rm safe}$ : current task-specific force threshold
- $\mathbf{m}_{\rm tar}$ : reference future root poses and joint targets
- $\boldsymbol{\omega}, \mathbf{g}$ : base angular velocity, gravity vector
- $\mathbf{q}^{\rm hist}$ : joint position history
- $\mathbf{a}_{t-3:t-1}$ : previous actions
Action Space ( $\mathbf{a}_t \in \mathbb{R}^{29}$ ):
- Joint position targets at 50 Hz, actuated by low-level PD controllers.
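To illustrate how joint position targets are typically turned into torques by a low-level PD loop, here is a minimal sketch. The gains (`kp = 40`, `kd = 1`) and the zero target velocity are assumptions for illustration; the source does not state the actual gains or the rate of the inner loop.

```python
def pd_torques(q_target, q, qd, kp, kd):
    """Per-joint PD law: tau = kp * (q_target - q) - kd * qd (zero target velocity)."""
    return [kp_j * (qt - qj) - kd_j * qdj
            for qt, qj, qdj, kp_j, kd_j in zip(q_target, q, qd, kp, kd)]

NUM_JOINTS = 29  # matches the 29-dimensional action space in the text

# Hypothetical state: every joint is 0.1 rad short of its target and at rest.
torques = pd_torques(
    q_target=[0.1] * NUM_JOINTS,
    q=[0.0] * NUM_JOINTS,
    qd=[0.0] * NUM_JOINTS,
    kp=[40.0] * NUM_JOINTS,
    kd=[1.0] * NUM_JOINTS,
)
print(len(torques), torques[0])
```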

A parallel “reference dynamics” engine numerically integrates the spring-mass equations using semi-implicit Euler. The policy is trained to make the simulated trajectory follow this compliant reference, rather than a purely geometric reference path.
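The reference dynamics integration can be sketched as follows. This is an illustrative semi-implicit (symplectic) Euler step for $M\,\ddot{\mathbf{x}} = \mathbf{f}_{\rm drive} + \mathbf{f}_{\rm interact}$, in which velocity is updated first and the new velocity is then used to update position. The step size of 0.02 s, the gains, and the positions are assumptions, not values from the source.

```python
import math

def step_reference(x, v, x_tar, v_tar, x_anchor, mass, kp, k_spring, dt):
    """One semi-implicit Euler step of the point-mass spring model."""
    kd = 2.0 * math.sqrt(mass * kp)  # critical damping for the tracking spring
    x_next, v_next = [], []
    for xi, vi, xt, vt, xa in zip(x, v, x_tar, v_tar, x_anchor):
        force = kp * (xt - xi) + kd * (vt - vi) + k_spring * (xa - xi)
        vi_new = vi + dt * force / mass   # velocity first
        xi_new = xi + dt * vi_new         # then position, using the new velocity
        v_next.append(vi_new)
        x_next.append(xi_new)
    return x_next, v_next

# Hypothetical resistive contact: target 10 cm ahead, anchor at the origin.
x, v = [0.0, 0.0, 0.0], [0.0, 0.0, 0.0]
for _ in range(500):
    x, v = step_reference(
        x, v,
        x_tar=[0.1, 0.0, 0.0], v_tar=[0.0, 0.0, 0.0],
        x_anchor=[0.0, 0.0, 0.0],
        mass=1.0, kp=100.0, k_spring=100.0, dt=0.02,
    )
print(x)
```

With equal tracking and contact stiffness the link settles halfway between target and anchor, at $K_p x_{\rm tar} / (K_p + K_{\rm spring})$. This is the compliant reference the policy is asked to follow instead of the raw geometric target.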

The RL algorithm is Proximal Policy Optimization (PPO) in a teacher–student configuration. The teacher model uses privileged access to reference-computed states and torques; the student operates on realistic, robot-available signals only.

The reward signal incorporates:

Reward Term	Formula/Definition
Reference tracking	$\exp\left(-\frac{\|\mathbf{x}^{\rm sim} - \mathbf{x}^{\rm ref}\|_2}{\sigma_x}\right) + \exp\left(-\frac{\|\dot{\mathbf{x}}^{\rm sim} - \dot{\mathbf{x}}^{\rm ref}\|_2}{\sigma_v}\right)$
Force matching	$\exp\left(-\frac{\|\mathbf{f}_{\rm interact} - \mathbf{f}_{\rm interact}^{\rm sim}\|_2}{\sigma_f}\right)$
Unsafe force penalty	$-\mathbf{1}\left(\|\mathbf{f}_{\rm interact}\| > \tau_{\rm safe} + \delta_{\rm tol}\right)$

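The overall compliance reward is a weighted sum of the reference-tracking, force-matching, and unsafe-force terms, denoted $r_{\rm dyn}$, $r_{\rm force}$, and $r_{\rm pen}$ respectively: $r_{\rm compliance} = w_{\rm dyn} r_{\rm dyn} + w_{\rm force} r_{\rm force} + w_{\rm pen} r_{\rm pen}$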
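A minimal sketch of the compliance reward is given below, taking $r_{\rm dyn}$, $r_{\rm force}$, and $r_{\rm pen}$ to be the three table rows in order. All scale parameters and weights default to 1.0 purely for illustration; the source does not give their values.

```python
import math

def l2_distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

def compliance_reward(x_sim, x_ref, v_sim, v_ref, f_ref, f_sim, tau_safe,
                      sigma_x=1.0, sigma_v=1.0, sigma_f=1.0, delta_tol=1.0,
                      w_dyn=1.0, w_force=1.0, w_pen=1.0):
    """Weighted sum of reference tracking, force matching, and unsafe-force penalty."""
    r_dyn = (math.exp(-l2_distance(x_sim, x_ref) / sigma_x)
             + math.exp(-l2_distance(v_sim, v_ref) / sigma_v))
    r_force = math.exp(-l2_distance(f_ref, f_sim) / sigma_f)
    f_norm = math.sqrt(sum(c * c for c in f_ref))
    r_pen = -1.0 if f_norm > tau_safe + delta_tol else 0.0  # indicator penalty
    return w_dyn * r_dyn + w_force * r_force + w_pen * r_pen

zero = [0.0, 0.0, 0.0]
# Perfect tracking with no contact force: both exponentials saturate at 1.
r_safe = compliance_reward(zero, zero, zero, zero, zero, zero, tau_safe=10.0)
# Same tracking, but a 20 N interaction force exceeds tau_safe + delta_tol = 11 N.
big = [20.0, 0.0, 0.0]
r_unsafe = compliance_reward(zero, zero, zero, zero, big, big, tau_safe=10.0)
print(r_safe, r_unsafe)
```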

Additional terms incentivize motion tracking, stability, and safe locomotion.

4. Safety and Force-limiting Mechanisms

GentleHumanoid enforces safety in compliance with human–robot physical interaction standards by:

Sampling a task-specific force threshold $\tau_{\rm safe} \in [F_1, F_2]$ every 5 seconds, made available to the policy as an input.
Applying driving-force clamping:

$\mathbf{f}_{\rm drive\_limited} = \min\left(1, \frac{\tau_{\rm safe}}{\|\mathbf{f}_{\rm drive}\|}\right) \mathbf{f}_{\rm drive}$

which hard-limits commanded forces.

Setting thresholds based on ISO/TS 15066 safety guidelines and empirical comfort data (e.g., hugging pressure below 13 kPa).
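The threshold sampling and driving-force clamp can be illustrated with a short sketch. The bounds `F_MIN` and `F_MAX` stand in for $F_1$ and $F_2$, whose values are not given in the source, and the resampling interval assumes the 50 Hz control rate mentioned for the action space.

```python
import math
import random

F_MIN, F_MAX = 5.0, 15.0         # hypothetical stand-ins for F_1 and F_2, in newtons
RESAMPLE_STEPS = int(5.0 * 50)   # every 5 s, assuming 50 Hz control steps

def sample_force_threshold(rng):
    """Draw a task-specific threshold tau_safe from [F_MIN, F_MAX]."""
    return rng.uniform(F_MIN, F_MAX)

def clamp_drive_force(f_drive, tau_safe):
    """Scale f_drive so its norm never exceeds tau_safe; direction is preserved."""
    norm = math.sqrt(sum(c * c for c in f_drive))
    if norm == 0.0:
        return list(f_drive)  # nothing to scale, and avoids division by zero
    scale = min(1.0, tau_safe / norm)
    return [scale * c for c in f_drive]

rng = random.Random(0)
tau_safe = sample_force_threshold(rng)

limited = clamp_drive_force([30.0, 40.0, 0.0], tau_safe=10.0)   # 50 N -> 10 N
unchanged = clamp_drive_force([3.0, 4.0, 0.0], tau_safe=10.0)   # 5 N stays 5 N
print(tau_safe, limited, unchanged)
```

Because the clamp rescales the whole vector rather than each axis independently, the limited force keeps the direction of the original command.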

A plausible implication is that adaptive safety constraints of this kind are needed for robots operating in variable-contact, human-proximal scenarios such as rehabilitation and assistive care.

5. Experimental Methodology and Baselines

Evaluation was performed in both simulation (IsaacGym/MuJoCo) and on a physical Unitree G1 humanoid (18-link model). The hardware setup included Mark-10 wrist force gauges and custom 40-taxel waist-mounted capacitive pressure pads for high-fidelity pressure mapping during interaction.

Tasks:

Gentle hugging: under both correct and misaligned postures.
Sit-to-stand assistance: requiring multiple-link compliance.
Safe object manipulation: handling fragile objects (e.g., balloons).

Baselines:

A: Whole-body RL tracking without force perturbations.
B: Whole-body RL tracking with 30 N end-effector force perturbations (force-adaptive policy at end effector only).

Metrics:

Peak contact force per link (hand, elbow, shoulder), in newtons.
Task success rate (successful completion of interactive task).
Force/pressure time profiles, pressure-map heatmaps, and qualitative assessment of smoothness and robustness.

6. Results and Significance

GentleHumanoid demonstrates substantial reduction in peak contact forces and improved compliance while retaining task completion capabilities:

In simulated hugging under external pulling, right-hand peak forces stabilized at ~10 N (versus >20 N for Baseline A and >13 N for Baseline B), with similar 50%–60% reductions at elbow and shoulder.
Hardware static compliance tests showed that GentleHumanoid met user-specified limits (e.g., 10 N), while Baselines A and B required 24.6 N and 51.1 N, respectively, to move the wrist.
Pressure heatmaps during mannequin hugging exhibited low, uniform pressures (<7 kPa) for GentleHumanoid across posture variations; baselines exhibited localized high-pressure spikes.
In balloon handling, GentleHumanoid maintained object stability under a 5 N limit, in contrast to baselines that excessively compressed or dropped the object.
Across all metrics, GentleHumanoid achieved approximately 50%–60% reduction in peak interaction forces, posture-invariant whole-arm compliance, and smoother contact transitions, all without compromising task success.

This suggests that unifying impedance-based compliance across all major upper-body links within a whole-body RL policy, combined with safety-aware force constraints, enables robust, safe, and generalized physical interaction for humanoid robots operating in unstructured, dynamic environments.
