ConstrainedMimic: Runtime Safety for Humanoid Control

Updated 4 July 2026

ConstrainedMimic is a control framework for humanoid robots that enforces runtime safety through layered filtering based on control barrier functions (CBFs), covering collision avoidance, joint limits, and center-of-mass stability.
It combines whole-body kinematics and dynamics with three intervention stages—constrained retargeting, kinematic filtering, and dynamic filtering—to impose constraints without retraining the RL policy.
Implemented in JAX and tested on simulated Unitree G1, the framework achieves real-time performance on CPU, GPU, and TPU while preserving nominal policy behavior.

ConstrainedMimic is a control framework for humanoid robots that enforces runtime constraints within reinforcement-learning-based whole-body tracking policies. It combines whole-body kinematics and dynamics, operational space control, and control barrier functions (CBFs) to impose arbitrary deployment-time constraints on both the kinematic reference motion and the underlying dynamics, without retraining the policy. The framework is presented as a policy-agnostic, minimally invasive safety layer with three intervention points—constrained retargeting, kinematic filtering, and dynamic filtering—and is evaluated in simulation on a Unitree G1 using pre-trained RL policies, where it demonstrates collision avoidance, joint-limit enforcement, and center-of-mass stability at real-time rates on CPU, GPU, and TPU (Morton et al., 29 May 2026).

1. Scope, motivation, and design objective

ConstrainedMimic is motivated by a specific gap in contemporary humanoid control. Reinforcement-learning policies for whole-body tracking can produce agile motion, but they do not inherently guarantee safety. At deployment time, a humanoid may encounter constraints that were absent during training, including nearby humans or objects, self-collision risks, joint limits, stability requirements, and task-specific workspace limits. The framework is designed to make such constraints runtime configurable while preserving as much of the learned policy behavior as possible (Morton et al., 29 May 2026).

A defining feature is that ConstrainedMimic treats safety insertion as possible at two natural interfaces already present in learned humanoid tracking. The first is the input side, namely the kinematic reference that is passed into the policy. The second is the output side, namely the low-level commands produced by the policy. This yields a layered architecture in which safety can be imposed during retargeting, after retargeting, or after policy inference. The stated objective is not to replace RL tracking, but to let a pre-trained policy continue tracking and recovering robustly while a lightweight optimization-based safety layer enforces constraints that may only become known at runtime (Morton et al., 29 May 2026).

This formulation also clarifies a common misconception. ConstrainedMimic is not a training-time constrained imitation-learning algorithm in the usual sense of behavior-cloning regularization or cost-constrained occupancy matching. It is a runtime safety framework for already learned humanoid motion policies, and its main novelty lies in enforcing constraints on the full contact-constrained kinematics and dynamics of the robot rather than on a reduced-order abstraction (Morton et al., 29 May 2026).

2. Mathematical foundations: CBFs, contact consistency, and whole-body dynamics

The core safety formalism is the standard CBF condition for a control-affine system

$\dot{\mathbf{z}} = f(\mathbf{z}) + g(\mathbf{z})\mathbf{u},$

with barrier function $h(\mathbf{z})$ defining the safe set $\{\mathbf{z} : h(\mathbf{z}) \ge 0\}$. Safety is imposed by requiring

$L_f h(\mathbf{z}) + L_g h(\mathbf{z}) \mathbf{u} \ge -\alpha(h(\mathbf{z})),$

which ConstrainedMimic realizes through a quadratic program of the form

$\begin{aligned} \underset{\mathbf{u}}{\text{minimize}} \quad & \| \mathbf{u} - \mathbf{u}_{\text{nom}} \|_2^2 \\ \text{subject to} \quad & L_f h(\mathbf{z}) + L_g h(\mathbf{z}) \mathbf{u} \ge -\alpha(h(\mathbf{z})). \end{aligned}$
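With a single barrier constraint, this QP is a Euclidean projection onto a half-space and has a closed-form solution. The JAX sketch below illustrates that special case on a toy single-integrator system avoiding a unit disk. The function name `cbf_filter`, the toy dynamics, and the linear class-K function $\alpha(h) = \alpha h$ are assumptions made for illustration; the framework itself is reported to solve multi-constraint QPs with a dedicated solver.

```python
import jax
import jax.numpy as jnp

def cbf_filter(z, u_nom, f, g, h, alpha=1.0):
    """Closed-form solution of a CBF-QP with ONE barrier constraint.

    Solves  min ||u - u_nom||^2  s.t.  L_f h + L_g h u >= -alpha * h,
    assuming a linear class-K function alpha(h) = alpha * h.
    """
    grad_h = jax.grad(h)(z)
    Lf_h = grad_h @ f(z)              # scalar Lie derivative along f
    Lg_h = grad_h @ g(z)              # row vector, shape (m,)
    b = -alpha * h(z) - Lf_h          # constraint reads: Lg_h @ u >= b
    slack = b - Lg_h @ u_nom          # > 0 only if u_nom violates the constraint
    scale = jnp.maximum(slack, 0.0) / (Lg_h @ Lg_h + 1e-12)
    return u_nom + scale * Lg_h       # smallest correction along Lg_h

# Toy system (illustrative only): planar single integrator, z_dot = u.
f = lambda z: jnp.zeros(2)
g = lambda z: jnp.eye(2)
h = lambda z: z @ z - 1.0             # safe set: outside the unit disk

z = jnp.array([1.5, 0.0])
u_nom = jnp.array([-2.0, 0.5])        # nominal command heads toward the disk
u_safe = cbf_filter(z, u_nom, f, g, h)
```

Only the component of the nominal command along $L_g h$ is modified; the tangential component passes through unchanged, which is the minimal-deviation behavior the QP encodes.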

For higher-relative-degree constraints, the paper also gives a second-order HOCBF construction (Morton et al., 29 May 2026).
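For reference, the commonly used second-order HOCBF construction can be written as follows. This is the standard textbook form under the assumption that $h$ has relative degree two; the class-K functions $\alpha_1, \alpha_2$ are generic symbols introduced here, and the paper's exact choices are not reproduced. Define the auxiliary function

$\psi_1(\mathbf{z}) = \dot{h}(\mathbf{z}) + \alpha_1(h(\mathbf{z})),$

and require

$\dot{\psi}_1(\mathbf{z}, \mathbf{u}) + \alpha_2(\psi_1(\mathbf{z})) \ge 0.$

With linear choices $\alpha_1(s) = a_1 s$ and $\alpha_2(s) = a_2 s$, this expands to

$\ddot{h} + (a_1 + a_2)\,\dot{h} + a_1 a_2\, h \ge 0,$

which is affine in $\mathbf{u}$ because the control enters through $\ddot{h}$, so it can be used as a QP constraint in the same way as the first-order condition.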

What distinguishes ConstrainedMimic from generic CBF filtering is its use of contact-consistent whole-body models. On the kinematic side, the framework defines a contact null-space projection $\mathbf{N}_c(\mathbf{q})$ so that feasible joint velocities satisfy

$\dot{\mathbf{q}} = \mathbf{N}_c \dot{\mathbf{q}}_0.$

On the dynamic side, it uses the contact-constrained equations

$\mathbf{M} \ddot{\mathbf{q}} + \mathbf{N}_c^T (\mathbf{c}+\mathbf{g}) + \mathbf{J}_c^T \boldsymbol{\mu}_{c,\dot{J}} = \mathbf{N}_c^T \boldsymbol{\Gamma},$

together with constrained operational-space quantities such as the task Jacobian

$\mathbf{J}_{t|c}(\mathbf{q}) = \mathbf{J}\mathbf{N}_c.$

The explicit role of these constructions is to ensure that the safety filter respects the current contact mode—no contact, left foot, right foot, or double support—so that the optimizer does not command motions inconsistent with active contacts (Morton et al., 29 May 2026).
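A minimal JAX sketch of the kinematic projection is given below. It assumes the simplest projector, $\mathbf{N}_c = \mathbf{I} - \mathbf{J}_c^{+}\mathbf{J}_c$ with a Moore-Penrose pseudoinverse; the paper may instead use a dynamically consistent (inertia-weighted) inverse. The Jacobians are small hypothetical matrices, not G1 quantities.

```python
import jax.numpy as jnp

def contact_nullspace(J_c):
    """Kinematic contact null-space projector N_c = I - pinv(J_c) @ J_c.

    Assumption: plain pseudoinverse; an inertia-weighted inverse would be
    needed for a dynamically consistent projector.
    """
    n = J_c.shape[1]
    return jnp.eye(n) - jnp.linalg.pinv(J_c) @ J_c

# Hypothetical contact Jacobian: 2 contact constraints on a 4-DoF system.
J_c = jnp.array([[1.0, 0.0, 0.0, 0.0],
                 [0.0, 1.0, 1.0, 0.0]])
N_c = contact_nullspace(J_c)

# Any unconstrained velocity qdot_0 is mapped to a contact-consistent one.
qdot_0 = jnp.array([0.3, -0.2, 0.5, 0.1])
qdot = N_c @ qdot_0

# Constrained task Jacobian J_{t|c} = J @ N_c for a hypothetical task Jacobian.
J = jnp.array([[0.0, 0.0, 1.0, 1.0]])
J_tc = J @ N_c
```

By construction `J_c @ qdot` is zero, so the projected velocity does not move the contact points. Switching contact mode corresponds to swapping in a different `J_c`.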

This contact consistency is central to the framework’s claim of being minimally restrictive. The filter is not merely required to find a safe command; it is required to find a safe command that remains aligned with the robot’s current contact-constrained kinematics and the policy’s motion-tracking objectives. A plausible implication is that this substantially reduces the risk of safety interventions generating infeasible or behaviorally disruptive corrections.

3. Layered architecture and intervention points

ConstrainedMimic is organized as a three-stage safety stack. Each stage can be used independently, and the experiments indicate that combining them is often the most reliable configuration (Morton et al., 29 May 2026).

Layer	Where it acts	Main role
Constrained retargeting	During human-to-robot retargeting	Makes the reference motion CBF-safe
Kinematic safety filter	After retargeting	Minimally adjusts the reference to satisfy constraints
Dynamic safety filter	After policy output	Filters the RL command using full constrained dynamics

In constrained retargeting, a CBF is inserted directly into inverse-kinematics retargeting from human motion to robot motion. The optimization minimizes weighted tracking error across multiple pairwise tasks while enforcing barrier constraints. For the G1 setup, the paper states that $n_t = 14$ pairwise tasks are used, with position and orientation terms weighted independently. This makes the retargeting process itself safety aware rather than treating safety as a downstream correction (Morton et al., 29 May 2026).

The kinematic safety filter is intended for situations in which retargeting has already been performed. It uses a task hierarchy with end-effector and center-of-mass tracking as primary objectives and posture in the task null space as a secondary objective. The filter minimizes deviation from the nominal joint velocity while satisfying the same CBF constraints. This is the paper’s notion of task-consistent filtering: preserve task motion first, alter lower-priority structure first, and intervene only as much as the barrier constraints require (Morton et al., 29 May 2026).
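The task hierarchy described above can be illustrated with a two-level velocity resolution, where the posture objective acts only in the null space of the primary task. The sketch below uses a hypothetical planar 3-link chain and plain pseudoinverses; the names `fk` and `prioritized_velocity` are illustrative, and the CBF constraints and contact projection of the actual filter are omitted.

```python
import jax
import jax.numpy as jnp

LINKS = jnp.array([0.4, 0.3, 0.2])    # hypothetical link lengths (m)

def fk(q):
    """End-effector position of a planar 3-link chain."""
    angles = jnp.cumsum(q)
    return jnp.array([jnp.sum(LINKS * jnp.cos(angles)),
                      jnp.sum(LINKS * jnp.sin(angles))])

def prioritized_velocity(q, xdot_des, qdot_posture):
    """Primary task tracking plus posture motion in the task null space."""
    J = jax.jacobian(fk)(q)                   # task Jacobian, shape (2, 3)
    J_pinv = jnp.linalg.pinv(J)
    N = jnp.eye(q.shape[0]) - J_pinv @ J      # task null-space projector
    return J_pinv @ xdot_des + N @ qdot_posture

q = jnp.array([0.3, 0.6, -0.4])
xdot_des = jnp.array([0.1, -0.05])

# Two different posture objectives, same primary task.
qdot_a = prioritized_velocity(q, xdot_des, jnp.zeros(3))
qdot_b = prioritized_velocity(q, xdot_des, jnp.array([0.0, 0.0, 1.0]))
J = jax.jacobian(fk)(q)
```

Both joint velocities produce the same end-effector velocity, so changing the posture term leaves the primary task untouched. This is the sense in which lower-priority structure can be altered first.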

The dynamic safety filter acts on the RL policy output. The nominal command is generated from clipped PD targets, and the safety layer then solves a hierarchical QP using the full contact-constrained underactuated dynamics. The hierarchy is explicit: primary contact force tracking, secondary task-space motion-acceleration tracking, and tertiary posture-acceleration tracking. After solving, the safe torque is mapped back to safe PD targets because the Unitree G1 interface used in the reported setup does not use direct torque control in the final deployment loop (Morton et al., 29 May 2026).
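The final mapping from a safe torque back to a PD target can be sketched by inverting the joint-level PD law. The version below assumes the law $\boldsymbol{\tau} = \mathbf{K}_p(\mathbf{q}_{\text{des}} - \mathbf{q}) - \mathbf{K}_d\dot{\mathbf{q}}$ with zero desired velocity and no feedforward term; the exact law and gains used in the reported setup are not stated here, so the function names and all numbers are hypothetical.

```python
import jax.numpy as jnp

def pd_torque(q_des, q, qdot, kp, kd):
    """Assumed PD law: tau = kp * (q_des - q) - kd * qdot."""
    return kp * (q_des - q) - kd * qdot

def torque_to_pd_target(tau_safe, q, qdot, kp, kd):
    """Invert the assumed PD law to get the target that yields tau_safe."""
    return q + (tau_safe + kd * qdot) / kp

# Hypothetical per-joint gains and state for a 2-joint example.
kp = jnp.array([100.0, 80.0])
kd = jnp.array([2.0, 1.5])
q = jnp.array([0.1, -0.3])
qdot = jnp.array([0.5, -0.2])

tau_safe = jnp.array([3.0, -1.0])     # output of the safety QP (made up here)
q_des_safe = torque_to_pd_target(tau_safe, q, qdot, kp, kd)
```

Feeding `q_des_safe` back through `pd_torque` at the same state reproduces `tau_safe`. In practice the match degrades if the state changes between the filter and the low-level PD loop.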

The framework’s reported conclusion is that safety must often be enforced at both the input and output sides of the policy. Constrained retargeting can ensure that the reference is safe, but the learned policy may still overshoot that safe reference unless dynamic filtering is also applied (Morton et al., 29 May 2026).

4. Constraint classes and the principle of minimal intervention

The supported runtime constraints include collision avoidance, joint limits, and center-of-mass stability. Collision avoidance is modeled using sphere-sphere, sphere-plane, and sphere-cylinder approximations, with barrier functions based on signed separation. The paper applies these models to both self-collision and external-obstacle avoidance. Joint-limit constraints are written directly in terms of the lower and upper joint-position bounds, and the QPs also include velocity and torque box constraints. Center-of-mass stability is expressed by checking whether the projected center of mass remains inside the support polygon of the feet (Morton et al., 29 May 2026).
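These barrier functions are simple geometric expressions. The JAX sketch below shows one plausible form for the sphere-sphere, sphere-plane, joint-limit, and support-polygon cases, each returning values that are nonnegative when the constraint holds. The function names and the convex, counter-clockwise polygon assumption are illustrative; the sphere-cylinder case is omitted and the paper's exact expressions may differ.

```python
import jax.numpy as jnp

def h_sphere_sphere(p1, r1, p2, r2):
    """Signed separation between two spheres (>= 0 when not overlapping)."""
    return jnp.linalg.norm(p1 - p2) - (r1 + r2)

def h_sphere_plane(p, r, plane_point, plane_normal):
    """Signed separation between a sphere and a plane with outward normal."""
    n = plane_normal / jnp.linalg.norm(plane_normal)
    return n @ (p - plane_point) - r

def h_joint_limits(q, q_min, q_max):
    """Stacked lower- and upper-bound barriers: [q - q_min, q_max - q]."""
    return jnp.concatenate([q - q_min, q_max - q])

def h_support_polygon(com_xy, vertices):
    """One barrier per edge of a convex polygon with CCW vertices.

    Each entry is the 2D cross product of the edge with the vector from
    the edge start to the CoM; it is a signed distance scaled by the edge
    length and is >= 0 when the CoM lies on the interior side.
    """
    edges = jnp.roll(vertices, -1, axis=0) - vertices
    d = com_xy - vertices
    return edges[:, 0] * d[:, 1] - edges[:, 1] * d[:, 0]

# Hypothetical values for illustration.
h_ss = h_sphere_sphere(jnp.zeros(3), 0.1, jnp.array([0.5, 0.0, 0.0]), 0.15)
h_sp = h_sphere_plane(jnp.array([0.0, 0.0, 0.3]), 0.1,
                      jnp.zeros(3), jnp.array([0.0, 0.0, 1.0]))
h_jl = h_joint_limits(jnp.array([0.5, -1.0]),
                      jnp.array([-1.0, -1.0]), jnp.array([1.0, 1.0]))
square = jnp.array([[0.0, 0.0], [1.0, 0.0], [1.0, 1.0], [0.0, 1.0]])
h_in = h_support_polygon(jnp.array([0.5, 0.5]), square)
h_out = h_support_polygon(jnp.array([1.5, 0.5]), square)
```

Because each function is differentiable in the robot state, its Lie derivatives can be obtained by automatic differentiation and inserted into the CBF constraint.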

These constraint classes are not treated uniformly. Their enforcement is explicitly tied to contact mode and task priority. Contact consistency prevents the system from commanding motions that violate active foot contacts, while task consistency causes the filter to preserve end-effector and center-of-mass tracking before modifying posture or lower-priority null-space behavior. In the dynamic filter, this prioritization is extended to contact-force tracking first, then task-space accelerations, then posture accelerations (Morton et al., 29 May 2026).

This organization is the framework’s main answer to the standard criticism of QP safety filters, namely that they can overconstrain learned behaviors. ConstrainedMimic formulates safety as a closest-safe-command problem in a task-aware null space. The stated effect is to preserve policy recovery behaviors, avoid unnecessary motion changes, and keep the robot close to nominal learned behavior unless safety truly requires intervention. This suggests that the framework is less a replacement controller than a structured corrective layer (Morton et al., 29 May 2026).

Another important clarification concerns the scope of the guarantees. The paper explicitly notes that some scenarios exceed what single-step CBF logic can resolve. The “limbo under a cylindrical obstacle” example shows that CBFs can enforce local safety but may not select the correct long-horizon strategy, such as deciding whether to crouch or to limbo. Similarly, the “breaking contact mode to avoid a dynamic obstacle” example shows that some safety problems require contact-mode changes and higher-level planning rather than purely local filtering (Morton et al., 29 May 2026).

5. Implementation and empirical evidence

ConstrainedMimic is reported as fully differentiable and implemented in JAX. The implementation uses frax for kinematics and dynamics, cbfpy for CBF differentiation, and qpax for differentiable QP solving. The framework is compatible with CPU, GPU, and TPU, and the paper reports rates of up to 300 Hz for constrained retargeting, up to 2000 Hz for kinematic filters, and up to 500 Hz for dynamic filters. In the experiments, the dynamic filter is run at 250 Hz, which is described as five times the policy rate, implying a 50 Hz policy (Morton et al., 29 May 2026).

The experimental platform is a simulated Unitree G1 using unmodified pre-trained RL policies, specifically TWIST2 and SONIC. The reported scenarios include self-collision in teleoperation, “karate chop” external obstacle avoidance, center-of-mass stability during a “Smooth Criminal” lean, limbo under a cylindrical obstacle, and a dynamic-obstacle scenario in which safety would require changing contact mode (Morton et al., 29 May 2026).

The quantitative results most clearly reported are for self-collision and external obstacle avoidance. For self-collision, the base policy incurs about 37.6% violation frames, kinematic safety reduces this to about 8%, dynamic safety reduces it to about 0.35%, and kinematic plus dynamic safety reaches near 0%. For the karate-chop obstacle-avoidance case, the base policy shows about 20% violation frames with very large maximum violation; kinematic safety reduces violations but remains imperfect, dynamic safety produces much smaller violations, and using both filters gives the best overall performance (Morton et al., 29 May 2026).
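Metrics of this kind can be computed from a logged barrier trace. The sketch below assumes that a "violation frame" is any frame where the barrier value is negative and that the maximum violation is the largest magnitude of a negative barrier value; these definitions and the sample data are hypothetical and are not taken from the paper.

```python
import jax.numpy as jnp

def violation_stats(h_series):
    """Percentage of frames with h < 0 and the largest violation depth."""
    violated = h_series < 0.0
    pct_frames = 100.0 * jnp.mean(violated)
    max_violation = jnp.max(jnp.maximum(-h_series, 0.0))
    return pct_frames, max_violation

# Made-up barrier trace (e.g. minimum signed separation per frame, in m).
h_series = jnp.array([0.05, 0.02, -0.01, -0.03, 0.04, 0.10, 0.08, 0.01])
pct, max_v = violation_stats(h_series)
```

For this made-up trace, two of the eight frames are in violation, so the violation rate is 25% with a maximum depth of 0.03.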

The qualitative findings are equally important. In teleoperation, the paper reports that a safe reference alone is insufficient because the policy can still overshoot. In the CoM-stability experiment, it reports that ignoring contact-constrained kinematics can make the safety solution infeasible. In the limbo and dynamic-obstacle settings, it reports that some safety requirements demand discrete high-level reasoning rather than only local CBF enforcement. The aggregate interpretation is therefore not that ConstrainedMimic universally solves humanoid safety, but that it substantially reduces violation rates in real time while preserving policy behavior whenever the problem remains compatible with local contact-consistent filtering (Morton et al., 29 May 2026).

6. Position within constrained imitation and control research

The name “ConstrainedMimic” can be confused with the broader literature on constrained imitation learning, but the specific framework described here occupies a distinct niche. Earlier kernelized movement-primitive approaches such as LC-KMP and EKMP treat constrained imitation as trajectory adaptation under linear, nonlinear, equality, and inequality hard constraints, with probabilistic trajectory priors learned from demonstrations (Huang et al., 2019, Huang, 2021). Constrained behavior-cloning approaches such as GHCBC instead improve robustness by imposing historical and geometric priors during policy learning (Liang et al., 2024). Cost-constrained imitation-learning formulations extend occupancy-measure matching by requiring the learner’s expected cost to remain below the expert’s cost, using Lagrangian, meta-gradient, or alternating-gradient methods (Shao et al., 2024).

Recent humanoid imitation work provides an especially relevant contrast. PressMimic treats humanoid imitation as a constrained physical reproduction problem in which pressure serves as a physical grounding signal across both motion capture and control, improving contact timing, support inference, trajectory consistency, and execution stability (Lu et al., 25 Jun 2026). Action-constrained and constrained-demonstrator settings address yet another mismatch: either the imitator has a smaller feasible action set than the expert, or the robot is more capable than the demonstrator, which motivates surrogate trajectories or state-only progress rewards rather than direct action imitation (Yeh et al., 20 Aug 2025, Li et al., 10 Oct 2025). ConstrainedMimic differs from all of these by focusing on post-training runtime enforcement for an already learned whole-body RL tracking policy rather than modifying the training objective itself (Morton et al., 29 May 2026).

This distinction is useful conceptually. Training-time constrained imitation methods typically shape what the policy learns, whereas ConstrainedMimic shapes what the policy is allowed to execute at deployment. This suggests a complementary relationship rather than a competing one. A pressure-grounded imitation pipeline such as PressMimic could provide more physically consistent references, while ConstrainedMimic could enforce deployment-time collision, joint-limit, or CoM constraints on the resulting humanoid tracker. That complementarity is suggested by the respective formulations, although such a combined system is not evaluated in the reported experiments (Lu et al., 25 Jun 2026, Morton et al., 29 May 2026).

The framework’s limitations are stated explicitly. Single-step safety logic may fail when long-horizon strategy selection is required; disturbances and falls are not explicitly handled; contact-estimation uncertainty can degrade dynamic safety; distribution shift can still occur, especially with dynamic filters; the method assumes relatively accurate knowledge of robot and environment geometry; and the dynamic filter is evaluated mainly under well-known contact modes (Morton et al., 29 May 2026). Accordingly, ConstrainedMimic is best understood not as a complete humanoid safety stack, but as a contact-aware and task-aware CBF layer that retrofits runtime constraint handling onto existing whole-body RL policies with minimal interference to nominal tracking behavior.