Extension of Lipschitz-Constrained Policies Beyond Locomotion

Determine how well Lipschitz-constrained reinforcement learning policies, which penalize the gradient of the action likelihood to encourage smooth control, extend to challenging physics-based character animation scenarios beyond locomotion.

Background

Lipschitz-constrained policies regularize sensitivity by penalizing the gradient of the likelihood of an action under the current policy. Prior work reports effectiveness for locomotion tasks, but the approach relies on many action samples to estimate sensitivity accurately, and the extra backpropagation pass needed to compute the gradient penalty adds significant computational overhead.
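To make the penalty concrete, the sketch below computes the squared norm of the log-likelihood gradient with respect to the state for a toy linear-Gaussian policy, where the gradient is available in closed form. This is an illustrative assumption, not the construction used in the cited work: practical implementations use neural policies and obtain this gradient via backpropagation (the source of the overhead noted above), and the exact penalty form may differ.

```python
import numpy as np

def lcp_gradient_penalty(W, sigma, s, a):
    """Squared norm of grad_s log pi(a|s) for a toy Gaussian policy
    with linear mean mu(s) = W @ s and fixed std sigma.

    For this policy the gradient is analytic:
        grad_s log pi(a|s) = W.T @ (a - W @ s) / sigma**2
    A large value means the action likelihood is highly sensitive
    to small state perturbations, which the penalty discourages.
    """
    grad = W.T @ (a - W @ s) / sigma**2
    return float(grad @ grad)

# Hypothetical example: the penalty vanishes when the action sits at
# the policy mean and grows as the action deviates from it.
rng = np.random.default_rng(0)
W = rng.standard_normal((2, 3))   # action dim 2, state dim 3
s = rng.standard_normal(3)
sigma = 0.5
p0 = lcp_gradient_penalty(W, sigma, s, W @ s)        # action at the mean
p1 = lcp_gradient_penalty(W, sigma, s, W @ s + 0.1)  # perturbed action
```

In practice this term is averaged over a batch of sampled actions and added to the policy loss with a weighting coefficient, which is where the sample-count and backpropagation costs arise.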

In the context of physics-based character animation, many tasks involve dynamic and complex interactions (e.g., parkour, gymnastics, and object/environment contact). The authors explicitly note uncertainty regarding whether the benefits observed for locomotion will translate to such challenging scenarios.

References

While this method has been effective for locomotion tasks, it remains unclear how well it extends to more challenging scenarios.

Learning Smooth Time-Varying Linear Policies with an Action Jacobian Penalty (2602.18312 - Xie et al., 20 Feb 2026) in Section 1: Introduction