Conjecture on Backflip Difficulty for Time-Varying Linear Feedback Policies

Determine whether executing a backflip with time-varying linear feedback control policies learned by a Linear Policy Net (LPN) inherently requires higher-frequency action outputs. If so, this would make the backflip challenging for such policies and would degrade the smoothness of their actions.

Background

In the evaluations, the Linear Policy Net (LPN) trained with an action Jacobian penalty generally produces smooth control signals and converges quickly during learning. For the backflip motion, however, the LPN policy scores worse on the smoothness metrics than certain feedforward baselines that use Jacobian-based or reward-based smoothness penalties.
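To make the terms concrete, the sketch below assumes a time-varying linear feedback policy of the form u_t = u_ref_t + K_t (x_t - x_ref_t), where the gains K_t and the reference state and action x_ref_t, u_ref_t are treated as given. How the LPN actually parameterizes or outputs these quantities is not stated here, so the helper names (`linear_policy_action`, `action_jacobian_penalty`, `finite_diff_jacobian`) are hypothetical. Under this assumption the action Jacobian du_t/dx_t is exactly K_t, so an action Jacobian penalty reduces to penalizing the squared Frobenius norm of the gains; the finite-difference check confirms that identity numerically.

```python
import numpy as np

# Hypothetical helpers. Assumed policy form: u_t = u_ref_t + K_t (x_t - x_ref_t).
# How the LPN actually produces K_t and the references is not specified here.


def linear_policy_action(K_t, x_t, x_ref_t, u_ref_t):
    """Evaluate a time-varying linear feedback policy at one time step."""
    return u_ref_t + K_t @ (x_t - x_ref_t)


def action_jacobian_penalty(K_seq):
    """Mean squared Frobenius norm of the gains over a trajectory.

    For the assumed linear policy, du_t/dx_t = K_t, so this equals the
    mean of ||du_t/dx_t||_F^2 over time steps.
    """
    K_seq = np.asarray(K_seq, dtype=float)  # shape (T, n_u, n_x)
    return float(np.mean(np.sum(K_seq ** 2, axis=(1, 2))))


def finite_diff_jacobian(f, x, eps=1e-6):
    """Central-difference Jacobian of f at x, used only as a sanity check."""
    x = np.asarray(x, dtype=float)
    cols = []
    for i in range(x.size):
        dx = np.zeros_like(x)
        dx[i] = eps
        cols.append((f(x + dx) - f(x - dx)) / (2.0 * eps))
    return np.stack(cols, axis=1)  # shape (n_u, n_x)


# Usage: random gains over T steps; the numerical Jacobian matches K_t.
rng = np.random.default_rng(0)
T, n_x, n_u = 5, 4, 2
K_seq = rng.normal(size=(T, n_u, n_x))
x_ref = np.zeros((T, n_x))
u_ref = np.zeros((T, n_u))
x0 = rng.normal(size=n_x)

J0 = finite_diff_jacobian(
    lambda x: linear_policy_action(K_seq[0], x, x_ref[0], u_ref[0]), x0
)
assert np.allclose(J0, K_seq[0])
penalty = action_jacobian_penalty(K_seq)
```

Because the Jacobian of a linear policy does not depend on the state, penalizing it only constrains the gain schedule; any high-frequency behavior must then come from how quickly K_t and the references vary over time.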

Based on these observations, the authors conjecture that the backflip demands higher-frequency actions from time-varying linear feedback policies, which may explain the reduced smoothness observed for the LPN on this task.
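The link between higher-frequency actions and worse smoothness scores can be illustrated with a simple spectral proxy. The sketch below computes the amplitude-weighted mean frequency (spectral centroid) of an action sequence, averaged over action dimensions. This is an assumed stand-in and not necessarily the smoothness metric used in the paper; `spectral_centroid` is a hypothetical helper. Adding a high-frequency component to an otherwise slow signal raises the score.

```python
import numpy as np


def spectral_centroid(actions, fs):
    """Amplitude-weighted mean frequency (Hz) of an action sequence.

    actions: array of shape (T,) or (T, n_u); fs: sampling rate in Hz.
    Lower values indicate smoother (lower-frequency) actions. This is an
    illustrative proxy, not a specific published smoothness metric.
    """
    a = np.asarray(actions, dtype=float)
    if a.ndim == 1:
        a = a[:, None]
    a = a - a.mean(axis=0)  # remove the DC offset of each action dimension
    n = a.shape[0]
    amps = np.abs(np.fft.rfft(a, axis=0))  # one-sided amplitude spectrum
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    totals = amps.sum(axis=0)
    weighted = (amps * freqs[:, None]).sum(axis=0)
    # Constant channels have no spectral energy; report 0 Hz for them.
    centroids = np.divide(
        weighted, totals, out=np.zeros_like(weighted), where=totals > 0
    )
    return float(centroids.mean())


# Usage: 1 s of actions sampled at 100 Hz.
t = np.arange(100) / 100.0
slow = np.sin(2 * np.pi * 1.0 * t)
fast = slow + 0.5 * np.sin(2 * np.pi * 20.0 * t)
c_slow = spectral_centroid(slow, fs=100.0)  # approximately 1.0
c_fast = spectral_centroid(fast, fs=100.0)  # approximately 7.33
```

Under this proxy, a policy that must inject fast corrections (for example, during the rapid rotation of a backflip) would score worse even if each individual gain matrix is small.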

References

We conjecture that the backflip is a challenging motion for time-varying linear feedback control policies, requiring the LPN to produce higher-frequency actions.

Learning Smooth Time-Varying Linear Policies with an Action Jacobian Penalty (2602.18312 - Xie et al., 20 Feb 2026) in Section 6.2: Comparison