Complete treatment of bounded ES for time-varying-goal manipulation

Develop a complete theoretical treatment of bounded extremum seeking for the time-varying-goal versions of the pushing and pick-and-place tasks used in the ES-DRL controller, explicitly accounting for the additional terms induced by the goal’s rate of change after the RL-to-ES handoff.

Background

The paper proposes a hybrid controller that switches from a DDPG policy to bounded extremum seeking (ES) once contact is established. For the fixed-goal pushing phase, a proposition with a sketch of proof argues that ES can drive the object into an arbitrarily small neighborhood of the goal under suitable gain and frequency conditions, despite unknown friction.

Immediately after this fixed-goal analysis, the authors note that a similar argument should apply when the goal is time-varying, but that additional terms arise from the goal's rate of change. They explicitly leave a complete treatment of this time-varying-goal case to future work, so a full theoretical analysis of ES in this setting remains open.
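To make the open question concrete, the following minimal Python sketch (not from the paper) simulates a standard bounded-ES law, x_i' = sqrt(alpha * omega_i) * cos(omega_i * t + k * J(x, t)), where J is the measured squared distance to a slowly moving goal. In the averaged dynamics this behaves like gradient descent on J with gain k*alpha/2, while the goal's rate of change contributes the extra drift terms the authors mention. All gains, dither frequencies, the goal trajectory, and the function name are illustrative assumptions.

```python
import numpy as np

def bounded_es_track(T=10.0, dt=1e-4, alpha=1.0, k=2.0):
    """Bounded extremum seeking tracking a slowly moving 2-D goal.

    Per-channel update: x_i += dt * sqrt(alpha*w_i) * cos(w_i*t + k*J),
    where J(x, t) = ||x - g(t)||^2 is the only measured quantity
    (no gradient or model of the dynamics is used).
    """
    omegas = np.array([200.0, 283.0])   # distinct dither frequencies
    x = np.array([1.0, -1.0])           # initial state (e.g. object position)
    t, errs = 0.0, []
    for _ in range(int(T / dt)):
        g = 0.5 * np.array([np.cos(0.1 * t), np.sin(0.1 * t)])  # moving goal
        J = float(np.sum((x - g) ** 2))  # measured cost only
        # bounded-ES step: speed is capped at sqrt(alpha * omega) per channel
        x = x + dt * np.sqrt(alpha * omegas) * np.cos(omegas * t + k * J)
        t += dt
        errs.append(float(np.linalg.norm(x - g)))
    return errs

if __name__ == "__main__":
    errs = bounded_es_track()
    print(f"initial error {errs[0]:.3f}, "
          f"final mean error {sum(errs[-1000:]) / 1000:.3f}")
```

In this toy run the tracking error settles into a small residual set whose size is governed by the dither amplitudes sqrt(alpha/omega_i) plus a lag term proportional to the goal's speed; characterizing those goal-rate terms rigorously is precisely what the proposed treatment would need to do.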

References

A similar argument applies to the time-varying-goal pushing and pick-and-place tasks, where additional terms arise due to the rate of change of the goal. A complete treatment of that case is beyond the scope of this paper and is left for future work.

Deep Reinforcement Learning for Robotic Manipulation under Distribution Shift with Bounded Extremum Seeking  (2604.01142 - Saxena et al., 1 Apr 2026) in Section 4.3 (Supervisor), following Proposition (Sketch of Proof)