Discovering a non-contact placement strategy for Stack via capability or search

Determine whether a more capable coding agent, or a longer iterative search within the Act–Observe–Rewrite (AOR) framework, can discover a placement strategy for the robosuite Stack task that keeps the gripper fingers from contacting the lower cube (cubeB) during placement.

Background

In the Stack task, AOR achieved a 91% success rate. The remaining episodes failed for one consistent reason: the gripper fingers contacted the lower cube (cubeB) during placement and displaced it before release. The agent correctly diagnosed this as the root cause across failure episodes but did not discover a placement strategy that avoids the contact.

The authors note that strategies such as steeper descent angles, compliant releases from slightly above, or lateral nudge-and-release likely exist but were not found within the explored hypothesis space. They explicitly leave open whether a more capable coding agent or a longer search process would identify such a strategy.
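To make the candidate strategies concrete, the following is a minimal sketch of how two of them (descent angle and release from slightly above) could be parameterized and screened geometrically before any simulation is run. It is not the authors' method and does not call robosuite. Every name (PlacementParams, placement_waypoints, min_finger_clearance, sweep), the finger_overhang parameter, and all numeric values are hypothetical illustrations. The clearance check is a crude vertical-only proxy: it ignores lateral finger geometry and contact dynamics.

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class PlacementParams:
    """Hypothetical knobs for a placement strategy (not from the paper)."""
    approach_angle_deg: float = 90.0   # descent angle from horizontal; 90 = straight down
    approach_dist: float = 0.08        # length of the final descent segment (m)
    release_clearance: float = 0.005   # gap between cubeA bottom and cubeB top at release (m)


def placement_waypoints(cube_b_pos, cube_a_half, cube_b_half, params, n_steps=5):
    """Waypoints for the held cube's center, ending at the release pose above cubeB."""
    cube_b_pos = np.asarray(cube_b_pos, dtype=float)
    # Release pose: cubeA center sits above cubeB's top face plus the clearance.
    lift = cube_b_half + params.release_clearance + cube_a_half
    release = cube_b_pos + np.array([0.0, 0.0, lift])
    theta = np.radians(params.approach_angle_deg)
    # Direction from the release pose back along the approach path (up and toward -x).
    back = np.array([-np.cos(theta), 0.0, np.sin(theta)])
    start = release + params.approach_dist * back
    ts = np.linspace(0.0, 1.0, n_steps)[:, None]
    return start + ts * (release - start)


def min_finger_clearance(waypoints, cube_a_half, finger_overhang, cube_b_pos, cube_b_half):
    """Smallest vertical gap between the fingertips and cubeB's top face.

    finger_overhang is an assumed distance by which the fingertips extend
    below cubeA's bottom face. A negative result means the fingertips dip
    below cubeB's top face somewhere along the path.
    """
    fingertip_z = waypoints[:, 2] - cube_a_half - finger_overhang
    cube_b_top = float(np.asarray(cube_b_pos)[2]) + cube_b_half
    return float(np.min(fingertip_z - cube_b_top))


def sweep(cube_b_pos, cube_a_half, cube_b_half, finger_overhang, angles_deg, clearances):
    """Grid-search the hypothetical parameters; keep candidates that pass the proxy."""
    feasible = []
    for angle in angles_deg:
        for clearance in clearances:
            params = PlacementParams(approach_angle_deg=angle, release_clearance=clearance)
            wps = placement_waypoints(cube_b_pos, cube_a_half, cube_b_half, params)
            gap = min_finger_clearance(wps, cube_a_half, finger_overhang,
                                       cube_b_pos, cube_b_half)
            if gap >= 0.0:
                feasible.append((angle, clearance, gap))
    return feasible
```

A call such as sweep(cube_b_pos, 0.02, 0.025, 0.004, [45.0, 90.0], [0.0, 0.005, 0.01]) would return only the candidates whose release clearance exceeds the assumed fingertip overhang. Candidates that pass this screen would still need to be evaluated in simulation, since dropping the cube from a clearance can itself cause a failed stack.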

References

Whether a sufficiently capable agent or a longer search would find the solution is an open question.

Act-Observe-Rewrite: Multimodal Coding Agents as In-Context Policy Learners for Robot Manipulation (2603.04466 - Kumar, 3 Mar 2026) in Section 5.2, Experimental Observations by Failure Type; Observed shortcoming: failure to find a working placement strategy