Working Backwards: Learning to Place by Picking (2312.02352v4)

Published 4 Dec 2023 in cs.RO, cs.AI, and cs.LG

Abstract: We present placing via picking (PvP), a method to autonomously collect real-world demonstrations for a family of placing tasks in which objects must be manipulated to specific, contact-constrained locations. With PvP, we approach the collection of robotic object placement demonstrations by reversing the grasping process and exploiting the inherent symmetry of the pick and place problems. Specifically, we obtain placing demonstrations from a set of grasp sequences of objects initially located at their target placement locations. Our system can collect hundreds of demonstrations in contact-constrained environments without human intervention using two modules: compliant control for grasping and tactile regrasping. We train a policy directly from visual observations through behavioural cloning, using the autonomously-collected demonstrations. By doing so, the policy can generalize to object placement scenarios outside of the training environment without privileged information (e.g., placing a plate picked up from a table). We validate our approach in home robot scenarios that include dishwasher loading and table setting. Our approach yields robotic placing policies that outperform policies trained with kinesthetic teaching, both in terms of success rate and data efficiency, while requiring no human supervision.

Summary

  • The paper's main contribution is Learning to Place by Picking (LPP), a self-supervised framework (called placing via picking, or PvP, in the abstract) that learns object placement by reversing the grasping process.
  • It employs tactile sensing, compliant control, and noise augmentation to keep autonomous data collection reliable and to make the learned policies robust.
  • Experiments on dishwasher loading and table setting show higher success rates and better data efficiency than policies trained with kinesthetic teaching.

In autonomous robotics, the ability to precisely place objects is a critical skill, particularly for tasks such as setting a table or loading a dishwasher, activities that demand careful handling and placement of items. Achieving this capability requires addressing challenges in object tracking, scene understanding, motion planning, and control. Traditionally, these tasks have been tackled with substantial human input via imitation learning (IL), in which robots are taught from expert human demonstrations. A major bottleneck, however, is the time and effort humans must spend producing those demonstrations.

A novel strategy aimed at overcoming these hurdles is Learning to Place by Picking (LPP). The approach works by reversing the process of picking up objects. In a typical setup, a robot learns to grasp objects from various locations and place them at designated spots. LPP inverts this task: assuming the objects begin at their target spots, the robot picks them up, and the recorded grasps, played backwards, teach it how to put the objects back.

The central innovation in LPP is a self-supervised pipeline that autonomously generates placement demonstrations by exploiting the reciprocal structure of the pick-and-place problem. The system records the robot's movements as it picks up objects from their target locations and uses the time-reversed recordings as placement demonstrations. In other words, the robot learns to place by grasping objects from exactly where they need to end up.
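The paper does not release this pipeline as code here, but the core transformation is straightforward to sketch. The following minimal Python illustration assumes each demonstration is stored as a sequence of steps holding an observation, a commanded end-effector velocity, and the gripper state; the Step structure and all field names are hypothetical:

```python
from dataclasses import dataclass
from typing import List

import numpy as np


@dataclass
class Step:
    observation: np.ndarray  # e.g., camera image or proprioceptive state at time t
    ee_velocity: np.ndarray  # commanded end-effector twist applied at time t
    gripper_open: bool       # whether the gripper is open at time t


def reverse_pick_into_place(pick: List[Step]) -> List[Step]:
    """Turn a recorded grasp sequence into a placement demonstration.

    If the velocity command at step t drives the arm from pose p_t to
    p_{t+1}, its negation drives the arm from p_{t+1} back to p_t. The
    reversed demo therefore pairs the observation at step t+1 with the
    negated command from step t; commanding the gripper state from the
    earlier step replays the original grasp event as a release.
    """
    place = []
    for t in range(len(pick) - 1, 0, -1):
        place.append(Step(
            observation=pick[t].observation,
            ee_velocity=-pick[t - 1].ee_velocity,
            gripper_open=pick[t - 1].gripper_open,
        ))
    return place
```

Applied to real grasp recordings, a transformation along these lines yields placement demonstrations without a human ever demonstrating a placement.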

A key aspect of LPP's effectiveness is its ability to collect demonstrations autonomously, without human intervention, enabled by tactile sensing and compliant control during grasping. Sensitivity to touch and compliant manipulator control let the system handle objects gently, avoid excessive force, and maintain stable grasps, all prerequisites for reliable, uninterrupted data collection. LPP additionally applies noise augmentation during data collection to improve the robustness of the learned policies, exposing them to the kinds of variation the robot will encounter outside the training environment.
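The paper's exact augmentation scheme is not reproduced here; one common recipe for the same idea is DART-style noise injection, in which the robot executes a perturbed command while the clean command is logged as the training label, so the dataset pairs slightly off-nominal states with actions that steer back toward the nominal path. A minimal sketch, with made-up noise_std and limit values:

```python
import numpy as np

rng = np.random.default_rng(0)


def execute_with_noise(ee_velocity: np.ndarray,
                       noise_std: float = 0.01,
                       limit: float = 0.05) -> np.ndarray:
    """Perturb a commanded end-effector velocity before execution.

    The noisy command is what the robot actually runs (clipped to the
    controller's safe limits), while the clean `ee_velocity` is what
    gets recorded as the demonstration label. The logged data then
    covers states slightly off the nominal path together with the
    actions that steer back toward it.
    """
    noise = rng.normal(0.0, noise_std, size=ee_velocity.shape)
    return np.clip(ee_velocity + noise, -limit, limit)
```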

Put to the test, the resulting placing policies proficiently performed complex tasks such as loading dishes into a dishwasher and setting items on a table. These policies outperformed, in both success rate and data efficiency, policies trained with kinesthetic teaching, in which a human expert physically guides the robot's movements. Notably, the data LPP collected without expert human demonstrations proved to be of higher quality, leading to more reliable task completion.
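Per the abstract, these policies are trained with behavioural cloning directly from visual observations. As a rough illustration of what one such training step might look like (the architecture, loss weighting, and every name below are assumptions for illustration, not the authors' implementation), here is a minimal PyTorch sketch:

```python
import torch
import torch.nn as nn

# Hypothetical image-based policy: maps an RGB observation to a 6-DoF
# end-effector velocity plus one gripper open/close logit.
policy = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=5, stride=2), nn.ReLU(),
    nn.Conv2d(16, 32, kernel_size=5, stride=2), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
    nn.Linear(32, 7),
)
optimizer = torch.optim.Adam(policy.parameters(), lr=1e-4)


def bc_step(images, twists, gripper_open):
    """One behavioural-cloning update on a batch of (reversed) demos.

    images: (B, 3, H, W) observations; twists: (B, 6) velocity labels;
    gripper_open: (B,) boolean gripper labels.
    """
    out = policy(images)
    loss = (nn.functional.mse_loss(out[:, :6], twists)
            + nn.functional.binary_cross_entropy_with_logits(
                out[:, 6], gripper_open.float()))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```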

In conclusion, LPP represents a promising advance in robotic object placement, offering a method of data collection that is autonomous, efficient, and minimally demanding of human labor. The implications are substantial, paving the way for robots to carry out a wide range of tasks in domestic and industrial settings alike. As the technology evolves, we may approach an era of autonomous robots that handle objects with the skill and delicacy of a human hand and the consistency and tirelessness of a machine.
