Working Backwards: Learning to Place by Picking (2312.02352v4)

Published 4 Dec 2023 in cs.RO, cs.AI, and cs.LG

Abstract: We present placing via picking (PvP), a method to autonomously collect real-world demonstrations for a family of placing tasks in which objects must be manipulated to specific, contact-constrained locations. With PvP, we approach the collection of robotic object placement demonstrations by reversing the grasping process and exploiting the inherent symmetry of the pick and place problems. Specifically, we obtain placing demonstrations from a set of grasp sequences of objects initially located at their target placement locations. Our system can collect hundreds of demonstrations in contact-constrained environments without human intervention using two modules: compliant control for grasping and tactile regrasping. We train a policy directly from visual observations through behavioural cloning, using the autonomously-collected demonstrations. By doing so, the policy can generalize to object placement scenarios outside of the training environment without privileged information (e.g., placing a plate picked up from a table). We validate our approach in home robot scenarios that include dishwasher loading and table setting. Our approach yields robotic placing policies that outperform policies trained with kinesthetic teaching, both in terms of success rate and data efficiency, while requiring no human supervision.

Summary

  • The paper's main contribution is Learning to Place by Picking (LPP), a self-supervised framework (called placing via picking, or PvP, in the abstract) that learns object placement by reversing the grasping process.
  • It employs tactile sensing, compliant control, and noise augmentation to keep autonomous data collection reliable and to make the learned policies robust.
  • Experiments on dishwasher loading and table setting show higher success rates and better data efficiency than policies trained with kinesthetic teaching.

In autonomous robotics, the ability to precisely place objects is a critical skill, particularly for tasks such as setting a table or loading a dishwasher, activities that demand careful handling and placement of items. Achieving this capability requires addressing challenges in object tracking, scene understanding, motion planning, and control. Traditionally, these tasks have been tackled with substantial human input via imitation learning (IL), in which robots are taught from expert human demonstrations. A major bottleneck, however, is the time and effort humans must spend producing those demonstrations.

A novel strategy aimed at overcoming these hurdles is Learning to Place by Picking (LPP). The approach works by reversing the process of picking up objects. In a typical setup, a robot learns to grasp objects from various locations and place them at designated spots. LPP inverts this task: assuming the objects begin at their target spots, the robot picks them up, and the recorded grasps, played backwards, teach it how to put the objects back.

The central innovation in LPP is a self-supervised pipeline that autonomously generates placement demonstrations by exploiting the reciprocal structure of the pick-and-place problem. The system records the robot's movements as it picks up objects from their target locations and uses the time-reversed recordings as placement demonstrations. In other words, the robot learns to place by grasping objects from exactly where they need to end up.
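The paper does not release this pipeline as code here, but the core transformation is straightforward to sketch. The following minimal Python illustration assumes each demonstration is stored as a sequence of steps holding an observation, a commanded end-effector velocity, and the gripper state; the Step structure and all field names are hypothetical:

```python
from dataclasses import dataclass
from typing import List

import numpy as np


@dataclass
class Step:
    observation: np.ndarray  # e.g., camera image or proprioceptive state at time t
    ee_velocity: np.ndarray  # commanded end-effector twist applied at time t
    gripper_open: bool       # whether the gripper is open at time t


def reverse_pick_into_place(pick: List[Step]) -> List[Step]:
    """Turn a recorded grasp sequence into a placement demonstration.

    If the velocity command at step t drives the arm from pose p_t to
    p_{t+1}, its negation drives the arm from p_{t+1} back to p_t. The
    reversed demo therefore pairs the observation at step t+1 with the
    negated command from step t; commanding the gripper state from the
    earlier step replays the original grasp event as a release.
    """
    place = []
    for t in range(len(pick) - 1, 0, -1):
        place.append(Step(
            observation=pick[t].observation,
            ee_velocity=-pick[t - 1].ee_velocity,
            gripper_open=pick[t - 1].gripper_open,
        ))
    return place
```

Applied to real grasp recordings, a transformation along these lines yields placement demonstrations without a human ever demonstrating a placement.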

A key aspect of LPP's effectiveness is its ability to collect demonstrations autonomously, without human intervention, enabled by tactile sensing and compliant control during grasping. Sensitivity to touch and compliant manipulator control let the system handle objects gently, avoid excessive force, and maintain stable grasps, all prerequisites for reliable, uninterrupted data collection. LPP additionally applies noise augmentation during data collection to improve the robustness of the learned policies, exposing them to the kinds of variation the robot will encounter outside the training environment.
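The paper's exact augmentation scheme is not reproduced here; one common recipe for the same idea is DART-style noise injection, in which the robot executes a perturbed command while the clean command is logged as the training label, so the dataset pairs slightly off-nominal states with actions that steer back toward the nominal path. A minimal sketch, with made-up noise_std and limit values:

```python
import numpy as np

rng = np.random.default_rng(0)


def execute_with_noise(ee_velocity: np.ndarray,
                       noise_std: float = 0.01,
                       limit: float = 0.05) -> np.ndarray:
    """Perturb a commanded end-effector velocity before execution.

    The noisy command is what the robot actually runs (clipped to the
    controller's safe limits), while the clean `ee_velocity` is what
    gets recorded as the demonstration label. The logged data then
    covers states slightly off the nominal path together with the
    actions that steer back toward it.
    """
    noise = rng.normal(0.0, noise_std, size=ee_velocity.shape)
    return np.clip(ee_velocity + noise, -limit, limit)
```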

Put to the test, the resulting placing policies proficiently performed complex tasks such as loading dishes into a dishwasher and setting items on a table. These policies outperformed, in both success rate and data efficiency, policies trained with kinesthetic teaching, in which a human expert physically guides the robot's movements. Notably, the data LPP collected without expert human demonstrations proved to be of higher quality, leading to more reliable task completion.
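Per the abstract, these policies are trained with behavioural cloning directly from visual observations. As a rough illustration of what one such training step might look like (the architecture, loss weighting, and every name below are assumptions for illustration, not the authors' implementation), here is a minimal PyTorch sketch:

```python
import torch
import torch.nn as nn

# Hypothetical image-based policy: maps an RGB observation to a 6-DoF
# end-effector velocity plus one gripper open/close logit.
policy = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=5, stride=2), nn.ReLU(),
    nn.Conv2d(16, 32, kernel_size=5, stride=2), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
    nn.Linear(32, 7),
)
optimizer = torch.optim.Adam(policy.parameters(), lr=1e-4)


def bc_step(images, twists, gripper_open):
    """One behavioural-cloning update on a batch of (reversed) demos.

    images: (B, 3, H, W) observations; twists: (B, 6) velocity labels;
    gripper_open: (B,) boolean gripper labels.
    """
    out = policy(images)
    loss = (nn.functional.mse_loss(out[:, :6], twists)
            + nn.functional.binary_cross_entropy_with_logits(
                out[:, 6], gripper_open.float()))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```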

In conclusion, LPP represents a promising advance in robotic object placement, offering a method of data collection that is autonomous, efficient, and minimally demanding of human labor. The implications are substantial, paving the way for robots to carry out a wide range of tasks in domestic and industrial settings alike. As the technology evolves, we may approach an era of autonomous robots that handle objects with the skill and delicacy of a human hand and the consistency and tirelessness of a machine.
