Grasp, See and Place: Efficient Unknown Object Rearrangement with Policy Structure Prior (2402.15402v3)

Published 23 Feb 2024 in cs.RO and cs.LG

Abstract: We focus on the task of unknown object rearrangement, where a robot must reconfigure objects into a desired goal configuration specified by an RGB-D image. Recent works build unknown object rearrangement systems by incorporating learning-based perception modules, but these systems are sensitive to perception error and pay little attention to task-level performance. In this paper, we aim to develop an effective system for unknown object rearrangement under perception noise. We theoretically show that noisy perception affects grasping and placing in a decoupled way, and that this decoupled structure is valuable for improving task optimality. We propose GSP, a dual-loop system that uses the decoupled structure as a prior. In the inner loop, we learn a see policy for self-confident in-hand object matching. In the outer loop, we learn a grasp policy, guided by task-level rewards, that is aware of object matching and grasp capability. We leverage the foundation model CLIP for object matching, policy learning, and self-termination. A series of experiments indicates that GSP performs unknown object rearrangement with higher completion rates and fewer steps.

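To make the CLIP-based object matching mentioned in the abstract concrete, the sketch below shows one way an in-hand camera view could be matched against per-object crops of the goal image using cosine similarity between CLIP image embeddings. This is an illustrative sketch, not the paper's actual pipeline: the model checkpoint, the `match_in_hand_object` helper, and the confidence threshold are assumptions, and the paper additionally learns policies on top of such features.

```python
# Minimal sketch: CLIP-based in-hand object matching (illustrative only).
# Assumes the goal RGB-D image has already been segmented into per-object crops.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def match_in_hand_object(in_hand_view: Image.Image, goal_crops: list, threshold: float = 0.8):
    """Return (best_index, similarity) for the goal crop most similar to the
    in-hand view, or (None, similarity) if no match clears the threshold."""
    inputs = processor(images=[in_hand_view] + goal_crops, return_tensors="pt")
    with torch.no_grad():
        feats = model.get_image_features(**inputs)
    feats = feats / feats.norm(dim=-1, keepdim=True)  # unit-normalise embeddings
    sims = feats[0] @ feats[1:].T                     # cosine similarity to each goal crop
    best = int(sims.argmax())
    score = float(sims[best])
    # Threshold is an illustrative stand-in for the paper's learned self-confidence/termination.
    return (best, score) if score >= threshold else (None, score)
```

A low similarity score here would correspond to an uncertain match, in which case a system like GSP would gather another in-hand view rather than commit to a placement.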
