Shelving, Stacking, Hanging: Relational Pose Diffusion for Multi-modal Rearrangement (2307.04751v1)
Abstract: We propose a system for rearranging objects in a scene to achieve a desired object-scene placing relationship, such as a book inserted in an open slot of a bookshelf. The pipeline generalizes to novel geometries, poses, and layouts of both scenes and objects, and is trained from demonstrations to operate directly on 3D point clouds. Our system overcomes challenges associated with the existence of many geometrically-similar rearrangement solutions for a given scene. By leveraging an iterative pose de-noising training procedure, we can fit multi-modal demonstration data and produce multi-modal outputs while remaining precise and accurate. We also show the advantages of conditioning on relevant local geometric features while ignoring irrelevant global structure that harms both generalization and precision. We demonstrate our approach on three distinct rearrangement tasks that require handling multi-modality and generalization over object shape and pose in both simulation and the real world. Project website, code, and videos: https://anthonysimeonov.github.io/rpdiff-multi-modal/
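The idea of iterative pose de-noising with several equally valid placements can be illustrated with a toy sketch. This is not the paper's implementation: the function `toy_denoise_step` is a hypothetical oracle standing in for a learned model that would be conditioned on 3D point clouds, poses are restricted to yaw plus translation for brevity, and the two target poses, the step size, and the number of iterations are all assumed values chosen for illustration.

```python
import numpy as np

def pose_z(theta, t):
    """4x4 homogeneous transform: rotation by theta about z, translation t."""
    c, s = np.cos(theta), np.sin(theta)
    T = np.eye(4)
    T[:3, :3] = [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]
    T[:3, 3] = t
    return T

def yaw(T):
    """Recover the rotation angle about z from a transform built by pose_z."""
    return np.arctan2(T[1, 0], T[0, 0])

def wrap(a):
    """Wrap an angle to the interval [-pi, pi]."""
    return np.arctan2(np.sin(a), np.cos(a))

def toy_denoise_step(T, targets, step=0.5):
    """Hypothetical oracle (NOT a learned network): move a fraction of the
    way toward the nearest valid placement, in translation and in yaw."""
    nearest = min(targets, key=lambda G: np.linalg.norm(G[:3, 3] - T[:3, 3]))
    new_t = T[:3, 3] + step * (nearest[:3, 3] - T[:3, 3])
    new_yaw = yaw(T) + step * wrap(yaw(nearest) - yaw(T))
    return pose_z(new_yaw, new_t)

def refine(T_init, targets, n_steps=20):
    """Apply the de-noising step repeatedly, starting from a perturbed pose."""
    T = T_init
    for _ in range(n_steps):
        T = toy_denoise_step(T, targets)
    return T

# Two equally valid placements, e.g. two open shelf slots (assumed values).
targets = [pose_z(0.0, [0.0, 0.0, 0.0]), pose_z(0.0, [1.0, 0.0, 0.0])]

# Start from random perturbed poses and refine each one iteratively.
rng = np.random.default_rng(0)
finals = []
for _ in range(8):
    T0 = pose_z(rng.uniform(-np.pi, np.pi), rng.uniform(-0.5, 1.5, size=3))
    finals.append(refine(T0, targets))

for F in finals:
    slot = int(np.argmin([np.linalg.norm(F[:3, 3] - G[:3, 3]) for G in targets]))
    print("converged to slot", slot)
```

In this toy setting, each initial pose is pulled toward whichever valid placement is closest, so different starting poses can end in different slots while each result stays precise. A single-step regressor trained on the same two targets would instead tend to average them, which is the failure mode that multi-modal methods aim to avoid.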
Authors:
- Anthony Simeonov
- Ankit Goyal
- Lucas Manuelli
- Lin Yen-Chen
- Alina Sarmiento
- Alberto Rodriguez
- Pulkit Agrawal
- Dieter Fox