SGAligner: 3D Scene Alignment with Scene Graphs (2304.14880v2)
Abstract: Building 3D scene graphs has recently emerged as a topic in scene representation for several embodied AI applications, as a way to represent the world in a structured and rich manner. With their increased use in solving downstream tasks (e.g., navigation and room rearrangement), can we leverage and recycle them for creating 3D maps of environments, a pivotal step in agent operation? We focus on the fundamental problem of aligning pairs of 3D scene graphs whose overlap can range from zero to partial and which can contain arbitrary changes. We propose SGAligner, the first method for aligning pairs of 3D scene graphs that is robust to in-the-wild scenarios (i.e., unknown overlap, if any, and changes in the environment). Inspired by multi-modal knowledge graphs, we use contrastive learning to learn a joint, multi-modal embedding space. We evaluate on the 3RScan dataset and further show that our method can be used for estimating the transformation between pairs of 3D scenes. Since benchmarks for these tasks are missing, we create them on this dataset. The code, benchmark, and trained models are available on the project website.
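To make the two tasks named in the abstract concrete, here is a minimal sketch of how node alignment in a shared embedding space could feed a rigid transformation estimate. It rests on several assumptions that do not come from the paper: the embeddings are random stand-ins for learned ones (no contrastive training is shown), matches are picked by mutual nearest neighbors on cosine similarity, and the transform is fitted with the Kabsch algorithm on matched object centroids. The function names `mutual_nearest_neighbors` and `kabsch`, the `min_sim` threshold, and all data are hypothetical illustrations, not the authors' pipeline.

```python
import numpy as np

def mutual_nearest_neighbors(emb_a, emb_b, min_sim=0.5):
    """Return (i, j) pairs where node i of graph A and node j of graph B
    are each other's most similar node (cosine similarity >= min_sim)."""
    a = emb_a / np.linalg.norm(emb_a, axis=1, keepdims=True)
    b = emb_b / np.linalg.norm(emb_b, axis=1, keepdims=True)
    sim = a @ b.T                      # (n_a, n_b) cosine similarities
    best_b = sim.argmax(axis=1)        # best B node for every A node
    best_a = sim.argmax(axis=0)        # best A node for every B node
    return [(i, int(j)) for i, j in enumerate(best_b)
            if best_a[j] == i and sim[i, j] >= min_sim]

def kabsch(src, dst):
    """Least-squares rigid transform (R, t) such that dst ~ src @ R.T + t."""
    src_c, dst_c = src.mean(axis=0), dst.mean(axis=0)
    H = (src - src_c).T @ (dst - dst_c)          # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))       # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = dst_c - R @ src_c
    return R, t

# --- Synthetic data (assumption: stands in for learned node embeddings) ---
rng = np.random.default_rng(0)
n_shared, dim = 6, 16
emb_a = rng.normal(size=(n_shared, dim))
cent_a = rng.uniform(-2.0, 2.0, size=(n_shared, 3))   # object centroids in A

theta = np.deg2rad(30.0)
R_true = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                   [np.sin(theta),  np.cos(theta), 0.0],
                   [0.0,            0.0,           1.0]])
t_true = np.array([1.0, -2.0, 0.5])

# Graph B: the shared objects in shuffled order with slightly perturbed
# embeddings, plus two objects that do not exist in A (partial overlap).
perm = rng.permutation(n_shared)
emb_b = np.vstack([emb_a[perm] + 0.01 * rng.normal(size=(n_shared, dim)),
                   rng.normal(size=(2, dim))])
cent_b = np.vstack([cent_a[perm] @ R_true.T + t_true,
                    rng.uniform(-2.0, 2.0, size=(2, 3))])

# Align nodes, then estimate the scene-to-scene transform from the matches.
matches = mutual_nearest_neighbors(emb_a, emb_b)
idx_a = [i for i, _ in matches]
idx_b = [j for _, j in matches]
R_est, t_est = kabsch(cent_a[idx_a], cent_b[idx_b])
```

In this toy setup the two objects that appear only in graph B are left unmatched, because no node of A selects them as its nearest neighbor. That is the role an overlap-robust matcher would need to play before any transformation is fitted.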