3D VSG: Long-term Semantic Scene Change Prediction through 3D Variable Scene Graphs (2209.07896v2)
Abstract: Numerous applications require robots to operate in environments shared with other agents, such as humans or other robots. However, such shared scenes are typically subject to different kinds of long-term semantic scene changes. The ability to model and predict such changes is thus crucial for robot autonomy. In this work, we formalize the task of semantic scene variability estimation and identify three main varieties of semantic scene change: changes in the position of an object, in its semantic state, or in the composition of the scene as a whole. To represent this variability, we propose the Variable Scene Graph (VSG), which augments existing 3D Scene Graph (SG) representations with a variability attribute representing the likelihood of discrete long-term change events. We present a novel method, DeltaVSG, to estimate the variability of VSGs in a supervised fashion. We evaluate our method on the 3RScan long-term dataset, showing notable improvements on this novel task over existing approaches. Our method DeltaVSG achieves an accuracy of 77.1% and a recall of 72.3%, often mimicking human intuition about how indoor scenes change over time. We further show the utility of VSG prediction in the task of active robotic change detection, speeding up task completion by 66.0% compared to a scene-change-unaware planner. We make our code available as open-source.
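The abstract describes a VSG as a 3D scene graph whose nodes carry a variability attribute: the likelihood of each of three discrete long-term change events (position, semantic state, composition). The following is a minimal, hypothetical Python sketch of that data structure; the names (`ChangeType`, `VSGNode`, `VariableSceneGraph`, `likely_changes`) are illustrative and not taken from the paper's code.

```python
from dataclasses import dataclass, field
from enum import Enum

class ChangeType(Enum):
    """The three varieties of semantic scene change named in the abstract."""
    POSITION = "position"        # object moves within the scene
    SEMANTIC_STATE = "state"     # e.g. a door changes from open to closed
    COMPOSITION = "composition"  # object is added to or removed from the scene

@dataclass
class VSGNode:
    label: str
    # Variability attribute: estimated probability of each discrete
    # long-term change event for this object.
    variability: dict = field(
        default_factory=lambda: {c: 0.0 for c in ChangeType})

@dataclass
class VariableSceneGraph:
    nodes: dict = field(default_factory=dict)   # node id -> VSGNode
    edges: list = field(default_factory=list)   # (src_id, relation, dst_id)

    def likely_changes(self, threshold=0.5):
        """Return (node_id, change_type) pairs whose predicted
        change probability exceeds the threshold."""
        return [(nid, c)
                for nid, node in self.nodes.items()
                for c, p in node.variability.items()
                if p > threshold]

# Usage: a chair is likely to move, a wall is static.
vsg = VariableSceneGraph()
vsg.nodes["chair_1"] = VSGNode("chair", {ChangeType.POSITION: 0.8,
                                         ChangeType.SEMANTIC_STATE: 0.1,
                                         ChangeType.COMPOSITION: 0.2})
vsg.nodes["wall_1"] = VSGNode("wall")
vsg.edges.append(("chair_1", "standing on", "floor_1"))
print(vsg.likely_changes())  # [('chair_1', <ChangeType.POSITION: 'position'>)]
```

In the paper's setting, the variability values would be predicted by the supervised DeltaVSG model rather than set by hand; a downstream planner could then visit high-variability objects first during active change detection.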